editorialbot opened 4 months ago
Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.
For a list of things I can do to help you, just type:
@editorialbot commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@editorialbot generate pdf
Software report:
github.com/AlDanial/cloc v 1.88  T=0.22 s (765.3 files/s, 134583.0 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                         110           4948           6589          12376
Markdown                        34            955              0           2762
SVG                             11              6             10            967
YAML                             8             31             13            527
HTML                             5             48              0            268
TOML                             1             30             14            266
TeX                              1             17              0            133
CSS                              1             17              6             87
-------------------------------------------------------------------------------
SUM:                           171           6052           6632          17386
-------------------------------------------------------------------------------
gitinspector failed to run statistical information for the repository
Wordcount for paper.md is 1189
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- None
MISSING DOIs
- 10.1109/tpami.2021.3067763 may be a valid DOI for title: Auto-Pytorch: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL
- 10.21105/joss.01132 may be a valid DOI for title: GAMA: Genetic Automated Machine Learning Assistant
- 10.1007/s10994-022-06200-0 may be a valid DOI for title: Naive automated machine learning
- 10.1145/2908812.2908918 may be a valid DOI for title: Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science
- 10.1371/journal.pdig.0000276 may be a valid DOI for title: AutoPrognosis 2.0: Democratizing Diagnostic and Prognostic Modeling in Healthcare with Automated Machine Learning
INVALID DOIs
- None
📄 Download article proof · 📄 View article proof on GitHub
Hi @eddiebergman, there seem to be some missing references in your paper. Would you be able to fix those and re-generate the PDF? Please include a valid DOI for all references (indeed 10.1109/TPAMI.2021.3067763 is the correct one for your first reference).
@JBorrow Apologies, I missed that this was required in the paper.bib. I've included a DOI for all entries where I could find one. Is there any command to have the bot re-check this?
@editorialbot check references
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.48550/arXiv.2003.06505 is OK
- 10.48550/arxiv.2207.12560 is OK
- 10.1109/tpami.2021.3067763 is OK
- 10.1007/978-3-030-05318-5 is OK
- 10.1007/978-3-030-67670-4_39 is OK
- 10.21105/joss.01132 is OK
- 10.48550/arxiv.2206.03493 is OK
- 10.48550/arxiv.1908.06756 is OK
- 10.1007/s10994-022-06200-0 is OK
- 10.1145/3292500.3330701 is OK
- 10.1007/978-3-030-05318-5_4 is OK
- 10.1145/2908812.2908918 is OK
- 10.1371/journal.pdig.0000276 is OK
MISSING DOIs
- None
INVALID DOIs
- None
All looks good now @eddiebergman, thank you.
(starting to go through the checklist now) @eddiebergman The paper has a fairly long author list and I believe JOSS has somewhat more restrictive authorship criteria. Could you clarify the contributions of the authors who did not contribute significantly to the code (deduced from here https://github.com/automl/amltk/graphs/contributors)?
Hi @gomezzz,
The contributions are as follows:
Please let me know if I can further clarify any of these points :)
Thanks @eddiebergman for the details. I am not sure how this interacts with the authorship guidelines of JOSS and if all these are sufficient for authorship. I have no strong opinions on the matter, I'll check the box and maybe @JBorrow can decide on this aspect if there remains any uncertainty.
Hi @gomezzz, thank you. I will raise this with the editorial board shortly.
In other news, I will be out on vacation until the 24th, so please do not expect any responses before then.
Hi, I read through the paper and the Readme, tried the "pip install", and poked around in the repo as needed to answer the questions in the checklist. Overall, this seems like a nice package. Thank you for submitting it!
A couple of suggestions that could make this even better:
Performance: The paper does not report any experimental results. On the other hand, the first claimed contribution is "(a) Enabling systematic comparison". Such comparison is only meaningful if AMLTK has competitive performance. Therefore, I encourage you to report some numbers.
Architecture: The paper does not contain any architecture diagram. I would assume such a diagram could show how pipelines, search space, optimizers, and schedulers interact. Adding a diagram would help readers more quickly grasp the functionality, and might also be helpful for users and contributors.
Novelty: I was looking for novel claims in the paper, but if they are there, I missed them. Perhaps novelty is not the aim here, taking a backseat to the well-engineered implementation of known concepts? On the other hand, if there is novelty, you should consider pointing it out explicitly.
Finally, I was wondering if the paper should cite Lale. I hesitate to bring this up, because I am one of the Lale authors. However, I do believe it is quite relevant here for multiple reasons. The first example in the Readme illustrates a >> combinator for pipeline composition, which Lale also provides. The paper emphasizes AMLTK's support for multiple optimizers, which Lale also provides (see Section 4 of the NeurIPS paper). Also, the authors chose me as a reviewer in part because Lale is closely related work for AMLTK. That said, if you decide not to cite the Lale paper here, I am fine with that.
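As an aside for readers unfamiliar with the pattern being discussed: a `>>` pipeline combinator can be built on Python's `__rshift__` operator. The sketch below is purely illustrative and does not reflect the actual APIs of AMLTK or Lale; all class and function names here are invented for the example.

```python
# Illustrative sketch of a ">>" pipeline combinator using __rshift__.
# Not AMLTK's or Lale's real API; names are hypothetical.
class Step:
    """A single named operation in a pipeline."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def __rshift__(self, other):
        # step_a >> step_b wraps step_a in a Pipeline, then appends step_b
        return Pipeline([self]) >> other


class Pipeline:
    """An ordered sequence of Steps, composable with >>."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __rshift__(self, other):
        if isinstance(other, Pipeline):
            return Pipeline(self.steps + other.steps)
        return Pipeline(self.steps + [other])

    def run(self, x):
        # Apply each step's function in order.
        for step in self.steps:
            x = step.fn(x)
        return x


pipe = Step("scale", lambda x: x * 2) >> Step("shift", lambda x: x + 1)
print(pipe.run(3))  # (3 * 2) + 1 = 7
```

The appeal of the operator form is that composition reads left to right in execution order, which is why several libraries in this space have adopted it.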
Hi @woznicak, have you had a chance to start on your review yet?
Thank you @hirzel and @gomezzz for your comprehensive reviews of the package so far! @eddiebergman, please let me know if you expect a response to their comments to take longer than a few weeks!
@gomezzz, I have discussed this with a few people and we are happy with author scope. Thank you for raising this.
Hi @JBorrow,
Apologies, I was away and only partially working last week. I will be able to address all comments fully from Wednesday onwards.
@hirzel thanks for reviewing! Many apologies for the missing citation. It was present in a previously longer submission and I suspect it was cut during shortening; we will add Lale back in. Regarding the >> operator and its associated operators, you're right that heavy inspiration was drawn from Lale. Regarding the three raised bullet points:
Performance - We did consider doing some empirical evaluations but refrained from doing so as:
Architecture - Absolutely! We had a diagram from early in the development cycle. However, as these things go, the implementation and the diagram began to diverge. I've included the old version, which provides an abstract overview for reference but should no longer be taken fully at face value; it's largely correct, but there are some modifications I would like to make. I also believe a code overview would be helpful, as @gomezzz raised in this issue. I will try to address both of these together.
Novelty - You're correct that there is nothing inherently novel here that does not already exist elsewhere. However, there are very few libraries (none?) that would allow someone to build an AutoML system using scikit-learn/PyTorch for tabular data and then use the same framework to build an AutoML system for image data, for example. This comes at the tradeoff of being less immediately useful, e.g. we have no concept of a dataset in the code, but it aims to address the fact that building an AutoML system requires a lot of systems work, which is constantly rebuilt for every tool out there. I believe this to be a novelty, though not in the traditional research-paper sense. Any help on communicating this to improve the paper would be greatly appreciated!
Thanks for your detailed reply. That sounds like a good plan. Regarding your question:
Any help on communicating this to improve the paper would be greatly appreciated
I think you can pretty much add what you wrote above into the paper and/or readme:
there are very few libraries (none?) that could allow someone to build an AutoML system using scikit-learn/PyTorch for tabular data but then also allow someone to use the same framework to build an AutoML system for image data, for example
@JBorrow @eddiebergman
I opened a bunch of minor issues mostly related to installation instructions and docs (these are behind the currently unchecked boxes for guidelines and installation instructions) but the required changes should be trivial, I think.
Aside from that, from my end, I have finished trying out amltk and looking at the code and paper. I am happy to say, it's a very comprehensive and mature project that deserves publication in JOSS. :)
Thanks @gomezzz for your review!
@eddiebergman, do you have a timeline on which you expect to address the comments?
@woznicak, thank you for initiating your checklist! Do you know when you will have time to complete the review fully? As the other reviewers are almost complete, it would be excellent for your contribution to come in soon so we can push the package through to publication.
Hi @JBorrow,
We were waiting until all feedback was in before addressing the comments.
Hi all, sorry for the silence here. We're waiting on the final reviewer and I have had some e-mail communication with them. If they can't complete it by the end of the week, then we will simply need to go ahead with the 2 reviewers that we already have. If they can, then great! But at the very least we should expect things to start moving by the end of the week.
I apologize for the delays, which were due to unforeseen circumstances. I am very sorry to have held up the whole review process.
I think a great deal of work has been done here, and in my opinion the paper can be accepted to JOSS. I see great novelty and impact on the AutoML field.
The paper and software are very interesting and important for the development of AutoML. They address key issues related to the comparability of entire frameworks, as well as of individual operations used in ML pipelines. The AutoML Fall School script is very useful for seeing the full capabilities of AMLTK.
The only aspect worth considering is the lack of examples showing how to reproduce or map the operation of existing frameworks such as AutoGluon, or at least key elements of them such as multilayer stacking. The examples provided focus on basic pipelines built with scikit-learn and mainly demonstrate the ease of running optimizations with SMAC or Optuna.
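To make the multilayer-stacking request concrete: the general shape of such an example could be sketched with plain scikit-learn by nesting `StackingClassifier` instances, one per layer. This is a generic sketch, not an AMLTK or AutoGluon recipe; all model and name choices below are illustrative assumptions.

```python
# Hedged sketch: two-layer stacking in the spirit of AutoGluon's
# multilayer stacking, using only scikit-learn. Model choices are
# arbitrary placeholders, not taken from AMLTK or AutoGluon.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Layer 1: base learners whose out-of-fold predictions feed the next layer.
layer1 = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=10, random_state=0)),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(),
)

# Layer 2: a second stack that treats the whole layer-1 stack as one of
# its base learners, giving two levels of stacking.
layer2 = StackingClassifier(
    estimators=[
        ("stack1", layer1),
        ("rf2", RandomForestClassifier(n_estimators=10, random_state=1)),
    ],
    final_estimator=LogisticRegression(),
)

layer2.fit(X, y)
preds = layer2.predict(X[:5])
```

An AMLTK example in this direction would presumably express the same layered structure through its own pipeline combinators rather than nested scikit-learn objects.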
Great, thank you very much! Now that we have feedback from all three reviewers, please go ahead and make the requested changes @eddiebergman.
Hi everyone, thanks for the helpful feedback! @JBorrow I can begin working on these from tomorrow :)
Hi @eddiebergman, checking in to see how the response is going and if everything is moving along?
Hi @JBorrow, apologies, some other deadlines took precedence. I've made some paper changes in accordance with what was said here, but I'm mainly working on @gomezzz's helpful comments: reducing some complexity and improving documentation for first-time users.
Great! Thanks very much for the update. Once you've addressed their comments, the reviewers should feel free to go back to their checklists. Once all outstanding items are addressed, I'll take a final look over everything and hand it back to the EiC.
Submitting author: @eddiebergman (Edward Bergman) Repository: https://github.com/automl/amltk Branch with paper.md (empty if default branch): joss-paper Version: v1.3.4 Editor: @JBorrow Reviewers: @gomezzz, @woznicak, @hirzel Archive: Pending
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@gomezzz & @woznicak & @hirzel, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all, you need to run this command in a separate comment to create the checklist:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @JBorrow know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Checklists
📝 Checklist for @gomezzz
📝 Checklist for @hirzel
📝 Checklist for @woznicak