openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal

[REVIEW]: AMLTK: A Modular AutoML Toolkit in Python #6367

editorialbot opened 4 months ago

editorialbot commented 4 months ago

Submitting author: @eddiebergman (Edward Bergman)
Repository: https://github.com/automl/amltk
Branch with paper.md (empty if default branch): joss-paper
Version: v1.3.4
Editor: @JBorrow
Reviewers: @gomezzz, @woznicak, @hirzel
Archive: Pending

Status


Status badge code:

HTML: <a href="https://joss.theoj.org/papers/63a52e26b9e3bb3cfb7e028df69d6fe5"><img src="https://joss.theoj.org/papers/63a52e26b9e3bb3cfb7e028df69d6fe5/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/63a52e26b9e3bb3cfb7e028df69d6fe5/status.svg)](https://joss.theoj.org/papers/63a52e26b9e3bb3cfb7e028df69d6fe5)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@gomezzz & @woznicak & @hirzel, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @JBorrow know.

Please start on your review when you are able, and be sure to complete it within the next six weeks at the very latest.

Checklists

📝 Checklist for @gomezzz

📝 Checklist for @hirzel

📝 Checklist for @woznicak

editorialbot commented 4 months ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 4 months ago
Software report:

github.com/AlDanial/cloc v 1.88  T=0.22 s (765.3 files/s, 134583.0 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                         110           4948           6589          12376
Markdown                        34            955              0           2762
SVG                             11              6             10            967
YAML                             8             31             13            527
HTML                             5             48              0            268
TOML                             1             30             14            266
TeX                              1             17              0            133
CSS                              1             17              6             87
-------------------------------------------------------------------------------
SUM:                           171           6052           6632          17386
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository
editorialbot commented 4 months ago

Wordcount for paper.md is 1189

editorialbot commented 4 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- None

MISSING DOIs

- 10.1109/tpami.2021.3067763 may be a valid DOI for title: Auto-Pytorch: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL
- 10.21105/joss.01132 may be a valid DOI for title: GAMA: Genetic Automated Machine Learning Assistant
- 10.1007/s10994-022-06200-0 may be a valid DOI for title: Naive automated machine learning
- 10.1145/2908812.2908918 may be a valid DOI for title: Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science
- 10.1371/journal.pdig.0000276 may be a valid DOI for title: AutoPrognosis 2.0: Democratizing Diagnostic and Prognostic Modeling in Healthcare with Automated Machine Learning

INVALID DOIs

- None
editorialbot commented 4 months ago

Download article proof | View article proof on GitHub

gomezzz commented 4 months ago

Review checklist for @gomezzz

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

JBorrow commented 4 months ago

Hi @eddiebergman, there seem to be some missing references in your paper. Would you be able to fix those and re-generate the PDF? Please include a valid DOI for all references (indeed 10.1109/TPAMI.2021.3067763 is the correct one for your first reference).

eddiebergman commented 4 months ago

@JBorrow Apologies, I missed that this was required in the paper.bib. I've included a DOI for all entries where I could find one. Is there a command to have the bot re-check this?

JBorrow commented 4 months ago

@editorialbot check references

editorialbot commented 4 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.48550/arXiv.2003.06505 is OK
- 10.48550/arxiv.2207.12560 is OK
- 10.1109/tpami.2021.3067763 is OK
- 10.1007/978-3-030-05318-5 is OK
- 10.1007/978-3-030-67670-4_39 is OK
- 10.21105/joss.01132 is OK
- 10.48550/arxiv.2206.03493 is OK
- 10.48550/arxiv.1908.06756 is OK
- 10.1007/s10994-022-06200-0 is OK
- 10.1145/3292500.3330701 is OK
- 10.1007/978-3-030-05318-5_4 is OK
- 10.1145/2908812.2908918 is OK
- 10.1371/journal.pdig.0000276 is OK

MISSING DOIs

- None

INVALID DOIs

- None
JBorrow commented 4 months ago

All looks good now @eddiebergman, thank you.

gomezzz commented 4 months ago

(starting to go through the checklist now) @eddiebergman The paper has a fairly long author list and I believe JOSS has somewhat more restrictive authorship criteria. Could you clarify the contributions of the authors who did not contribute significantly to the code (deduced from here https://github.com/automl/amltk/graphs/contributors)?

eddiebergman commented 3 months ago

Hi @gomezzz,

The contributions are as follows:

Please let me know if I can further clarify any of these points :)

gomezzz commented 3 months ago

Thanks @eddiebergman for the details. I am not sure how this interacts with the authorship guidelines of JOSS and if all these are sufficient for authorship. I have no strong opinions on the matter, I'll check the box and maybe @JBorrow can decide on this aspect if there remains any uncertainty.

JBorrow commented 3 months ago

Hi @gomezzz, thank you. I will raise this with the editorial board shortly.

In other news, I will be out on vacation until the 24th, so please do not expect any responses before then.

hirzel commented 3 months ago

Review checklist for @hirzel

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

hirzel commented 3 months ago

Hi, I read through the paper and the Readme, tried the "pip install", and poked around in the repo as needed to answer the questions in the checklist. Overall, this seems like a nice package. Thank you for submitting it!

A couple of suggestions that could make this even better:

Finally, I was wondering whether the paper should cite Lale. I hesitate to bring this up, because I am one of the Lale authors. However, I do believe it is quite relevant here for multiple reasons. The first example in the Readme illustrates a >> combinator for pipeline composition, which Lale also provides. The paper emphasizes AMLTK's support for multiple optimizers, which Lale also provides (see Section 4 of the NeurIPS paper). Also, the authors chose me as a reviewer in part because Lale is closely related work for AMLTK. That said, if you decide not to cite the Lale paper here, I am fine with that.

JBorrow commented 3 months ago

Hi @woznicak, have you had a chance to start on your review yet?

JBorrow commented 3 months ago

Thank you @hirzel and @gomezzz for your comprehensive reviews of the package so far! @eddiebergman, please let me know if you expect a response to their comments to take longer than a few weeks!

JBorrow commented 3 months ago

@gomezzz, I have discussed this with a few people and we are happy with author scope. Thank you for raising this.

eddiebergman commented 3 months ago

Hi @JBorrow,

Apologies, I was away and only partially working last week. I will be able to address all comments fully from Wednesday onwards.


@hirzel thanks for reviewing! Many apologies for the missing citation. It was included in a previous, longer version of the submission and I guess it got cut from there; we will add Lale back in. Regarding the >> operator and associated operators, you're right that heavy inspiration was drawn from Lale for these. Regarding the three bullet points you raised:

hirzel commented 3 months ago

Thanks for your detailed reply. That sounds like a good plan. Regarding your question:

Any help on communicating this to improve the paper would be greatly appreciated

I think you can pretty much add what you wrote above into the paper and/or readme:

there are very few libraries (none?) that could allow someone to build an AutoML system using scikit-learn/PyTorch for tabular data but then also allow someone to use the same framework to build an AutoML system for image data, for example

woznicak commented 3 months ago

Review checklist for @woznicak

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

gomezzz commented 2 months ago

@JBorrow @eddiebergman

I opened a bunch of minor issues mostly related to installation instructions and docs (these are behind the currently unchecked boxes for guidelines and installation instructions) but the required changes should be trivial, I think.

Aside from that, from my end, I have finished trying out amltk and looking at the code and paper. I am happy to say, it's a very comprehensive and mature project that deserves publication in JOSS. :)

JBorrow commented 2 months ago

Thanks @gomezzz for your review!

@eddiebergman, do you have a timeline on which you expect to address the comments?

JBorrow commented 2 months ago

@woznicak, thank you for initiating your checklist! Do you know when you will have time to complete the review fully? As the other reviewers are almost complete, it would be excellent for your contribution to come in soon so we can push the package through to publication.

eddiebergman commented 2 months ago

Hi @JBorrow,

We were waiting until all feedback was in before addressing the comments.

JBorrow commented 1 month ago

Hi all, sorry for the silence here. We're waiting on the final reviewer and I have had some e-mail communication with them. If they can't complete it by the end of the week, then we will simply need to go ahead with the 2 reviewers that we already have. If they can, then great! But at the very least we should expect things to start moving by the end of the week.

woznicak commented 1 month ago

I sincerely apologize for the delays, which were due to unforeseen circumstances. I am very sorry to have held up the whole review process.

I think a great deal of work has been done and, in my opinion, the paper can be accepted to JOSS. I see great novelty and impact on the AutoML field.

The paper and software are very interesting and important for the development of AutoML. They address key issues related to the comparability of entire frameworks or of individual operations used in ML pipelines. The AutoML Fall School script is very useful for seeing the full capabilities of AMLTK.

The only aspect worth considering is the lack of examples showing how to reproduce or map the behavior of existing frameworks such as AutoGluon, or at least their key elements such as multi-layer stacking. The current examples focus on basic pipelines that can be built in scikit-learn and mainly show how easy it is to run optimization with SMAC or Optuna.

JBorrow commented 1 month ago

Great, thank you very much! Now that we have feedback from all three reviewers, please go ahead and make the requested changes @eddiebergman.

eddiebergman commented 1 month ago

Hi everyone, thanks for the helpful feedback! @JBorrow I can begin working on these from tomorrow :)

JBorrow commented 2 weeks ago

Hi @eddiebergman, checking in to see how the response is going and if everything is moving along?

eddiebergman commented 2 weeks ago

Hi @JBorrow, apologies, some other deadlines took precedence. I've made some paper changes in accordance with what was said here, but I'm mainly working on @gomezzz's helpful comments. I'm reducing some complexity and improving the documentation for first-time users.

JBorrow commented 2 weeks ago

Great! Thanks very much for the update. Once you've addressed their comments, the reviewers should feel free to go back to their checklists. Once all outstanding items are addressed, I'll take a final look over everything and hand it back to the EiC.