[REVIEW]: FuseMedML: a framework for accelerated discovery in machine learning based biomedicine

editorialbot commented 1 year ago

Submitting author: !--author-handle-->@alex-golts@jmschrei<!--end-editor-- Reviewers: @anupamajha1, @suragnair Archive: 10.5281/zenodo.7346694

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/4d0c5f077e6bc1ddc4c75e75774ce976"><img src="https://joss.theoj.org/papers/4d0c5f077e6bc1ddc4c75e75774ce976/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/4d0c5f077e6bc1ddc4c75e75774ce976/status.svg)](https://joss.theoj.org/papers/4d0c5f077e6bc1ddc4c75e75774ce976)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@anupamajha1 & @suragnair, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @jmschrei know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

📝 Checklist for @suragnair

📝 Checklist for @anupamajha1

editorialbot commented 1 year ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot commented 1 year ago

Software report:

github.com/AlDanial/cloc v 1.88  T=0.50 s (468.7 files/s, 177355.2 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
JSON                             2              0              0          58591
Python                         193           4330           6493          15333
Markdown                        24            492              0           1457
Jupyter Notebook                 3              0           1134            502
TeX                              1             19              0            145
YAML                             6             11             10            126
Bourne Shell                     4             33             22            118
INI                              1             14              0             92
TOML                             1              1              0              2
-------------------------------------------------------------------------------
SUM:                           235           4900           7659          76366
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository

editorialbot commented 1 year ago

Wordcount for paper.md is 1078

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1117/12.2609385 is OK
- 10.1117/12.2613169 is OK
- 10.3390/cancers14163848 is OK
- 10.1007/978-3-030-87592-3_4 is OK
- 10.1007/978-3-030-87589-3_29 is OK
- 10.1007/978-3-030-98385-7_14 is OK
- 10.1148/radiol.220027 is OK

MISSING DOIs

- None

INVALID DOIs

- None

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

jmschrei commented 1 year ago

Howdy @anupamajha1 and @suragnair!

Thanks for agreeing to review this submission.

The process for conducting a review is outlined above. Please run the command shown above to have @editorialbot generate your checklist, which will give a step-by-step process for conducting your review. Please check the boxes during your review to keep track, as well as make comments in this thread or open issues in the repository itself to point out issues you encounter. Keep in mind that our aim is to improve the submission to the point where it is of high enough quality to be accepted, rather than to provide a yes/no decision, and so having a conversation with the authors is encouraged rather than providing a single review post at the end of the process.

Here are the review guidelines: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html And here is a checklist, similar to above: https://joss.readthedocs.io/en/latest/review_checklist.html

Please let me know if you encounter any issues or need any help during the review process, and thanks for contributing your time to JOSS and the open source community!

suragnair commented 1 year ago

Review checklist for @suragnair

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://github.com/BiomedSciAI/fuse-med-ml?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@alex-golts) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
[x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
[x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
[x] Human and animal research: If the paper contains original data research on humans subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

anupamajha1 commented 1 year ago

Review checklist for @anupamajha1

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://github.com/BiomedSciAI/fuse-med-ml?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@alex-golts) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
[x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
[x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
[x] Human and animal research: If the paper contains original data research on humans subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

jmschrei commented 1 year ago

Hi @anupamajha1 and @suragnair, how are the reviews coming?

anupamajha1 commented 1 year ago

Issues opened regarding

alex-golts commented 1 year ago

I addressed all three issues by @anupamajha1 . please let me know if they can be closed or if there are any further comments.

anupamajha1 commented 1 year ago

@alex-golts: Thanks for addressing my comments. Here are some suggestions before I can finish my review checklist:

I installed FuseMedML and tried running it on the multiple examples you provided. The download instructions for datasets other than MNIST need to be improved. Currently, the READMEs, for many examples, point to the website with the data. You should include steps for download in all the example data READMEs.
The writing quality of the paper can be improved.

suragnair commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

alex-golts commented 1 year ago

@anupamajha1 Thanks for these suggestions!

We added a bash script to the KNIGHT example to download the data and edited the README file correspondingly. The STOIC21 example already contains in the code a class that downloads the data. For the remaining examples, obtaining the data involves some form of manual registration, so we couldn't easily automate that.
We revised the writing a bit, I hope it is somewhat better. you can see the changes in this PR. If you believe there are still any areas in the writing that could be improved upon, please feel free to let us know which ones specifically.

suragnair commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

anupamajha1 commented 1 year ago

@alex-golts: thanks for the updates! @jmschrei: I have finished the review checklist. Please let me know if anything else is needed.

suragnair commented 1 year ago

Some more suggestions for improving the paper further. Firstly, it would be helpful to have a "design philosophy" focused figure that outlines NDict and input/output of fuse models (can be borrowed from the video in the README).

It would help to make the following improvements:

Figure 1: It is not immediately apparent how data.img in updated_sample maps to img in sample, and so on. Adding arrows connecting them would be useful, like in the video.
Figure 2: Why does 1 need to be specified in ('data.input.img', 1)?
Figure 3: EvaluatorDefault needs some explanation in the caption.

alex-golts commented 1 year ago

Thanks for the suggestions @suragnair , I have addressed them as follows:

Added a figure with a diagram explaining the design philosophy
Added arrows as you suggested in Fig. 1 (now Fig. 2)
The "1" argument represents the number of input channels. we keep it currently for backward compatibility. I added a comment to the code in the figure explaining what it is.
Modified the caption of Fig. 3 (now Fig. 4) to explain EvaluatorDefault

Please let me know if you have any more questions or suggestions

suragnair commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

alex-golts commented 1 year ago

Thanks for the suggestions @suragnair , I have addressed them as follows:

Added a figure with a diagram explaining the design philosophy

Added arrows as you suggested in Fig. 1 (now Fig. 2)

The "1" argument represents the number of input channels. we keep it currently for backward compatibility. I added a comment to the code in the figure explaining what it is.

Modified the caption of Fig. 3 (now Fig. 4) to explain EvaluatorDefault

Please let me know if you have any more questions or suggestions

@suragnair just an update about the 3rd bullet. we will support not having to specify the number of input channels (see https://github.com/BiomedSciAI/fuse-med-ml/pull/244). so I changed the figure accordingly. removed the comment, and it now doesn't have the "1" argument that you asked about.

suragnair commented 1 year ago

This is slightly open-ended and subjective. The Hello World notebook says "By the end of the session we hope you'll be familiar with basic Fuse's workflow and acknowledge its potential." However, after going through the paper, docs and notebooks, I am still not able to fully grasp the power of FuseMedML over typical workflows.

I fully understand that FuseMedML implements a lot of handy functions, especially for evaluation, and I can see how that by itself is useful for ML projects. However, I'm not fully convinced how the NDict concept enhances reproducibility by itself. One could argue that well documented individual functions/classes would probably be just as useful?

Would be interested to hear your thoughts on this. It would also be very helpful to add some more comments to the Hello World notebook, especially specifying how FuseMedML concepts enhance reproducibility over how someone would do it without FuseMedML.

alex-golts commented 1 year ago

The "hello world" notebook is meant to be simple and in simple cases like a standard MNIST classification pipeline it can be indeed a bit more challenging to demonstrate the particular usefulness of the fuse "philosophy" vs. standard approaches. But to give one example that may emphasize some of the benefits, even looking at just this simple example:

In the instantiation of BatchSamplerDefault, we pass a key name (string) that specifies the label by which to balance samples. This assumes the notion of a sample_dict that represents a data sample that will have this key. Once users across different projects agree on this methodology and use sample_dicts to store their data, it makes it easier to reuse code. For example, the code for creating a balanced sampler in another project would be exactly the same (with maybe only the key name being different). In a standard approach, different projects would have the data stored in different ways, so they might need to have a different implementation of this custom sampler per project, or they will need to agree on another convention to make the code fully reusable. The BatchSamplerDefault is indeed reusable between different examples that we have, you can see it used in the same way for example here and here.

I agree that it's possible to think of different conventions than NDict of course, and as long as those are used, it could be equally convenient to use other types of well documented classes or functions. So in that sense, NDict is a choice that we made that we find convenient. but the usefulness is in having many generic building blocks that all "comply" with this methodology. and if users decide to use it for their project from the start, then we believe they will find it convenient to reuse code from existing projects that use the same methodology.

We will add some comments along the "hello world" notebook to make this point more clear. And we also have a couple more thoughts about how to make this notebook better at showcasing the way fuse facilitates better code reuse. we're still discussing it and plan to make some improvements going forward.

suragnair commented 1 year ago

Thanks for the explanation and I agree that it would be great to make the benefits more obvious to potential users. As the above isn't a strict requirement in the checklist, I have signed off on all remaining items. @jmschrei my review is complete.

jmschrei commented 1 year ago

@editorialbot check references

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1117/12.2609385 is OK
- 10.1117/12.2613169 is OK
- 10.3390/cancers14163848 is OK
- 10.1007/978-3-030-87592-3_4 is OK
- 10.1007/978-3-030-87589-3_29 is OK
- 10.1007/978-3-030-98385-7_14 is OK
- 10.1148/radiol.220027 is OK
- 10.5281/zenodo.3828935 is OK

MISSING DOIs

- None

INVALID DOIs

- https://doi.org/10.48550/arXiv.2211.02701 is INVALID because of 'https://doi.org/' prefix

jmschrei commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

jmschrei commented 1 year ago

@alex-golts can you fix that one invalid DOI? Everything else looks good to me. Can you also provide me with a Zenodo DOI for the paper of the form 10.xxxx/zenodo.xxxxxxx and a version number for the software? Thanks!

alex-golts commented 1 year ago

@editorialbot check references

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1117/12.2609385 is OK
- 10.1117/12.2613169 is OK
- 10.3390/cancers14163848 is OK
- 10.1007/978-3-030-87592-3_4 is OK
- 10.1007/978-3-030-87589-3_29 is OK
- 10.1007/978-3-030-98385-7_14 is OK
- 10.1148/radiol.220027 is OK
- 10.5281/zenodo.3828935 is OK
- 10.48550/arXiv.2211.02701 is OK

MISSING DOIs

- None

INVALID DOIs

- None

alex-golts commented 1 year ago

@jmschrei I have fixed the invalid DOI.

can you please clarify what exactly you mean by "Zenodo DOI for the paper"? I see according to the submission instructions that we're expected to deposit the repository to a service like Zenodo. did you mean that? in that case we already have it here. Or did you mean to just deposit the paper? If that's the case, should it be the PDF? the raw "paper.md" and "paper.bib" files?

jmschrei commented 1 year ago

Yes, that's what I was looking for. Thanks. Is the version you'd like tagged with the submission v0.2.9?

alex-golts commented 1 year ago

Yes, that's what I was looking for. Thanks. Is the version you'd like tagged with the submission v0.2.9?

yes, v0.2.9 sorry, forgot to answer you on that one :-)

jmschrei commented 1 year ago

@editorialbot set v0.2.9 as version

editorialbot commented 1 year ago

Done! version is now v0.2.9

jmschrei commented 1 year ago

@editorialbot set 10.5281/zenodo.7346694 as archive

editorialbot commented 1 year ago

Done! Archive is now 10.5281/zenodo.7346694

jmschrei commented 1 year ago

@editorialbot recommend-accept

editorialbot commented 1 year ago

Attempting dry run of processing paper acceptance...

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1117/12.2609385 is OK
- 10.1117/12.2613169 is OK
- 10.3390/cancers14163848 is OK
- 10.1007/978-3-030-87592-3_4 is OK
- 10.1007/978-3-030-87589-3_29 is OK
- 10.1007/978-3-030-98385-7_14 is OK
- 10.1148/radiol.220027 is OK
- 10.5281/zenodo.3828935 is OK
- 10.48550/arXiv.2211.02701 is OK

MISSING DOIs

- None

INVALID DOIs

- None

editorialbot commented 1 year ago

:wave: @openjournals/bcm-eics, this paper is ready to be accepted and published.

Check final proof :point_right::page_facing_up: Download article

If the paper PDF and the deposit XML files look good in https://github.com/openjournals/joss-papers/pull/3887, then you can now move forward with accepting the submission by compiling again with the command @editorialbot accept

Kevin-Mattheus-Moerman commented 1 year ago

@editorialbot set 0.2.9 as version

editorialbot commented 1 year ago

Done! version is now 0.2.9

Kevin-Mattheus-Moerman commented 1 year ago

@alex-golts and co-authors, I am the AEiC on this track and here to help process final steps towards acceptance in JOSS. Before we can proceed, below are some some minor issues that require your attention:

On the paper (You can call @editorialbot generate pdf to update the draft):

[x] The paper mentions

A data pipeline may consist of a static_pipline and a dynamic_pipeline.

Check if static_pipline should read static_pipeline.

[x] Fix spelling for acording, which should read according

On the archive:

[x] Please manually edit ZENODO archive title to match the paper title
[x] Please also manually edit the ZENODO archive author list to match the paper authors and order (you may need to remove any contributors that were added on the ZENODO list)
[x] Please manually edit the ZENODO listed license (currently it says Other (Open)) to match your actual software's open source license.

Kevin-Mattheus-Moerman commented 1 year ago

@jmschrei thanks for your help as editor here. Note the archive issues I am pointing out ☝️ , in the future these are things you can check before recommending accept. But no worries we'll get it sorted out now. Thanks again for the help! 🎉

alex-golts commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

openjournals / joss-reviews