openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal

[REVIEW]: AutoRA: Automated Research Assistant for Closed-Loop Empirical Research #6839

Open editorialbot opened 1 month ago

editorialbot commented 1 month ago

Submitting author: @musslick (Sebastian Musslick)
Repository: https://github.com/AutoResearch/autora-paper
Branch with paper.md (empty if default branch): main
Version: v4.0.0
Editor: @jbytecode
Reviewers: @seandamiandevine, @szorowi1
Archive: Pending

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/be6d470033fbe5bd705a49858eb4e21e"><img src="https://joss.theoj.org/papers/be6d470033fbe5bd705a49858eb4e21e/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/be6d470033fbe5bd705a49858eb4e21e/status.svg)](https://joss.theoj.org/papers/be6d470033fbe5bd705a49858eb4e21e)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@seandamiandevine & @szorowi1, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @jbytecode know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

πŸ“ Checklist for @seandamiandevine

πŸ“ Checklist for @szorowi1

editorialbot commented 1 month ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 1 month ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.31222/osf.io/ysv2u is OK
- 10.1016/j.jbef.2017.12.004 is OK
- 10.48550/arXiv.1912.04871 is OK
- 10.48550/arXiv.2006.11287 is OK
- 10.1126/sciadv.aav6971 is OK
- 10.31234/osf.io/c2ytb is OK

MISSING DOIs

- No DOI given, and none found for title: Bayesian machine scientist for model discovery in ...
- No DOI given, and none found for title: An evaluation of experimental sampling strategies ...
- No DOI given, and none found for title: Scikit-learn: Machine learning in python
- No DOI given, and none found for title: A Unified Framework for Deep Symbolic Regression

INVALID DOIs

- None
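For authors double-checking the MISSING entries above, here is a minimal stdlib-only sketch of the kind of syntax check a candidate DOI can be run through before querying a resolver. The regex is the commonly cited Crossref-style pattern and the helper name is ours; this is an illustration, not editorialbot's actual implementation:

```python
import re

# Crossref-style DOI pattern: "10.", a 4-9 digit registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def doi_to_url(doi):
    """Return the doi.org resolver URL for a syntactically valid DOI, else None."""
    if not DOI_PATTERN.match(doi):
        return None
    return f"https://doi.org/{doi}"

print(doi_to_url("10.31222/osf.io/ysv2u"))
print(doi_to_url("not-a-doi"))
```

A URL returned by this check still needs to be resolved (e.g., with an HTTP request) to confirm the DOI is registered, which is why the bot flags missing DOIs as suggestions needing verification.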
editorialbot commented 1 month ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.01 s (547.6 files/s, 36692.5 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Markdown                         2             33              0            100
TeX                              1             14              0             90
YAML                             1              1              5             25
-------------------------------------------------------------------------------
SUM:                             4             48              5            215
-------------------------------------------------------------------------------

Commit count by author:

    11  Sebastian Musslick
     3  musslick
     2  Younes Strittmatter
editorialbot commented 1 month ago

Paper file info:

📄 Wordcount for paper.md is 1549

✅ The paper includes a Statement of need section
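As a rough sketch of how such a wordcount might be approximated for a paper.md, the snippet below strips a leading YAML front-matter block and counts whitespace-separated tokens. Skipping the front matter is an assumption on our part; editorialbot's actual method may differ:

```python
def md_wordcount(text):
    """Approximate wordcount of a Markdown file, ignoring YAML front matter."""
    lines = text.splitlines()
    # Drop a leading front-matter block delimited by '---' lines, if present.
    if lines and lines[0].strip() == "---":
        try:
            end = lines[1:].index("---") + 1
            lines = lines[end + 1:]
        except ValueError:
            pass  # unterminated front matter: count everything
    return len(" ".join(lines).split())

sample = "---\ntitle: AutoRA\n---\nOne two three."
print(md_wordcount(sample))
```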

editorialbot commented 1 month ago

License info:

✅ License found: MIT License (Valid open source OSI approved license)

editorialbot commented 1 month ago

👉 📄 Download article proof · 📄 View article proof on GitHub 👈

jbytecode commented 1 month ago

@seandamiandevine, @szorowi1 - Dear reviewers, you can start by creating your task lists. Each list contains several tasks.

Whenever you complete a task, check the corresponding checkbox. Since the JOSS review process is interactive, you can interact with the author, the other reviewers, and the editor at any time. You can open issues and pull requests in the target repository; please mention the URL of this page there so we can keep track of activity happening outside this thread.

Please create your tasklist by typing

@editorialbot generate my checklist

Thank you in advance.

jbytecode commented 1 month ago

@editorialbot remind @szorowi1 in two weeks

editorialbot commented 1 month ago

Reminder set for @szorowi1 in two weeks

seandamiandevine commented 4 weeks ago

Review checklist for @seandamiandevine

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

szorowi1 commented 3 weeks ago

Review checklist for @szorowi1

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

editorialbot commented 2 weeks ago

👋 @szorowi1, please update us on how your review is going (this is an automated reminder).

szorowi1 commented 2 weeks ago

Hi @jbytecode, hope you've been well! I'm working my way through the review. I was wondering if I could request some guidance on establishing functionality. The AutoRA library is quite extensive, distributed across 30+ Python packages (though some are quite small, composed of only a few functions/classes). What would you consider sufficient for demonstrating functionality (e.g., working through the tutorials/examples in the docs, applying the software to a novel personal use case, etc.)? Thank you!

jbytecode commented 2 weeks ago

@musslick - Could you please provide guidelines and help our reviewer on the issue mentioned above?

@szorowi1 - Any criticism/suggestions/corrections/thoughts are welcome. Following the checklist items is generally enough.

musslick commented 2 weeks ago

@jbytecode Sure thing!

@szorowi1 (also tagging @seandamiandevine) Thanks for checking on this. We discussed with the development team what might be a good functionality test for AutoRA and reached consensus that the Tutorials and the two examples, Equation Discovery and Online Closed-Loop Discovery, capture most of AutoRA's core functionality. Evaluating those would be most appropriate for a functionality test. Note that all of the tutorials (except the online closed-loop discovery) should be executable via Google Colab. Please let us know if you run into any issues or have any other questions---and thanks for all your work!

seandamiandevine commented 1 week ago

Thanks @musslick for the direction. It was very helpful in guiding functionality tests.

Checklist-related comments

General comments

jbytecode commented 2 days ago

@musslick - Could you please update us on your status and let us know how your work is going? Have there been any improvements in light of our reviewers' suggestions?

szorowi1 commented 2 days ago

Apologies to all for the delay, it's been a hectic few weeks!

Let me start by saying congrats to @musslick and co-authors/collaborators! This is a really impressive framework and it's obvious how much careful attention, thought, and effort went into developing it. Kudos!

I've now had a chance to work through the documentation, tutorials, and examples. The installation went fine, the code works as expected, and the Docs/API are robust. To echo @seandamiandevine, I also ran into a number of errors when running through the Equation Discovery tutorial in Colab, having to do with data shape mismatches. When running the Experimentalist.ipynb tutorial notebook in Colab, I also ran into the following error early on:

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-2-508cdcdb2e51> in <cell line: 4>()
      2 from sklearn.linear_model import LinearRegression
      3 from autora.variable import DV, IV, ValueType, VariableCollection
----> 4 from autora.experimentalist.sampler.falsification import falsification_sample, falsification_score_sample

ModuleNotFoundError: No module named 'autora.experimentalist.sampler'
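An error like this typically means the subpackage layout changed between releases, so the notebook imports no longer match the installed distribution. A stdlib-only way to list which submodules an installed package actually exposes is sketched below; it is demonstrated on the stdlib `json` package, since `autora` may not be present in every environment (swap in e.g. `"autora.experimentalist"` to diagnose the import above):

```python
import importlib
import pkgutil

def list_submodules(package_name):
    """Return the immediate submodule names of an installed package."""
    pkg = importlib.import_module(package_name)
    # pkgutil.iter_modules walks the package's __path__ without importing anything.
    return sorted(m.name for m in pkgutil.iter_modules(pkg.__path__))

print(list_submodules("json"))
```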

I agree with @seandamiandevine it would be good to make sure the Example notebooks run all the way through for new users.

Some general thoughts, none of which should necessarily preclude acceptance or publication:

musslick commented 9 hours ago

Dear @seandamiandevine and @szorowi1,

Thank you both so much for investing the time and effort into this review and for providing such thorough and constructive feedback. We really appreciate that!

I discussed your feedback with the team, and we agree that the documentation does not yet provide sufficient (and complete) information about how to use the closed-loop functionality of AutoRA for real-world experiments. Adding such examples to the documentation would be beneficial, especially for researchers interested in behavioral experiments.

We propose to do the following:

  1. Fix the errors in the Equation Discovery tutorial and the Experimentalist.ipynb notebook.
  2. Include the following two end-to-end examples for closed-loop experimentation with AutoRA (using both Prolific and Firebase):

    2.1 Mathematical model discovery for a psychophysics experiment

    2.2 Computational (reinforcement learning) model discovery for a one-armed bandit experiment

Once we have implemented and internally vetted those tutorials, we would love to get your feedback on them. That said, we would also understand if you've had enough of AutoRA already and/or don't have the time ;)

As a quick fix, we have already expanded the Closed-Loop Online Experiment Example to include a description of how to combine AutoRA with Prolific (to address @seandamiandevine's initial point).

In addition, to follow up on the general thoughts from @szorowi1, we aim to include two additional examples for closed-loop experimentation (also using Prolific and Firebase). We may not be able to implement them over the course of the review process, but we wanted to hear your thoughts on whether these could be a useful target for our next development milestone:

    2.3 Drift diffusion model comparison for a random-dot kinematogram (RDK) experiment using Bayesian optimal experimental design (specifically, minimizing posterior uncertainty)

    2.4 Experiment parameter tuning for a task-switching experiment (to illustrate how AutoRA can be used for automated design optimization, e.g., to enhance a desired behavioral effect such as task-switch costs)

Finally, to address @szorowi1's question: We think that AutoRA could be used for design optimization (we could illustrate this in Example 2.4). However, it's not (yet) capable of adapting the experiment on the fly, i.e., within a single experiment session. Rather, it can help optimize the design after collecting data from a set of experiments, and then propose a new set of experiments.
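The between-sessions loop described here can be sketched generically. Note this is illustrative code, not AutoRA's actual API: the simulated ground-truth model, the closed-form linear fit, and the "farthest-from-sampled" proposal strategy are all our assumptions, standing in for real data collection, a theorist, and an experimentalist:

```python
import random

def run_experiment(conditions):
    # Stand-in for data collection: noisy ground truth y = 2*x + 1.
    return [(x, 2 * x + 1 + random.gauss(0, 0.1)) for x in conditions]

def fit_model(data):
    # Closed-form least-squares fit for slope and intercept.
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    slope = sxy / sxx
    return slope, my - slope * mx

def propose_conditions(seen, pool, k=3):
    # "Novelty" strategy: pick pool conditions farthest from those already run.
    return sorted(pool, key=lambda c: -min(abs(c - s) for s in seen))[:k]

random.seed(0)
conditions = [0.0, 1.0]
data = []
for _ in range(3):  # three between-session cycles: collect, fit, propose
    data += run_experiment(conditions)
    slope, intercept = fit_model(data)
    conditions = propose_conditions([x for x, _ in data],
                                    pool=[i / 2 for i in range(11)])
print(round(slope, 1), round(intercept, 1))
```

Each cycle corresponds to one experiment session: the model is refit on all data so far, and a new set of conditions is proposed for the next session rather than mid-session.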

Please let us know what you think!