openjournals / joss-reviews

Reviews for the Journal of Open Source Software

[REVIEW]: Eureka!: An End-to-End Pipeline for JWST Time-Series Observations #4503

Closed (editorialbot closed this issue 2 years ago)

editorialbot commented 2 years ago

Submitting author: @taylorbell57 (Taylor Bell)
Repository: https://github.com/kevin218/Eureka
Branch with paper.md (empty if default branch): joss
Version: v0.6
Editor: @dfm
Reviewers: @catrionamurray, @christinahedges, @dfm
Archive: 10.5281/zenodo.7278300

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/a1cb3f6e4e58c00bc0ab1b471c04812b"><img src="https://joss.theoj.org/papers/a1cb3f6e4e58c00bc0ab1b471c04812b/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/a1cb3f6e4e58c00bc0ab1b471c04812b/status.svg)](https://joss.theoj.org/papers/a1cb3f6e4e58c00bc0ab1b471c04812b)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@catrionamurray & @christinahedges, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. If you have any questions or concerns, please let @dfm know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest.

Checklists

📝 Checklist for @catrionamurray

📝 Checklist for @christinahedges

📝 Checklist for @dfm

editorialbot commented 2 years ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot commented 2 years ago

Software report:

github.com/AlDanial/cloc v 1.88  T=0.26 s (399.5 files/s, 88641.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          78           2975           8126           8828
reStructuredText                 8            542            364            813
Markdown                         4            170              0            481
YAML                             8              6             11            274
TeX                              1             11              0            168
DOS Batch                        1              8              1             26
Cython                           1              7              0             19
make                             1              4              7              9
TOML                             1              0              0              3
-------------------------------------------------------------------------------
SUM:                           103           3723           8509          10621
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository

editorialbot commented 2 years ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1088/0004-637X/754/2/136 is OK
- 10.1088/0004-637X/768/1/42 is OK
- 10.1088/0004-6256/147/6/161 is OK
- 10.1038/s41550-017-0351-6 is OK
- 10.3847/2041-8213/aabcc8 is OK
- 10.1093/mnras/stab1027 is OK
- 10.1038/nature12888 is OK
- 10.5281/zenodo.1998447 is OK
- 10.1093/mnras/stz2688 is OK

MISSING DOIs

- None

INVALID DOIs

- None

editorialbot commented 2 years ago

Wordcount for paper.md is 1394

editorialbot commented 2 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

dfm commented 2 years ago

@catrionamurray, @christinahedges — This is the review thread for the paper. All of our communications will happen here from now on. Thanks again for agreeing to participate!

Please read the "Reviewer instructions & questions" in the first comment above, and generate your checklists by commenting @editorialbot generate my checklist on this issue ASAP. As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#4503 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for the review process to be completed within about 4-6 weeks, but please try to make a start ahead of this: JOSS reviews are iterative by nature, and any early feedback you can provide to the author will be very helpful in meeting this schedule.

christinahedges commented 2 years ago

Review checklist for @christinahedges

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

catrionamurray commented 2 years ago

Review checklist for @catrionamurray

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

dfm commented 2 years ago

@catrionamurray, @christinahedges — A quick reminder to keep this on your radar! Let me know if you have any questions or issues as you go through your review. Thanks!!

christinahedges commented 2 years ago

@editorialbot generate pdf

editorialbot commented 2 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

catrionamurray commented 2 years ago

A question about the conflict of interest policy: I haven't directly worked with any of the authors on the Eureka! paper but I am a coauthor on two papers with Adina Feinstein (and potentially a few more people in the author list). I wasn't sure if this is an issue?

kevin218 commented 2 years ago

It's not a problem on our end. I think most potential reviewers will have worked with at least one person on the Eureka! team.

kevin218 commented 2 years ago

@catrionamurray, @christinahedges — What's your timeline for providing feedback? The first ERS paper will likely be ready within a week and it would be great to use the JOSS reference.

dfm commented 2 years ago

> A question about the conflict of interest policy: I haven't directly worked with any of the authors on the Eureka! paper but I am a coauthor on two papers with Adina Feinstein (and potentially a few more people in the author list). I wasn't sure if this is an issue?

Thanks for bringing this up, @catrionamurray! I believe that this potential conflict is sufficiently minor that, as long as you're confident that you can provide a fair review, I'm happy to waive this apparent conflict of interest. If anyone involved feels that this is inappropriate, please feel free to comment here, or contact me directly via email (thanks @kevin218 for your confirmation already!). Otherwise, I think we should proceed with the review as planned. Thanks again!!

catrionamurray commented 2 years ago

I had an error installing Eureka! I've opened a new issue here: https://github.com/kevin218/Eureka/issues/421

catrionamurray commented 2 years ago

> A question about the conflict of interest policy: I haven't directly worked with any of the authors on the Eureka! paper but I am a coauthor on two papers with Adina Feinstein (and potentially a few more people in the author list). I wasn't sure if this is an issue?

> Thanks for bringing this up, @catrionamurray! I believe that this potential conflict is sufficiently minor that, as long as you're confident that you can provide a fair review, I'm happy to waive this apparent conflict of interest. If anyone involved feels that this is inappropriate, please feel free to comment here, or contact me directly via email (thanks @kevin218 for your confirmation already!). Otherwise, I think we should proceed with the review as planned. Thanks again!!

I'm happy I can provide a fair review!

catrionamurray commented 2 years ago

I found a potential mistake in the Quickstart docs, opened new issue here: https://github.com/kevin218/Eureka/issues/422

catrionamurray commented 2 years ago

There are missing references in the text to JWST, HST, etc.; issue here: https://github.com/kevin218/Eureka/issues/423

catrionamurray commented 2 years ago

Second issue with installation: https://github.com/kevin218/Eureka/issues/415

christinahedges commented 2 years ago

Hi all, my review is ongoing as I have been having installation struggles with this tool. I'll keep working on it, but for now I've got this feedback:

Installation

Documentation

[Screenshot: an API docstring rendered with an empty description]

Several of the docstrings look like this, so it's not clear what these functions do. If these functions are not part of the stable release for this review, could the authors discuss that here?

Example Usage

Tests

High Level Feedback

The Eureka tool works based on config files, and has multiple "stages" that are imported. The tool is run with scripts, as a pipeline. Because of this, the API can be quite hard to read to understand what’s going on, and for the user the tool can seem to be a bit of a black box, for example this is a workflow in the tests:

s2_meta = s2.calibrateJWST(meta.eventlabel, ecf_path=ecf_path)
s3_spec, s3_meta = s3.reduce(meta.eventlabel, ecf_path=ecf_path,
                             s2_meta=s2_meta)
s4_spec, s4_lc, s4_meta = s4.genlc(meta.eventlabel, ecf_path=ecf_path,
                                   s3_meta=s3_meta)
s5.fitlc(meta.eventlabel, ecf_path=ecf_path, s4_meta=s4_meta)

It's not easy for the reader to understand what's happening in these steps. This is a common workflow in astronomy, and I am not advocating that you change any of these design decisions. However, I wanted to highlight this because I believe we need to ensure the reader has some information in the docs about what steps are happening inside these functions, and what methods are being used. This has led to my feedback above.

I'll keep working on this review, but wanted to give some feedback as I go.

taylorbell57 commented 2 years ago

A quick response to a few of your points - I'll try to address a couple of these today and the rest throughout the week.

  • I have had some installation issues. Given that some people don’t use conda, I tried to use a different environment/package manager. I had some of the same troubles people seem to be having installing this tool on an M1 Mac. I believe there are some M1 specific installation instructions that are going live on the docs, though these are to install via conda. I'm figuring out ways to get this to install.

Yeah, unfortunately none of our development team has an M1 Mac (Linux and Intel Macs only), so there's nothing we can do personally to test on M1 chips, and we're left relying on user-submitted issues, which we've tried to resolve quickly. I think I and others avoided M1 chips for the time being, knowing that a lot of software and packages would have issues with the new processor for a year or two. At this point, there are no known issues with our code itself on M1, but a couple of our dependencies have issues with M1 and there's little we can do ourselves about that. For several of the known installation issues, the problems can be resolved using the conda installation methods, as some of the packages do seem to have been built with M1 support on conda but not on pip.

  • The installation instructions point to a github repo that seems to be being updated at the moment with bug fixes and new functionality. Can authors highlight what version is their stable version for the purposes of this review?

Yeah, the code has been under pretty rapid development with the release of the first JWST observations. Things are stabilizing more with time though, so let's use the latest release from yesterday (v0.4; https://github.com/kevin218/Eureka/tree/v0.4; https://eurekadocs.readthedocs.io/en/v0.4/). The joss branch is now up-to-date with that v0.4 but also has the current paper draft which is not in the main branch. Note that changes to the paper draft to address your comments are only going to be present in the joss branch (which editorialbot uses here to build the paper), and for the docs/code issues you've raised the changes will be made on the main branch (https://github.com/kevin218/Eureka; https://eurekadocs.readthedocs.io/en/latest/) and occasionally released as stable versions.

  • There are API docs, but some of the API documentation is missing

Ah yeah, several of these are functions that could potentially be used some day for WFC3 analyses but are not used and aren't likely to be used anytime soon, so I'll just remove them from the repo shortly.

  • The API docs could benefit from some description of a demo workflow. As it stands, because the workflow is behind some scripts like run_eureka.py or functions e.g. calibrateJWST it’s not immediately clear to the reader what is happening under the hood, or what functions are being called in what order.

Agreed, we've discussed making flowcharts for each stage (https://github.com/kevin218/Eureka/issues/113), but we didn't want to start on that until we'd run some real JWST data through the pipeline as we knew things were subject to significant change. Now that things are settling down, making such flowcharts would certainly be beneficial.

  • Inside the documentation there’s not a discussion of the methods behind the functions (which is called out in the JOSS checklist). In the quickstart documentation there is a list of what is happening (e.g. ramp fitting) but not a detailed description of the methods/assumptions made in each step. It may be that this is not where the authors intend to put this information, in which case could they point me towards the right documentation about the methods?

This will take some more thinking on our end - I will get back to you. The flowcharts will help some, but not completely.

  • There is an example usage in the docs online, and example config files in the demos folder. It could be made more clear how to use this example. The demos folder would benefit from a readme to explain how to use this example.

Would a simple README.md which points the users to the quickstart tutorial be adequate for what you're thinking?

  • Authors might want to include the schematic from the paper inside their docs.

Ah, indeed - I guess we forgot to include that in the docs.

  • There are tests, and they are run automatically with CI. Looking at the tests, they are designed to import the relevant modules, run scripts on test data, and check that the resultant files are produced and that figures are produced. These tests do check that the tools run and produce an output, but don’t seem to check that the output is valid or correct. This does meet a minimum test that files are produced by the routines, but it does not thoroughly test the routines.

Agreed, we've discussed this too (in issue https://github.com/kevin218/Eureka/issues/140 and during video calls) - the hard part is spoofing data for each of the functions for which we know the expected result. We're unlikely to write such tests for Stage 2 (as there's basically no code in there) and writing them for Stage 1 would be very challenging (due to the large file sizes of the input uncal files for even ~3 integrations), but the functions in Stages 3-6 are good candidates for detailed unit testing. As it stands, the already existing tests have been very helpful as they test end-to-end and the code typically crashes somewhere along the way if something is off, but I totally agree that a lack of crashes does not imply the correct result has been produced.
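To sketch the kind of unit test against spoofed data I mean here (the function and its signature below are toy placeholders, not Eureka!'s actual API), a pytest-style check on synthetic frames with a known background level might look like:

# Illustrative only: a unit test on spoofed data with a known answer.
# `subtract_constant_bg` is a toy stand-in, not a real Eureka! function.
import numpy as np


def subtract_constant_bg(frame, spec_rows):
    """Toy Stage 3-like step: estimate the background from rows outside
    the spectral trace and subtract it from every pixel."""
    mask = np.ones(frame.shape[0], dtype=bool)
    mask[spec_rows] = False
    return frame - frame[mask].mean()


def test_background_subtraction_recovers_known_level():
    rng = np.random.default_rng(42)
    frame = np.full((32, 64), 7.5) + rng.normal(0, 0.01, (32, 64))
    frame[14:18] += 100.0  # inject a fake spectral trace
    cleaned = subtract_constant_bg(frame, spec_rows=slice(14, 18))
    # Unlike an end-to-end "did it crash?" check, this asserts correctness:
    # the out-of-trace pixels should now be consistent with zero.
    assert abs(cleaned[:10].mean()) < 0.1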

... Because of this, the API can be quite hard to read to understand what’s going on, and for the user the tool can seem to be a bit of a black box ...

Agreed, and hopefully this will be significantly ameliorated with the flowcharts I described above. We've also talked about making it easier for the user to call each of the steps in the stages themselves to give a better understanding of the flow and procedure (and make customization easier).
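To illustrate the kind of step-wise interface we have in mind (all names below are hypothetical, not Eureka!'s current API), the idea would be that a user can run and inspect each reduction step individually rather than making one opaque call per stage:

# Hypothetical sketch only; these names are not Eureka!'s current API.
import numpy as np


class Stage3:
    """Toy stand-in exposing individual reduction steps to the user."""

    def subtract_background(self, frames):
        # Remove a per-integration background level (placeholder for a
        # real out-of-trace background fit)
        return frames - np.median(frames, axis=(1, 2), keepdims=True)

    def extract_spectra(self, frames):
        # Collapse each frame to a 1D spectrum (a plain sum standing in
        # for a real optimal-extraction weighting scheme)
        return frames.sum(axis=1)


s3 = Stage3()
frames = np.random.default_rng(0).normal(10.0, 1.0, (5, 32, 64))
bg_subtracted = s3.subtract_background(frames)  # intermediate product, inspectable
spectra = s3.extract_spectra(bg_subtracted)     # shape (5, 64): one spectrum per integration

Exposing intermediate products like this would also make it easier to diagnose a single step, such as background subtraction, in isolation.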

taylorbell57 commented 2 years ago

Following up on the above, I just made a PR (https://github.com/kevin218/Eureka/pull/442) which adds some more flowcharts to the documentation website describing the workflow for Eureka!'s Stages 3 and 4 (visible here for now: https://eurekadocs.readthedocs.io/en/joss_docs/stages.html). I didn't bother to make flowcharts for Stages 5 and 6 as they're not that complicated or deeply nested. I added links to STScI's documentation for Stages 1 and 2, as any edits we currently allow for Stage 1 are fully experimental.

That PR should also resolve the issues you pointed out about undocumented or poorly documented functions in the API.

Still unaddressed are comments:

  1. Add unit tests rather than just relying on end-to-end tests.
  2. The demos folder would benefit from a readme to explain how to use this example.
     a. Please clarify if a README file with a link to the quickstart guide is all that is needed here.
  3. Inside the documentation there's not a discussion of the methods behind the functions.
     a. Can you comment on how detailed you are thinking here? With so many functions, this could pretty quickly get out of hand.

Let me know if you feel that any of your other comments have not been addressed now (either with edits in https://github.com/kevin218/Eureka/pull/442 or in the discussion above).

christinahedges commented 2 years ago

Hi @taylorbell57

The new flow charts look good!

> Please clarify if a README file with a link to the quickstart guide is all that is needed here

Yes, a README with a brief explanation of what the demos are and a pointer to the quickstart sounds fine.

> Can you comment on how detailed you are thinking here? With so many functions, this could pretty quickly get out of hand.

I think it would be worth having @dfm weigh in here. Currently, as a user, there isn't a single place I can go to find the methods this pipeline is using under the hood. @dfm, to meet the documentation-of-methods requirement here, what should the authors be aiming for?

dfm commented 2 years ago

I think that overall the state of the docs looks good, but your point is a good one, @christinahedges. In many cases the docstrings do leave something to be desired in terms of narrative discussion of what the method is doing. I wouldn't consider this a showstopper for the JOSS review, but I also think that the user experience could be substantially improved with some expansion of the docstrings for the key user-facing methods, or more narrative structure in the high-level summary pages.

In summary: from my perspective the minimum requirements are certainly satisfied, but I think that some effort in adding some detail to some key subset of docstrings would go a long way!

dfm commented 2 years ago

@catrionamurray, @christinahedges — I wanted to check in here on the status of your reviews. Please let us know where things stand, and what issues remain wrt your checklists! Thanks!!

catrionamurray commented 2 years ago

I'm on holiday at the moment but I will post an update on Monday when I'm back!

catrionamurray commented 2 years ago

@editorialbot generate pdf

editorialbot commented 2 years ago

:warning: An error happened when generating the pdf.

editorialbot commented 2 years ago

:warning: An error happened when generating the pdf.

taylorbell57 commented 2 years ago

Oh my, it appears I broke something in the BibTeX. Let me try to fix that for you!

catrionamurray commented 2 years ago

Just wrapping up my review, but I'm getting a compile error when trying to generate the pdf.

taylorbell57 commented 2 years ago

Aha, found the typo - the STScI had a missing } in the bibtex snippet they gave me. Should compile fine for you now!

catrionamurray commented 2 years ago

@editorialbot generate pdf

editorialbot commented 2 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

catrionamurray commented 2 years ago

Thanks Taylor!

From my end the minimum checklist has been completed. I just reran the (M1 chip) installation and Quickstart using v0.5 of Eureka! and I'm happy that it installs and runs as it should. A small note is that the Quickstart plots are probably from an older version, so the plots/MAD values etc. I got following the tutorial were slightly different. Not a big deal, but I thought it was worth mentioning.

The docs are also much improved since the start of this review process. But I definitely agree with @christinahedges's points about having more informative method descriptions in the API section (or at least links to relevant docs), where you describe the inputs/outputs/methods for each step, compared to the one-line summaries that are there currently (e.g. "Fit sky background with out-of-spectra data."). To your question, Taylor:

> Can you comment on how detailed you are thinking here? With so many functions, this could pretty quickly get out of hand.

In my opinion, having so many functions is even more reason to have detailed docs, or even a separate pdf that contains all the methods if it would make the webpage too cluttered. It's quite easy to get lost in the mountains of code without understanding what processes you're actually running. I also second the point about adding tests that don't just check whether the code ran without crashing (or at least basic quality checks, e.g. are the final values physical?). However, these changes may take a bit more time and thought and are probably beyond the scope of this JOSS review, as per Dan's comment.

Thanks for the work you've all put in, it's quite a mammoth task and I'm sure this will be (and already is) an invaluable tool for analyzing the JWST data!!

taylorbell57 commented 2 years ago

Amazing, I'm very glad to hear that it is now working on M1!! Good point on the quickstart figures - I'll try to remember to update those when I get the chance.

For the docs, how much of this has been addressed by the flowcharts found here? Are you suggesting a link to that page from the top of the API page or from within some key functions, or do you mean that you want more of a text description of each function in the API? A detailed paragraph for every function honestly isn't going to be feasible (especially in the short term), but adding one to some of the higher-level functions that users are most likely to directly interact with is probably doable. I can try to do this fairly soon, but please clarify whether this is something that needs to be done before JOSS acceptance.

For unit tests, this is definitely on our to-do list, but will take substantial amounts of time and I don't feel it is required at this point (although I agree it would be ideal to have them by now). We're just all at or near the limits of what we can take on with all this ERS, GO, and GTO data flooding in, and none of us are getting paid to work on Eureka so we're limited in how we can spend our time.

Thanks, it has been very rewarding seeing Eureka! used in each of the ERS papers that are in prep - exciting seeing all this teamwork paying off!

catrionamurray commented 2 years ago

I don't think the improvements to either the docs or the tests are necessary for JOSS acceptance (though maybe @dfm has more insight into this). However, I think it would be great for the user if there were something along the lines of what exists for the JWST pipeline (https://jwst-pipeline.readthedocs.io/en/latest/jwst/background_step/description.html), at least for the major steps beyond Stage 2 and the differences from the JWST pipeline. Especially since this tool is designed to replace several stages in the JWST pipeline, which is very well documented.

I think:

> adding one to some of the higher-level functions that users are most likely to directly interact with is probably doable

Would be great!

As an example of what I'm referring to, in the Eureka! overview docs I see descriptions like "Stage 3: Using Stage 2 outputs, performs background subtraction and optimal spectral extraction". It would be nice to have a more detailed paragraph on what major methods like background subtraction and optimal spectral extraction involve. Say I was having some weird residuals in my light curves and I wanted to check/make changes to the background subtraction step. If I look at the flowchart I see:

[Screenshot: the Stage 3 flowchart, showing an inst.fit_bg() step]

Then I go to the API to see what inst.fit_bg() does, but I can only (at least easily) find:

[Screenshots: the fit_bg() entries in the API docs, each showing only a one-line summary]

This doesn't really tell me what these stages are doing without digging into the code itself. I understand that it may not be feasible for all functions that Eureka! runs in the short term, but I think it would definitely improve the user experience to have more detail (e.g. like the JWST pipeline docs) for methods that can significantly affect the final results. Even if it was limited to brief descriptions of the 5-10 methods mentioned in the overview for now, I think this would be a great step!
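For illustration only (the signature, parameter names, and described behavior below are made up for the example, not Eureka!'s actual API), even a docstring along these lines would answer most of those questions at a glance:

# Hypothetical example of a more informative docstring; not Eureka! code.
def fit_bg(data, meta, bg_deg=0, threshold=5):
    """Fit and subtract the sky background using out-of-spectrum pixels.

    For each integration, pixels outside the spectral aperture are fit
    with a polynomial of degree `bg_deg`, iteratively masking pixels more
    than `threshold` sigma from the fit, and the resulting background
    model is subtracted from the frame.

    Parameters
    ----------
    data : ndarray
        Stage 2 calibrated frames.
    meta : object
        Metadata object carrying the aperture definition.
    bg_deg : int
        Degree of the background polynomial (0 = constant level).
    threshold : float
        Sigma-clipping threshold for outlier rejection.

    Returns
    -------
    ndarray
        Background-subtracted frames.
    """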

I hope that makes sense?

dfm commented 2 years ago

> I don't think the improvements to either the docs or the tests are necessary for JOSS acceptance

Agreed! We want to provide constructive feedback to improve the usability and reliability of scientific software, but we don't want to be too hard-line. I agree with @catrionamurray and @christinahedges that these documentation improvements should be a high priority for the development team to increase the impact of Eureka!, but a complete overhaul is not necessary for this submission as far as I'm concerned!

dfm commented 2 years ago

@editorialbot add @dfm as reviewer

@taylorbell57 — It unfortunately sounds like @christinahedges is overcommitted and won't be able to find the time to finish the review. Since we already have ~1.5 reviews, I'll take over and review myself to get the fastest turnaround. I probably won't be able to get to it this week because I'm running a conference, but I'll get onto it ASAP! Thanks all for your patience!

editorialbot commented 2 years ago

@dfm added to the reviewers list!

dfm commented 2 years ago

Review checklist for @dfm

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

dfm commented 2 years ago

@taylorbell57 — I've gone through the checklist and happily checked all but one of the boxes. You may have also noticed the issue and PR that I opened. After you take a look at those, I'd say that the only remaining issue here is the "State of the field" item in the software paper section of the checklist. I know that some alternative tools are listed in the paper, but I think it would be great to provide a few more words of context about where Eureka! fits within this ecosystem. Personally I think this should be in both the paper and the docs.

taylorbell57 commented 2 years ago

Awesome, thanks for your rapid review! I'll try to get that all addressed fairly soon and will write back here once everything has been addressed or if I have any follow-up questions.

taylorbell57 commented 2 years ago

Okay @dfm, I believe that all your quickstart related comments are addressed by PR https://github.com/kevin218/Eureka/pull/474 and your "State of the field" comment is addressed by PR https://github.com/kevin218/Eureka/pull/477.

dfm commented 2 years ago

Looks good to me! In case I don't see it when it happens, can you ping me when these are merged? Then I'll move on to the next step. Thanks!

taylorbell57 commented 2 years ago

@dfm, those two PRs have now been merged into the main branch

dfm commented 2 years ago

@editorialbot generate pdf

dfm commented 2 years ago

@editorialbot check references

editorialbot commented 2 years ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1007/s11214-006-8315-7 is OK
- 10.1086/682252 is OK
- 10.1117/12.552281 is OK
- 10.1117/12.735602 is OK
- 10.1117/12.2309161 is OK
- 10.1111/j.1749-6632.1986.tb47983.x is OK
- 10.1117/12.789581 is OK
- 10.1088/0004-637X/754/2/136 is OK
- 10.1088/0004-637X/768/1/42 is OK
- 10.1088/0004-6256/147/6/161 is OK
- 10.1038/s41550-017-0351-6 is OK
- 10.3847/2041-8213/aabcc8 is OK
- 10.1093/mnras/stab1027 is OK
- 10.1038/nature12888 is OK
- 10.5281/zenodo.1998447 is OK
- 10.1093/mnras/stz2688 is OK
- 10.5281/zenodo.6984366 is OK

MISSING DOIs

- None

INVALID DOIs

- None