openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal

[REVIEW]: STITCHES: a Python package to amalgamate existing Earth system model output into new scenario realizations #5525

Closed editorialbot closed 6 months ago

editorialbot commented 1 year ago

Submitting author: @abigailsnyder (Abigail Snyder)
Repository: https://github.com/JGCRI/stitches
Branch with paper.md (empty if default branch): main
Version: v0.13
Editor: @observingClouds
Reviewers: @znicholls, @Zeitsperre
Archive: 10.5281/zenodo.11094934

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/ad81e6a435c13ae644a7ca8cb0ffbc35"><img src="https://joss.theoj.org/papers/ad81e6a435c13ae644a7ca8cb0ffbc35/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/ad81e6a435c13ae644a7ca8cb0ffbc35/status.svg)](https://joss.theoj.org/papers/ad81e6a435c13ae644a7ca8cb0ffbc35)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@znicholls & @Zeitsperre, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @observingClouds know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest.

Checklists

📝 Checklist for @znicholls

📝 Checklist for @Zeitsperre

editorialbot commented 1 year ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot commented 1 year ago

Software report:

github.com/AlDanial/cloc v 1.88  T=0.04 s (1004.0 files/s, 138666.7 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          23            675            957           1664
Jupyter Notebook                 2              0            786            406
reStructuredText                 5            290            308            260
Markdown                         7             62              0            223
TeX                              1              4              0             52
YAML                             2             10              4             45
DOS Batch                        1              8              1             26
make                             1              4              7              9
-------------------------------------------------------------------------------
SUM:                            42           1053           2063           2685
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository

editorialbot commented 1 year ago

Wordcount for paper.md is 857

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.5194/esd-2022-14 is OK
- 10.1017/9781009157940.001 is OK
- 10.5194/gmd-9-1937-2016 is OK
- 10.5194/gmd-9-3461-2016 is OK
- 10.1038/nclimate3310 is OK

MISSING DOIs

- None

INVALID DOIs

- None

editorialbot commented 1 year ago

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

observingClouds commented 1 year ago

Welcome to the review process 🎉 @znicholls, @Zeitsperre please start your review by typing @editorialbot generate my checklist in a comment below.

znicholls commented 1 year ago

Review checklist for @znicholls

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

znicholls commented 1 year ago

Just an FYI, I probably won't get to this until the end of the month unfortunately

Zeitsperre commented 1 year ago

Hi there! I'll be taking a look at this very soon, hopefully next week. Thanks again for reaching out to me, @observingClouds.

Zeitsperre commented 1 year ago

Review checklist for @Zeitsperre

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Zeitsperre commented 1 year ago

@observingClouds @abigailsnyder

My review here is complete. I think that STITCHES does something quite unique that I haven't seen before, and can genuinely see the value of it in performing global-scale impact modelling analyses. Thanks again for asking me to review this software.

I think there are quite a few improvements that could be made (many of them relatively low-effort), which I would gladly open as issues (or submit as fixes in Pull Requests) if they are welcome (and if Pull Requests from reviewers are allowed). Please let me know.

observingClouds commented 1 year ago

Thanks @Zeitsperre for the update! Please open issues over in the STITCHES repository. If it helps the discussion of an issue, feel free to open a PR as well.

znicholls commented 1 year ago

@observingClouds re conflict of interest: I have published with both Kalyn and Claudia previously. Sorry, I should have read that more closely earlier. I would request that the COI is simply noted here (and ideally waived), given that the community is relatively small and I think I can provide an impartial review nonetheless.

znicholls commented 1 year ago

I have also completed my review. In the process, I have made a number of issues (all cross-linked here) to follow up with areas where I haven't ticked the box above yet.

In general, I think I could use the package without too much trouble, but I would really struggle to extend it beyond its main use case. The major reason is that much of the internal data is built around pandas data frames, but it wasn't clear to me where I should look to understand what form these data frames should take or what the data in them should be. This may require a separate section on the various 'models' used in the code (e.g. https://pyam-iamc.readthedocs.io/en/stable/data.html). Such sections can be a bit painful to write and maintain, but they are generally very helpful for helping new users and maintainers understand how the system works. In particular, it wasn't clear to me what the recipe should look like, nor the archive data (I could sort of guess from the examples, but I am not sure I would guess correctly, so explicit docs could be very helpful). The other option would be to use something like pandera to add more structure to the data frames and communicate more explicitly what each column should be and what it means. Each option comes with pros and cons, but I think I would find it very hard to build a mental model of the package as it is currently written and documented.
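To illustrate the kind of explicit data-frame contract described here, the following is a minimal sketch written with plain pandas, so that it carries no extra dependency (pandera would express the same idea declaratively). The column names and the `validate_frame` helper are hypothetical illustrations and are not taken from the actual STITCHES recipe or archive format.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype, is_string_dtype

# Hypothetical specification: column name -> dtype check.
# These names are assumptions for illustration, not the real STITCHES columns.
RECIPE_SPEC = {
    "target_start_yr": is_integer_dtype,
    "target_end_yr": is_integer_dtype,
    "archive_model": is_string_dtype,
    "archive_experiment": is_string_dtype,
}


def validate_frame(df: pd.DataFrame, spec: dict) -> pd.DataFrame:
    """Return df unchanged if it satisfies spec, otherwise raise ValueError."""
    # First check that every required column is present.
    missing = [name for name in spec if name not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Then check that each required column has the expected kind of dtype.
    wrong = [name for name, check in spec.items() if not check(df[name])]
    if wrong:
        raise ValueError(f"unexpected dtype in columns: {wrong}")
    return df


recipe = pd.DataFrame(
    {
        "target_start_yr": [1850, 1859],
        "target_end_yr": [1858, 1867],
        "archive_model": ["MODEL-A", "MODEL-A"],
        "archive_experiment": ["ssp245", "ssp245"],
    }
)
validated = validate_frame(recipe, RECIPE_SPEC)
```

Calling such a check at the entry point of each public function would also document, in one place, what a recipe or archive frame is expected to contain.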

Arising from the above, there were a few areas where the functionality of the package wasn't super clear to me. I think adding a section like the 'model' section above and/or refactoring would make it much clearer what is going on. The issues were:

In general, the repo would also benefit from the use of some other tools, e.g. code linters and auto-formatters. They would make the code much more readable and introduce very little cost (at least in my opinion).

Zeitsperre commented 1 year ago

(Hi all, I've been meaning to open some issues/PRs in the repo (and will when I find a minute), but I wanted to piggyback on Zeb's great comments here)

In general, I think I could use the package without too much trouble, but I would really struggle to extend it beyond its main use case. The major reason is that much of the internal data is built around pandas data frames, but it wasn't clear to me where I should look to understand what form these data frames should take or what the data in them should be. This may require a separate section on the various 'models' used in the code (e.g. https://pyam-iamc.readthedocs.io/en/stable/data.html). Such sections can be a bit painful to write and maintain, but they are generally very helpful for helping new users and maintainers understand how the system works.

I am very much in agreement with this. Having worked with CORDEX/CMIP5/CMIP6 data for many years, I also felt it wasn't entirely clear how I could format my local NetCDF collections for use in STITCHES. A data model would provide a first step towards implementing methods for generalizing use-cases.

In particular, it wasn't clear to me what the recipe should look like, nor the archive data (I could sort of guess from the examples, but I am not sure I would guess correctly, so explicit docs could be very helpful). The other option would be to use something like pandera to add more structure to the data frames and communicate more explicitly what each column should be and what it means. Each option comes with pros and cons, but I think I would find it very hard to build a mental model of the package as it is currently written and documented.

I mentioned in my review that xarray and intake are used by STITCHES to fetch data, and I found it odd that effort was spent converting their Dataset format to CSVs and pandas DataFrames. The xarray format is quite robust and extensible (https://docs.xarray.dev/en/stable/user-guide/data-structures.html) and preserves the metadata fetched from Pangeo. With tools such as intake and dask, it allows subsetted requests for data to be structured in advance of GET requests, significantly reducing download time and memory requirements. I'm not familiar with pandera, but it seems to have integrated dask support as well. Adherence to a standardized data structure would be essential to making the project much more portable.
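As a rough illustration of the metadata point, here is a small sketch using a synthetic in-memory xarray Dataset. The variable name `tas`, the attribute values, and the grid are made up for the example and do not reflect how STITCHES actually fetches or stores data; intake and dask are left out so that the sketch runs offline.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a CMIP6-style field; all values and attributes are assumed.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
lat = np.array([-45.0, 0.0, 45.0])
lon = np.array([0.0, 120.0, 240.0])
data = np.full((time.size, lat.size, lon.size), 288.0)

ds = xr.Dataset(
    {"tas": (("time", "lat", "lon"), data, {"units": "K"})},
    coords={"time": time, "lat": lat, "lon": lon},
    attrs={"source_id": "HYPOTHETICAL-ESM", "experiment_id": "ssp245"},
)

# Label-based subsetting keeps the global attributes and the variable's units.
subset = ds.sel(time=slice("2000-01-01", "2000-12-31"))

# Area-weighted global mean, using cos(latitude) as the weight.
weights = np.cos(np.deg2rad(ds["lat"]))
global_mean = subset["tas"].weighted(weights).mean(("lat", "lon"))

# Converting to a flat table leaves only the data column; the model and
# experiment identifiers are not carried along as columns.
table = global_mean.to_dataframe(name="tas")
```

In this sketch `subset.attrs` still holds the identifying metadata, whereas `table` has a single `tas` column indexed by time, so any provenance would have to be re-attached by hand.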

In general, the repo would also benefit from the use of some other tools, e.g. code linters and auto-formatters. They would make the code much more readable and introduce very little cost (at least in my opinion).

In my work, I do a fair amount of package maintenance and coding standards enforcement, and would be willing to give a hand in this way. If my schedule allows for it, I'd be more than happy to open a PR.

observingClouds commented 1 year ago

Thank you @Zeitsperre and @znicholls for your reviews and for starting the discussion on some action items. Since your reviews agree on several points, I suggest that @abigailsnyder and colleagues go ahead and address these common issues and suggestions.

observingClouds commented 1 year ago

@znicholls, regarding your potential COI, could you let me know in which aspect you fall under the JOSS COI policy? Thank you!

znicholls commented 1 year ago

could you let me know in which aspect you fall under the JOSS COI policy?

In which aspect I fall under or fail under? I have published papers with both Kalyn and Claudia in the last two years. I am also on a scientific steering committee with Claudia and plan to publish with Kalyn again in the next 12 months.

abigailsnyder commented 1 year ago

Thanks very much to the reviewers for their feedback, I look forward to addressing it! I see you have both opened issues and/or PRs, but please feel free to open any additional ones.

We might be slightly slower addressing these than is typical for our group due to various folks being out on vacations at different points.

Thank you so much again!

observingClouds commented 1 year ago

could you let me know in which aspect you fall under the JOSS COI policy?

In which aspect I fall under or fail under? I have published papers with both Kalyn and Claudia in the last two years. I am also on a scientific steering committee with Claudia and plan to publish with Kalyn again in the next 12 months.

Thanks @znicholls for giving me additional detail about your COI. Upon consultation with the JOSS team, we will waive the COI for this submission. As you mentioned already, for the future it would be great to know about any potential COI earlier. Maybe we also need to adjust the workflow on our side and ask for the COI before the start of the review process.

observingClouds commented 1 year ago

@abigailsnyder just checking back on whether the comments and issues have been addressed. Please ping me here once you have done so. Thank you.

abigailsnyder commented 1 year ago

Thank you for checking in! I have time blocked out this week and next to address review comments on this

abigailsnyder commented 1 year ago

@observingClouds We are working on this still (and juggling summer schedules) and are hoping to have it ready to go by end of next week. Thank you for your patience! We will likely do a full R2R all in one place when it is ready.

observingClouds commented 1 year ago

Hi @abigailsnyder, I just want to check back and see how you are progressing. Please let me know if you have any questions. Cheers, Hauke

kdorheim commented 1 year ago

Hi @observingClouds, @abigailsnyder is currently out of the office and we are waiting for another coauthor to respond, but we hope to resolve the outstanding issues as quickly as possible. Thanks!

kthyng commented 1 year ago

@kdorheim @abigailsnyder What is the timeframe for this response? We'd like to label this submission as "paused" if it will be more than another week, and we would request a withdrawal and resubmission if it will be more than a few weeks, given the already extended wait here. (We can't keep the reviewers on the hook for too long...)

abigailsnyder commented 1 year ago

@kthyng I will check with the co-author whose time has been a challenge and get back with you. Thank you

abigailsnyder commented 1 year ago

@kthyng my co-author thinks working on this tomorrow and getting the remaining parts done in a week is feasible. The main challenge in their availability has been family caregiving, so it's a little hard to predict. We have addressed most of the points and I'm happy to post the in-progress R2R document, if that's helpful. Or I'm more than happy to enter this submission into a paused phase if that would work better.

I appreciate the journal's patience and willingness to work with us during a period where more than a few of us have had big life things coming up.

kthyng commented 1 year ago

@abigailsnyder I am very sympathetic to the realities of life! Let's see if we can pause this for 1 month while your team works in the background.

It sounds like the authors have some steps to do on their end that are taking some extra time, but if the reviewers @znicholls, @Zeitsperre have any steps to do on their end that don't overlap, please consider doing so.

Separately, @znicholls, @Zeitsperre are you ok with a 1 month pause on this submission after which you would be requested to finish reviewing this submission (after any items you might be able to do now)?

Zeitsperre commented 1 year ago

All good for me. Life happens.

Some of the comments we made require some significant changes, so I completely understand the need to not rush the implementation.

I had been holding off on opening PRs as the changes might be significant (mainly formatting). If things are on hold, this might be a good time to propose them. Will open a PR if I have some time next week.

znicholls commented 1 year ago

Yep fine for me too (I don't have any outstanding things to do so this is just sitting in the background for me)

kthyng commented 1 year ago

Ok I will "pause" this submission and let's get back together on here by the end of November.

kthyng commented 1 year ago

Ok, we are about at the date we originally agreed on. @abigailsnyder How are things looking on the author side of things? @Zeitsperre will you be able to open an issue for the comments you had?

Zeitsperre commented 1 year ago

@kthyng Yes, can do in the next day or so!

abigailsnyder commented 1 year ago

@kthyng - I checked with my co-author today and he is confident he can complete the remaining pieces this week. Thanks

kthyng commented 1 year ago

ok thanks all! I'll remove the "paused" label.

observingClouds commented 11 months ago

Hi @abigailsnyder, Now that we are "back in business" I want to check how things are progressing. I still see some open issues in the repository. Maybe you can give us a short update on your plans? Thank you!

abigailsnyder commented 11 months ago

@observingClouds We are reconvening Monday post-AGU to get things finished off.

abigailsnyder commented 11 months ago

@observingClouds We have addressed all comments and performed a new release of stitches (https://github.com/JGCRI/stitches/releases/tag/v0.11.0). All changes in response to the review here are now on main. Attached here is a PDF of our R2R. We pulled every comment we could find across this review, issues and PRs opened in the stitches repo by reviewers into one place to organize and address.

We thank the reviewers and editors for their thorough comments and patience as we have navigated this revision. As it is about to be Winter Holidays, we may be slow to respond in the coming weeks. Hopefully the reviewers and editors will be as well, enjoying a break. We look forward to addressing any additional comments or new reviews in the new year.

stitches_joss_r2r.pdf

observingClouds commented 10 months ago

@editorialbot generate pdf

editorialbot commented 10 months ago

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

observingClouds commented 10 months ago

Dear @abigailsnyder, Thank you for your response to the reviewers' comments. You mentioned a few changes you made to the manuscript, e.g. referencing the MESMER tool, however, I currently cannot find these changes in paper.md. Could you please confirm that those changes are pushed to GitHub? As soon as those changes are visible, I'll notify the reviewers to take one last look at your responses and changes.

Thank you.

Cheers, Hauke

abigailsnyder commented 10 months ago

@observingClouds Thank you for catching this! It was one of the earlier changes we made and it got stranded on a branch. It has been merged into main and can now be seen at https://github.com/JGCRI/stitches/blob/main/paper/paper.md. Thanks again! Abigail

observingClouds commented 10 months ago

@editorialbot generate pdf

editorialbot commented 10 months ago

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

observingClouds commented 10 months ago

Thanks @abigailsnyder for merging the changes.

observingClouds commented 10 months ago

Dear @Zeitsperre, Dear @znicholls, Thank you very much for your reviews and also for sharing your expertise, e.g. by setting up the linters and opening some pull requests. Now that your comments have been responded to, I would like you to do a final check that everything has been addressed. In particular, I would like you to:

Hauke

znicholls commented 10 months ago

Thanks Hauke, on the to-do list for next week

znicholls commented 10 months ago

@observingClouds I've gone through things again. I think this is now good enough.

My only hesitation was on the functionality question. A key step seems to be the pre-processing of CMIP6 data. In my opinion, this still isn't documented in any real sense. You could argue that this pre-processing is not part of the package, but the pre-processed data is key to using it, so I would be more comfortable if it were included.

I would also note that the only installation option is still from source. That's fine, it works, but it is a bit odd for a Python package when the barrier to releasing on PyPI is so low.

My other comments are in https://github.com/JGCRI/stitches/issues/78; they are not review-blocking, though.

observingClouds commented 10 months ago

Thanks for your feedback @znicholls. This is very much appreciated.