Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.
For a list of things I can do to help you, just type:
@editorialbot commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@editorialbot generate pdf
Software report:
github.com/AlDanial/cloc v 1.88 T=0.04 s (1004.0 files/s, 138666.7 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          23            675            957           1664
Jupyter Notebook                 2              0            786            406
reStructuredText                 5            290            308            260
Markdown                         7             62              0            223
TeX                              1              4              0             52
YAML                             2             10              4             45
DOS Batch                        1              8              1             26
make                             1              4              7              9
-------------------------------------------------------------------------------
SUM:                            42           1053           2063           2685
-------------------------------------------------------------------------------
gitinspector failed to run statistical information for the repository
Wordcount for paper.md is 857
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.5194/esd-2022-14 is OK
- 10.1017/9781009157940.001 is OK
- 10.5194/gmd-9-1937-2016 is OK
- 10.5194/gmd-9-3461-2016 is OK
- 10.1038/nclimate3310 is OK
MISSING DOIs
- None
INVALID DOIs
- None
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Welcome to the review process 🎉
@znicholls, @Zeitsperre please start your review by typing @editorialbot generate my checklist in a comment below.
Just an FYI, I probably won't get to this until the end of the month unfortunately
Hi there! I'll be taking a look at this very soon, hopefully next week. Thanks again for reaching out to me, @observingClouds.
- Archive building is slow (e.g. `stitches.make_tas_archive`), as fetch calls to pangeo are not "lazy" (estimated ~8 h to collect data values on a reasonably fast connection; memory requirements are probably significant).
- `requirements.txt`: requirements pin `pandas<1.5` (should be updated) and use `pkg_resources` (deprecated).
- `setup.py` / repository: `setup()` doesn't adhere to PEP standards, no use of package metadata classifiers (https://pypi.org/classifiers/).
- Tests are packaged (`stitches.tests`) but should be moved to top-level / excluded from the wheel.
- The only installation route is from source (`$ python setup.py install`).
- Package data is installed manually (`python -c 'import stitches; stitches.install_package_data()'`), but this should be made part of the testing setup stage or exposed via a CLI (see the sketch after this list).
- The CI uses an outdated action (`actions/checkout@v1`). Running tests locally shows that they pass (unittest or pytest) but emits many `DeprecationWarning`s.
- There is no `CONTRIBUTING.md` guide specific to the project.
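As a possible direction for the CLI suggestion above, a minimal setuptools console-script sketch; the `stitches-install-data` command name is hypothetical, and it assumes `stitches.install_package_data()` can be called without arguments:

```python
# setup.py -- hypothetical sketch, not the package's actual packaging config
from setuptools import setup, find_packages

setup(
    name="stitches",
    packages=find_packages(),
    entry_points={
        "console_scripts": [
            # would expose a `stitches-install-data` command that simply calls
            # stitches.install_package_data()
            "stitches-install-data=stitches:install_package_data",
        ]
    },
)
```

A CI or testing setup step could then run `stitches-install-data` instead of the `python -c` one-liner.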
@observingClouds @abigailsnyder
My review here is complete. I think that STITCHES does something quite unique that I haven't seen before, and I can genuinely see its value for performing global-scale impact modelling analyses. Thanks again for asking me to review this software.
I think there are quite a few improvements that could be made (many of them relatively low-effort), which I would gladly open as issues (or submit as fixes in pull requests) if they are welcome and if pull requests from reviewers are allowed. Please let me know.
Thanks @Zeitsperre for the update! Please open issues over in the STITCHES repository. If it helps the discussion of an issue, feel free to open a PR as well.
@observingClouds re conflict of interest: I have published with both Kalyn and Claudia previously. Sorry I should have read that more closely earlier. I would request that the COI is simply noted here (and ideally waived) given that the community is relatively small and I think I can provide an impartial review nonetheless.
I have also completed my review. In the process, I have opened a number of issues (all cross-linked here) to follow up on areas where I haven't yet ticked the boxes above.
In general, I think I could use the package without too much trouble, but I would really struggle to extend it beyond its main use case. The major reason is that much of the internal data is built around pandas DataFrames, but it wasn't clear to me where I should look to understand what form these data frames should take or what the data in them should be. This may require a separate section on the various 'models' used in the code (e.g. https://pyam-iamc.readthedocs.io/en/stable/data.html). Such sections can be a bit painful to write and maintain, but they are generally very helpful for new users and maintainers trying to understand how the system works. In particular, it wasn't clear to me what the recipe should look like, nor the archive data (I could sort of guess from the examples, but I am not sure I would guess correctly, so explicit docs would be very helpful). The other option would be to use something like pandera to add more structure to the data frames and communicate more explicitly what each column should contain and what it means. Each option comes with pros and cons, but I think I would find it very hard to build a mental model of the package as it is currently written and documented.
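For illustration, a minimal pandera sketch of what an explicit, validated schema could look like; the column names and types below are guesses based on the examples discussed here, not the package's actual data model:

```python
import pandas as pd
import pandera as pa

# Hypothetical schema -- columns are assumptions (model/experiment/ensemble/
# variable/year/value), not the real stitches target_data layout.
target_schema = pa.DataFrameSchema(
    {
        "model": pa.Column(str),
        "experiment": pa.Column(str),
        "ensemble": pa.Column(str),
        "variable": pa.Column(str),
        "year": pa.Column(int, pa.Check.ge(1850)),
        "value": pa.Column(float),
    },
    strict=True,  # reject unexpected columns so malformed frames fail loudly
)

example = pd.DataFrame(
    {
        "model": ["CanESM5"],
        "experiment": ["ssp245"],
        "ensemble": ["r1i1p1f1"],
        "variable": ["tas"],
        "year": [2050],
        "value": [1.8],
    }
)
target_schema.validate(example)  # raises a SchemaError if the frame doesn't conform
```

The schema then doubles as documentation of what each column should contain.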
Arising from the above, there were a few areas where the functionality of the package wasn't super clear to me. I think adding a section like the 'model' section above and/or refactoring would make it much clearer what is going on. The issues were:
- Why are 'ensemble', 'experiment' and 'model' needed as part of the `target_data`? Could this not work just based on grouping by everything except the key columns of variable, year and value (a small sketch follows this list)? In addition, shouldn't there be some units in `target_data`, and perhaps also a reference period? Using a different reference period from that which was used to create the archive in the first place could cause havoc, no?
- Is the pangeo table hard-coded? This is probably fine for CMIP6, but seems problematic as we rapidly move beyond CMIP6 (perhaps this can be left for future work, but it seemed a bit odd to me not to give users a way to use a different archive if they want).
- Is the historical/scenario split also intentionally hard-coded to know about the SSPs? This also seems like it could be problematic if anyone wanted to use STITCHES in a different context or after CMIP6 (or before, e.g. CMIP5).
- Does the stitching only pull data from one model, or can you end up with stitching that joins together windows/samples from multiple different models (e.g. CanESM5 and NASA-GISS output)? Reading the diagram, I think the answer is that you can have more than one model, but looking at the code it wasn't super clear to me (e.g. the line `csv_to_load = [file for file in all_files if (model in file)][0]` in `gmat_stitching` confused me; maybe the `stitching_id` handles this?).
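To make the grouping idea in the first point concrete, a small sketch (column names are assumptions, not the actual stitches layout) of grouping by every column except the key columns instead of relying on a fixed list of metadata columns:

```python
import pandas as pd

# Sketch only: assumes a flat target_data frame whose key columns are
# variable/year/value and whose remaining columns are all treated as metadata.
KEY_COLS = {"variable", "year", "value"}

def iter_targets(target_data: pd.DataFrame):
    """Yield (metadata, sub-frame) pairs, one per unique metadata combination."""
    meta_cols = [c for c in target_data.columns if c not in KEY_COLS]
    for meta, group in target_data.groupby(meta_cols, sort=False):
        if not isinstance(meta, tuple):
            meta = (meta,)
        yield dict(zip(meta_cols, meta)), group
```

Any extra column a user adds (e.g. units) would then travel through the grouping without special handling.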
In general, the repo would also benefit from the use of some other tools, e.g. code linters and auto-formatters. They would make the code much more readable and introduce very little cost (at least in my opinion).
(Hi all, I've been meaning to open some issues/PRs in the repo (and will when I find a minute), but I wanted to piggyback on Zeb's great comments here)
In general, I think I could use the package without too much trouble, but I would really struggle to extend it beyond its main use case. The major reason is that much of the internal data is built around pandas DataFrames, but it wasn't clear to me where I should look to understand what form these data frames should take or what the data in them should be. This may require a separate section on the various 'models' used in the code (e.g. https://pyam-iamc.readthedocs.io/en/stable/data.html). Such sections can be a bit painful to write and maintain, but they are generally very helpful for new users and maintainers trying to understand how the system works.
I am very much in agreement with this. Having worked with CORDEX/CMIP5/CMIP6 data for many years, I also felt it wasn't entirely clear how I could format my local NetCDF collections for use in STITCHES. A data model would provide a first step towards implementing methods for generalizing use cases.
In particular, it wasn't clear to me what the recipe should look like, nor the archive data (I could sort of guess from the examples, but I am not sure I would guess correctly, so explicit docs would be very helpful). The other option would be to use something like pandera to add more structure to the data frames and communicate more explicitly what each column should contain and what it means. Each option comes with pros and cons, but I think I would find it very hard to build a mental model of the package as it is currently written and documented.
I mentioned in my review that `xarray` and `intake` are used by STITCHES to fetch data, and I found it odd that effort was made to convert their `Dataset` format to CSVs and pandas `DataFrame`s. The `xarray` format is quite robust/extensible (https://docs.xarray.dev/en/stable/user-guide/data-structures.html) and preserves the metadata fetched from Pangeo. Together with tools like `intake` and `dask`, it allows subsetted data requests to be structured in advance of the `GET` requests, significantly reducing download time and memory requirements. I'm not familiar with Pandera, but it seems to have integrated `dask` support as well. Adherence to a standardized data structure would be essential to making the project much more portable.
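As an illustration of the lazy access pattern described above, a sketch using the public Pangeo CMIP6 catalogue; the search facets are just examples and this is not how stitches is currently wired:

```python
import intake  # requires intake-esm for open_esm_datastore

# Open the public Pangeo CMIP6 catalogue and search for an illustrative subset.
cat = intake.open_esm_datastore(
    "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
)
subset = cat.search(
    source_id="CanESM5",
    experiment_id="ssp245",
    variable_id="tas",
    table_id="Amon",
    member_id="r1i1p1f1",
)

# Datasets are opened as dask-backed xarray objects, so nothing is downloaded yet.
dsets = subset.to_dataset_dict()
ds = next(iter(dsets.values()))

# Only this slice/reduction is actually fetched when .compute() runs
# (unweighted mean, purely for illustration).
tas_mean = ds["tas"].sel(time=slice("2015", "2030")).mean(dim=["lat", "lon"]).compute()
```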
In general, the repo would also benefit from the use of some other tools, e.g. code linters and auto-formatters. They would make the code much more readable and introduce very little cost (at least in my opinion).
In my work, I do a fair amount of package maintenance and coding-standards enforcement, and I would be willing to lend a hand in this area. If my schedule allows for it, I'd be more than happy to open a PR.
Thank you @Zeitsperre and @znicholls for your reviews and for starting the discussion on some action items. Based on the agreement between your reviews, I suggest that @abigailsnyder and colleagues already go ahead and address these common issues/suggestions.
@znicholls, regarding your potential COI, could you let me know in which aspect you fall under the JOSS COI policy? Thank you!
could you let me know in which aspect you fall under the JOSS COI policy?
In which aspect I fall under, or fail under? I have published papers with both Kalyn and Claudia in the last two years. I am also on a scientific steering committee with Claudia and plan to publish with Kalyn again in the next 12 months.
Thanks very much to the reviewers for their feedback, I look forward to addressing it! I see you have both opened issues and/or PRs, but please feel free to open any additional ones.
We might be slightly slower addressing these than is typical for our group due to various folks being out on vacations at different points.
Thank you so much again!
could you let me know in which aspect you fall under the JOSS COI policy?
In which aspect I fall under, or fail under? I have published papers with both Kalyn and Claudia in the last two years. I am also on a scientific steering committee with Claudia and plan to publish with Kalyn again in the next 12 months.
Thanks @znicholls for giving me additional detail about your COI. Upon consultation with the JOSS team, we will waive the COI for this submission. As you already mentioned, for future submissions it would be great to know about any potential COI earlier. Maybe we also need to adjust the workflow on our side and ask about COIs before the start of the review process.
@abigailsnyder just checking back to see whether the comments and issues have been addressed yet. Please ping me here once you have done so. Thank you.
Thank you for checking in! I have time blocked out this week and next to address review comments on this
@observingClouds We are working on this still (and juggling summer schedules) and are hoping to have it ready to go by end of next week. Thank you for your patience! We will likely do a full R2R all in one place when it is ready.
Hi @abigailsnyder, I just want to check back and see how you are progressing. Please let me know if you have any questions. Cheers, Hauke
Hi @observingClouds, @abigailsnyder is currently out of the office and we are waiting for another coauthor to respond, but we hope to resolve the outstanding issues as quickly as possible. Thanks!
@kdorheim @abigailsnyder What is the timeframe for this response? We'd like to label this submission as "paused" if it will be more than another week, and we would request a withdrawal and resubmission if it will be more than a few weeks given the already extended wait here. (We can't keep the reviewers on the hook for too long...)
@kthyng I will check with the co-author whose time has been a challenge and get back to you. Thank you.
@kthyng my co-author thinks working on this tomorrow and getting the remaining parts done in a week is feasible. The main challenge in their availability has been family caregiving, so it's a little hard to predict. We have addressed most of the points and I'm happy to post the in-progress R2R document, if that's helpful. Or I'm more than happy to enter this submission into a paused phase if that would work better.
I appreciate the journal's patience and willingness to work with us during a period where more than a few of us have had big life things coming up.
@abigailsnyder I am very sympathetic to the realities of life! Let's see if we can pause this for 1 month while your team works in the background.
It sounds like the authors have some steps to do on their end that are taking some extra time, but if the reviewers @znicholls, @Zeitsperre have any steps to do on their end that don't overlap, please consider doing so.
Separately, @znicholls, @Zeitsperre are you ok with a 1 month pause on this submission after which you would be requested to finish reviewing this submission (after any items you might be able to do now)?
All good for me. Life happens.
Some of the comments we made require some significant changes, so I completely understand the need to not rush the implementation.
I had been holding off on opening PRs as the changes might be significant (mainly formatting). If things are on hold, this might be a good time to propose them. Will open a PR if I have some time next week.
Yep fine for me too (I don't have any outstanding things to do so this is just sitting in the background for me)
Ok I will "pause" this submission and let's get back together on here by the end of November.
Ok, we are now at about our originally noted timing. @abigailsnyder How are things looking on the author side of things? @Zeitsperre will you be able to open an issue for the comments you had?
@kthyng Yes, can do in the next day or so!
@kthyng - I checked with my co-author today and he is confident he can complete the remaining pieces this week. Thanks
ok thanks all! I'll remove the "paused" label.
Hi @abigailsnyder, Now that we are "back in business" I want to check how things are progressing. I still see some open issues in the repository. Maybe you can give us a short update on your plans? Thank you!
@observingClouds We are reconvening Monday, post-AGU, to get things finished off.
@observingClouds We have addressed all comments and performed a new release of stitches (https://github.com/JGCRI/stitches/releases/tag/v0.11.0). All changes in response to the review here are now on main.
Attached here is a PDF of our R2R. We pulled every comment we could find across this review and the issues and PRs opened by reviewers in the stitches repo into one place to organize and address.
We thank the reviewers and editors for their thorough comments and patience as we have navigated this revision. As it is about to be Winter Holidays, we may be slow to respond in the coming weeks. Hopefully the reviewers and editors will be as well, enjoying a break. We look forward to addressing any additional comments or new reviews in the new year.
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Dear @abigailsnyder, Thank you for your response to the reviewers' comments. You mentioned a few changes you made to the manuscript, e.g. referencing the MESMER tool; however, I currently cannot find these changes in paper.md. Could you please confirm that those changes have been pushed to GitHub? As soon as those changes are visible, I'll notify the reviewers to take one last look at your responses and changes.
Thank you.
Cheers, Hauke
@observingClouds Thank you for catching this! It was one of our earlier changes and it got stranded on a branch. It has been merged into main and can now be seen at https://github.com/JGCRI/stitches/blob/main/paper/paper.md. Thanks again! Abigail
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Thanks @abigailsnyder for merging the changes.
Dear @Zeitsperre, dear @znicholls, Thank you very much for your reviews and for sharing your expertise, e.g. by setting up the linters and opening some pull requests. Now that your comments have been responded to, I would like you to take one last look and, in particular, check whether everything has been addressed accordingly.
Hauke
Thanks Hauke, on the to-do list for next week
@observingClouds I've gone through things again. I think this is now good enough.
My only hesitation was on the functionality question. A key step seems to be the pre-processing of the CMIP6 data. In my opinion, this still isn't documented in any real sense. You could argue that this pre-processing is not part of the package, but the pre-processed data is key to using the package, so I would be more comfortable if it were included.
I would also note that the only installation option is still from source. That's fine, it works, but it is a bit odd for a Python package where the barrier to releasing on PyPI is so low.
My other comments are in https://github.com/JGCRI/stitches/issues/78; they are not review-blocking, though.
Thanks for your feedback @znicholls. This is very much appreciated.
Submitting author: @abigailsnyder (Abigail Snyder)
Repository: https://github.com/JGCRI/stitches
Branch with paper.md (empty if default branch): main
Version: v0.13
Editor: @observingClouds
Reviewers: @znicholls, @Zeitsperre
Archive: 10.5281/zenodo.11094934
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@znicholls & @Zeitsperre, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:
@editorialbot generate my checklist
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @observingClouds know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Checklists
📝 Checklist for @znicholls
📝 Checklist for @Zeitsperre