Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.
For a list of things I can do to help you, just type:
@editorialbot commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@editorialbot generate pdf
Software report:
github.com/AlDanial/cloc v 1.88 T=0.24 s (318.5 files/s, 231422.3 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          26           2601           3711          11193
reStructuredText                17            663            324           1476
Jupyter Notebook                 9              0          34843            548
YAML                            12             52             45            454
TeX                              1             15              0            380
Markdown                         2             40              0             91
make                             2             21             16             65
TOML                             2             10             22             56
DOS Batch                        1              8              1             27
SVG                              6              0              1             20
-------------------------------------------------------------------------------
SUM:                            78           3410          38963          14310
-------------------------------------------------------------------------------
gitinspector failed to run statistical information for the repository
Wordcount for paper.md is 1637
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.5194/gmd-10-4619-2017 is OK
- 10.5334/jors.148 is OK
- 10.5194/egusphere-2023-2720 is OK
- 10.1073/pnas.2209431119 is OK
- 10.1109/ICDMW.2009.64 is OK
- 10.1002/2014EO420002 is OK
- 10.5194/gmd-15-9031-2022 is OK
- 10.5281/zenodo.2586088 is OK
- 10.5281/zenodo.10038784 is OK
- 10.5281/zenodo.8339034 is OK
- 10.5281/zenodo.10236521 is OK
- 10.5281/zenodo.8356796 is OK
- 10.5281/zenodo.7348619 is OK
MISSING DOIs
- None
INVALID DOIs
- https://doi.org/10.1029/2022MS003156 is INVALID because of 'https://doi.org/' prefix
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@tomvothecoder, @brian-rose, @mgrover1 – This is the review thread for the paper. All of our communications will happen here from now on.
Please read the "Reviewer instructions & questions" in the first comment above. Please create your checklist by typing:
@editorialbot generate my checklist
As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.
The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention https://github.com/openjournals/joss-reviews/issues/6426 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.
We aim for the review process to be completed within about 4-6 weeks, but please make a start well ahead of this: JOSS reviews are by their nature iterative, and any early feedback you can provide to the author will be very helpful in meeting this schedule.
@brian-rose, @mgrover1 – I don't see any progress yet from either of you on your reviews. Is there anything I can do to help you get going here?
I am taking a look this week - thanks for the reminder @arfon
@tomvothecoder - I went through with a first cut. I am struggling to reproduce the benchmarks you mention because I am running on an M1 machine; I plan to test with my Intel Mac later this week.
Most of the comments are tracked in the associated issues. I recently added the DOI linking issue as well. Here is a list of the related issues to close:
The writing is well done, and the documentation is fantastic - just a couple of environment issues that would likely be fixed with some additional CI/cross-architecture tests.
@mgrover1 Thanks a lot for your review so far! I'm addressing those GitHub issues ASAP and will let you know once they are all resolved.
@mgrover1 All of the issues listed in your comment above should now be resolved!
Thanks for addressing all of the previous comments!
@tomvothecoder - is there any way to make the datasets mentioned in the benchmarks more accessible? I understand one of the files is 105 GB, but I think it would help with reproducibility if there were some way to download these locally and execute the validation scripts you all put together.
@mgrover1 Good point! I'll check to see if those datasets are available on ESGF and Globus. I'll also update the instructions for running the performance benchmark script to make it easier to reproduce the results.
Great!! Thanks!!
@brian-rose – just checking in again here. We have one complete review from @mgrover1 at this point and would love to have yours completed in the next couple of weeks. What do you think?
Hi @arfon yes I will get this done soon. Thanks for the nudge!
The revised instructions work well - thanks @tomvothecoder for making those changes!
@mgrover1 Awesome, thanks Max!
:wave: @brian-rose – just checking again here 😄
Quick update: Just dropped Brian an email directly. If we don't hear back in the next week, I think we'll have to find an alternative second reviewer, @tomvothecoder.
@arfon Thanks for the update Arfon.
Reaffirming my intent to complete this review! Sorry for the trouble. I've just gotten buried with service tasks during the semester and now starting to dig my way out again.
@tomvothecoder this is a minor annoyance, but the author for the reference to Dask is showing up in the rendered paper as "Team, D. D.". Is there some way to override the parsing of the bib file entry for Dask so that the citation becomes more readable?
I'll try to see if there is a way to make that citation render in a readable format (maybe I'll try dashes). I used the same Dask citation from the xclim paper and it also pops up as "Team, D. D." on there.
I updated the Dask author citation to "Dask-Development-Team" here.
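For anyone hitting the same rendering problem, here is a minimal BibTeX sketch of the usual fix (the entry key and field values below are illustrative, not copied from the paper's .bib file): wrapping a corporate author in an extra pair of braces keeps BibTeX from splitting it into given/family names, which is what produces "Team, D. D."; hyphenating the name, as done above, is another common workaround.

```bibtex
@misc{dask,
  % Double braces mark "Dask Development Team" as a single corporate author,
  % so citation styles will not abbreviate it to "Team, D. D.".
  author = {{Dask Development Team}},
  title  = {Dask: Library for dynamic task scheduling},
  year   = {2016},
  url    = {https://dask.org}
}
```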
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Great! I think this looks better.
Sorry for the continued delay. I've been trying to reproduce the benchmark calculation but hit some snags there, and also got interrupted by an unforeseen family issue last week.
I've encountered some issues that have so far prevented me from reproducing the benchmark calculation that appears in the article.
I've tried to follow the steps outlined on this page.
First, I was unable to create the necessary mamba environment on my Mac laptop. I commented on that here: https://github.com/xCDAT/xcdat-validation/issues/51
I got a bit farther on a Linux server, but I encountered an error running one of the Python scripts, which I documented here: https://github.com/xCDAT/xcdat-validation/issues/56
Thanks for reporting these issues @brian-rose. I'll take a look at these today and will aim to find solutions ASAP!
@arfon a quick update on my review: my attempt to reproduce the benchmark calculation in the manuscript took me on a long detour.
With help from the author team (https://github.com/xCDAT/xcdat-validation/issues/56) I made some progress but ultimately concluded that I cannot reproduce the calculation because of the very large memory requirements of the benchmark script. With help from my IT support, I ran the script on the largest memory machine that I have access to on my campus (768 GB) but it was not enough.
So, I will not be able to check off the "Reproducibility" box. I've suggested that the authors update the documentation for their benchmark scripts to clarify the large memory requirement. In my opinion, that should be sufficient for publication.
I will finally finish working through functionality and documentation to complete my review.
Hi @brian-rose, just checking to see if you need anything for the functionality and documentation portion of your review?
@tomvothecoder and @arfon:
I have finished going through all the documentation and tested out the functionality. Specifically, these are some things I went through:
- `make test`
Overall the docs are high quality. Almost everything I looked at was clear, clean, and easy to understand. This was a really valuable exercise for me, because I've learned that xCDAT is software that will really help me and my group do a lot of routine data processing more consistently and expressively. Great package and great documentation!
The JOSS paper is also very well-written and easy to follow. It offers a clear and compelling statement of need. One of the central needs is easy access to parallelization for the out-of-core computations that are common in "big data" climate analysis. The benchmark calculations in Figure 1 of the paper illustrate this vividly. Ironically (and as noted above), I wasn't able to reproduce these calculations because I don't have access to a machine with large enough memory to handle the serial cases. Parallel is clearly the way to go.
My only hang-up right now is https://github.com/xCDAT/xcdat/issues/661, which is the reason I've left the "Example usage" box unchecked. I suggest that the team prioritize resolving this, because it's the very first line of xCDAT code in the very first tutorial in the docs that is failing.
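As an aside on the out-of-core, Dask-backed workflow highlighted in the review above, here is a minimal sketch of the kind of chunked computation the benchmarks exercise. The file name, variable name, and chunk size are hypothetical, and the exact xCDAT call signatures should be checked against the xCDAT documentation.

```python
import xcdat

# Hypothetical CMIP-style file and variable; chunking along time lets Dask
# stream the data rather than loading the full array into memory at once.
ds = xcdat.open_dataset("tas_Amon_model_historical.nc", chunks={"time": 120})

# Build the computation lazily: global spatial mean, then annual means.
ds_global = ds.spatial.average("tas")
ds_annual = ds_global.temporal.group_average("tas", freq="year")

# Trigger the parallel, chunk-by-chunk computation.
result = ds_annual["tas"].compute()
```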
Thank you @brian-rose! I appreciate your diligent review and all of your helpful comments. I'm glad to hear that you had an overall great experience reviewing xCDAT :)
> This was a really valuable exercise for me, because I've learned that xCDAT is software that will really help me and my group do a lot of routine data processing more consistently and expressively. Great package and great documentation!
If you ever need anything related to xCDAT, let me or Steve (@pochedls) know. We'd love to help you and your group utilize xCDAT in your data processing work wherever possible.
> My only hang-up right now is xCDAT/xcdat#661, which is the reason I've left the "Example usage" box unchecked. I suggest that the team prioritize resolving this, because it's the very first line of xCDAT code in the very first tutorial in the docs that is failing.
I just addressed this issue and commented about it here. You should be able to re-run the notebook with this latest commit on main. Let me know if it works for you!
That should take care of the "Example usage" checkbox.
Confirming here that the updated notebook works as expected!
I checked off the "Example usage" box, and I enthusiastically recommend this paper for publication in JOSS.
Hi @arfon, is anything else needed for the review of this paper?
@brian-rose – thanks for getting your review in 🙏
@tomvothecoder – looks like we're very close to being done here. I will circle back here next week, but in the meantime, please give your own paper a final read to check for any potential typos etc.
After that, could you make a new release of this software that includes the changes that have resulted from this review? Then, please make an archive of the software in Zenodo/figshare/another service and update this thread with the DOI of the archive. For the Zenodo/figshare archive, please make sure that:
Hi @arfon, I completed the final checklist in your comment above. Thanks!
@editorialbot set 10.5281/zenodo.12522560 as archive
Done! archive is now 10.5281/zenodo.12522560
@editorialbot set v0.7.1 as version
Done! version is now v0.7.1
@editorialbot recommend-accept
Attempting dry run of processing paper acceptance...
:warning: Error preparing paper acceptance.
@editorialbot set main as branch
Done! branch is now main
Submitting author: @tomvothecoder (Tom Vo)
Repository: https://github.com/xCDAT/xcdat
Branch with paper.md (empty if default branch): main
Version: v0.7.1
Editor: @arfon
Reviewers: @brian-rose, @mgrover1
Archive: 10.5281/zenodo.12522560
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@brian-rose & @mgrover1, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all, you need to run this command in a separate comment to create the checklist:
@editorialbot generate my checklist
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @arfon know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Checklists
📝 Checklist for @mgrover1
📝 Checklist for @brian-rose