openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
701 stars 36 forks source link

[REVIEW]: HyRiver: Hydroclimate Data Retriever #3175

Closed whedon closed 2 years ago

whedon commented 3 years ago

Submitting author: @cheginit (Taher Chegini) Repository: https://github.com/cheginit/HyRiver Version: v0.11 Editor: @kthyng Reviewers: @raoulcollenteur, @arbennett Archive: 10.5281/zenodo.5602114

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/b0df2f6192f0a18b9e622a3edff52e77"><img src="https://joss.theoj.org/papers/b0df2f6192f0a18b9e622a3edff52e77/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/b0df2f6192f0a18b9e622a3edff52e77/status.svg)](https://joss.theoj.org/papers/b0df2f6192f0a18b9e622a3edff52e77)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@arbennett & @raoulcollenteur, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @kthyng know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Review checklist for @arbennett

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @raoulcollenteur

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 3 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @evanleeturner, @raoulcollenteur it looks like you're currently assigned to review this paper :tada:.

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

  1. You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.3334/ORNLDAAC/1840 is OK
- 10.3133/fs20193051 is OK
- 10.3133/fs20203033 is OK

MISSING DOIs

- None

INVALID DOIs

- None
whedon commented 3 years ago
Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=0.08 s (307.3 files/s, 22541.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
reStructuredText                12            232            212            564
YAML                             6             21             16            199
SVG                              1              0              0            162
Python                           2             35             55            122
Markdown                         1             10              0            113
make                             2             18              9             76
DOS Batch                        1              8              1             26
TeX                              1              2              0             26
-------------------------------------------------------------------------------
SUM:                            26            326            293           1288
-------------------------------------------------------------------------------

Statistical information for the repository 'f3d8ba1c0ba6cb9fae2ee0b7' was
gathered on 2021/04/15.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
cheginit                         8         18139          17927          100.00

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
cheginit                    212            1.2          0.0               22.64
whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

whedon commented 3 years ago

:wave: @evanleeturner, please update us on how your review is going (this is an automated reminder).

whedon commented 3 years ago

:wave: @raoulcollenteur, please update us on how your review is going (this is an automated reminder).

kthyng commented 3 years ago

just a friendly review reminder here, @evanleeturner and @raoulcollenteur

raoulcollenteur commented 3 years ago

Started on the Review today, interesting work! I will edit this comment over the next couple of days.

Summary HyRiver is a collection (software stack) of six python packages that aim to improve the workflow of data collection and processing for hydrological and climatological data from public web services. This is an important and often time-consuming step in hydrological studies, and I congratulate the Authors on their work. Although several alternatives are available (also listed in the paper), this stack seems to provide a unified access to many data sources in the US. The stack is subdivided over several (8?) GitHub repositories, one of which combines all the packages (https://github.com/cheginit/HyRiver). The package is currently actively maintained and looking at some of the user statistics there is a clear demand for this package. The software stack is a significant scholarly contribution in my opinion and suitable for publication in JOSS.

Package Organization My only real concern is the dispersion of the information and code. The packages, information, and support structures are subdivided over several GitHub repositories. I think it would be good to (at least) put all of these under a single GitHub organization and try to reduce the dispersion of information a bit. Documentation is already provided from a single ReadTheDocs website, which is a nice entry point for new users. It could help to do the same for the user support structures (e.g., Code Issues, User questions), which now seem spread over all repositories. Maybe the Authors prefer it the way it is, but I can imagine it is more work to maintain and keep track of user discussions.

Would it not be a better idea to combine all the packages into a single Python packages and have the current packages as sub-packages of hyriver? Given the limited amount of code contained in the individual packages I think this would not be a bad option and it would probably reduce the maintenance cost in the long term. Is there a specific reason different packages were created instead of a single one? A related question I have is if it is possible to use the packages independently. If so, this would be worth mentioning in the docs installation guide. If any case, it could be considered to put all packages into a single python package and use sub-packages to structure.

Here's a list of minor suggestions to improve the package, JOSS paper and documentation (may I add more later):

I am still trying to run the examples from https://github.com/cheginit/HyRiver-examples but will need some more time for that. I get lots of errors when trying to import packages, will post some here later.

Great work!

Cheers, Raoul

cheginit commented 3 years ago

@raoulcollenteur Thanks for accepting to review this submission and your comments.

My only real concern is the dispersion of the information and code. The packages, information, and support structures are subdivided over several GitHub repositories. I think it would be good to (at least) put all of these under a single GitHub organization and try to reduce the dispersion of information a bit. Documentation is already provided from a single ReadTheDocs website, which is a nice entry point for new users. It could help to do the same for the user support structures (e.g., Code Issues, User questions), which now seem spread over all repositories. Maybe the Authors prefer it the way it is, but I can imagine it is more work to maintain and keep track of user discussions.

Would it not be a better idea to combine all the packages into a single Python packages and have the current packages as sub-packages of hyriver? Given the limited amount of code contained in the individual packages I think this would not be a bad option and it would probably reduce the maintenance cost in the long term. Is there a specific reason different packages were created instead of a single one?

The software stack was initially a single package called hydrodata. As the package started to grow and I added support for more service and new functionalities, the maintenance became more difficult. So I decided to break it down into six self-contained packages, each of which has a limited scope and serves different target users. For example, some users might only want to have access to river network data, so instead of installing a hefty package with many dependencies and irrelevant functions, they can just install PyNHD that offers what they need. Also, from development point of view (for the core developers as well as contributors), working on new functionalities and tracking down issues become easier as the code base is smaller for each package and better organized. The only downside is more work for releasing six packages instead of one which is not big of an issue overall.

Moreover, the packages are divided into two categories: Low- and high-level APIs. PyGeoOGC and PyGeoUtils (and the recently added package called async_retriever) are the engines, if you will, that the other packages use to connect to specific web services. The other packages are specific to the public web services that are provided by US agencies.

A related question I have is if it is possible to use the packages independently. If so, this would be worth mentioning in the docs installation guide. If any case, it could be considered to put all packages into a single python package and use sub-packages to structure.

Yes, the packages are standalone. The README of each one of the repositories contains descriptions, instructions, and examples that are specific to that package. I will modify the paper as well as repositories' README to reflect this explicitly.

I think most of the packages are only applicable in certain geographic areas (US?), this should be made explicit in the Docs and JOSS paper.

Yes, currently only US is supported but contributions are welcome as the low-level packages are generic and can be used to access any of the supported web service protocols. I will make this more explicit in the paper and repositories.

Consider adding a page with the features of each package to the documentation website (hyriver.readthedocs.io)

I thought about this when I was designing the website. Since each package has a repository with a README that has all the relevant information, instead of repeating the same thing in the website, I decided to include a one-line description of each package and provide the link to each repository. Additionally, the example gallery includes most of the things that are in READMEs and even more, so in my opinion including them again in separate pages will be repetitive.

The Contributing page seems outdates and can be updated: hyriver.readthedocs.io/en/latest/contributing.html

I will take care of it.

I could not easily find a location where user can "seek support", maybe using a central GitHub Discussion or Slack is an idea?

The idea is that users can open issues for packages in their corresponding repositories or use the main repository (HyRiver). Users have been using both. I just added a discussions tab to the HyRiver's repository.

I am still trying to run the examples from cheginit/HyRiver-examples but will need some more time for that. I get lots of errors when trying to import packages, will post some here later.

* [ ]  import pydaymet: AttributeError: 'LGEOS360' object has no attribute 'GEOSBufferWithParams'

* [ ]  import pygeohydro: AttributeError: module 'pyproj' has no attribute 'crs'

* [ ]  import py3dep: AttributeError: module 'pyproj' has no attribute 'crs'

It would be great and appreciated if you can provide more details and, perhaps, open an issue so I can reproduce and fix them.

By the way, I am working on a new release as users reported some issues after xarray released a new version recently that included some breaking changes. I have already took care of the issues in the main branches of the affected repositories and will release a new version soon.

kthyng commented 3 years ago

Hi @evanleeturner and @raoulcollenteur! Will you be able to continue your reviews soon?

cheginit commented 3 years ago

@kthyng Over the last two months (since the submission date), I have added enough new features and bug fixes to these libraries that prompt new releases. Would it be alright if I change the version from 0.10 to 0.11 in the first comment?

kthyng commented 3 years ago

@whedon set v0.11 as version

whedon commented 3 years ago

OK. v0.11 is the version.

kthyng commented 3 years ago

@cheginit I have updated your version, and we'll also update it at the end of the review process.

kthyng commented 3 years ago

@raoulcollenteur will you be able to work on your review soon? Just pinged @evanleeturner by email too.

cheginit commented 3 years ago

@kthyng Thanks, appreciate it.

cheginit commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

kthyng commented 3 years ago

@raoulcollenteur will you be able to work on your review soon? @evanleeturner said by email can work on it soon.

kthyng commented 3 years ago

@raoulcollenteur and @evanleeturner — just a friendly ping here about your reviews.

cheginit commented 3 years ago

@kthyng @raoulcollenteur @evanleeturner Please let me know if there are any issues that I need to address for this submission. Also, since April I made one minor release (0.11) that includes breaking changes and a new patch release (0.11.x) is around the corner. So please make sure to check out the latest version.

kthyng commented 3 years ago

Thanks @cheginit. I just pinged by email to try to rise above the github notification noise.

raoulcollenteur commented 3 years ago

Sorry @kthyng , the email indeed helped!

Thanks @cheginit for all the changes to the code, documentation, and the JOSS paper. I have now checked most boxes in the Review form. Just one issue that remains I think, about running the examples after a clean installation.

* [ ]  import pydaymet: AttributeError: 'LGEOS360' object has no attribute 'GEOSBufferWithParams'

* [ ]  import pygeohydro: AttributeError: module 'pyproj' has no attribute 'crs'

* [ ]  import py3dep: AttributeError: module 'pyproj' has no attribute 'crs'

It would be great and appreciated if you can provide more details and, perhaps, open an issue so I can reproduce and fix them.

I checked, and these problems seem to occur because there are no minimum dependencies defined (e.g., numpy > 1.6) for installation. My environment had Pyproj 1.9.6, which causes errors. The same holds for geopandas (I had 0.6). Most examples won't work in this case, so it seems necessary to add minimum versions for all dependencies during installation throughout the project. I suggest to create a clean environment and run all the examples, and update the minimum versions where necessary. I now did this manually for my environment and could run most of the examples already.

Cheers, Raoul

cheginit commented 3 years ago

@raoulcollenteur Thanks for the feedback, it's a good idea. I added minimum versions as you suggested, hopefully it will resolve the issue.

Some of the example won't work since two of the web services that I am using are having technical issues that their developers are working on to fix. I have already open two issues (cheginit/pynhd#17 and cheginit/pygeohydro#51) to update the status of those web services.

kthyng commented 3 years ago

Thanks everyone. As an update I did also ping @evanleeturner and he is on vacation for a couple of weeks so that review will have to wait. Thanks for your patience.

cheginit commented 3 years ago

@raoulcollenteur I released new versions for all the packages. Also, the two problematic web services are now back up. So, all the example notebooks now should run without any issues. You can give them a try via binder.

kthyng commented 3 years ago

I just emailed @evanleeturner again to see about starting this review.

cheginit commented 3 years ago

@kthyng Thank you.

kthyng commented 3 years ago

Hi @cheginit — I have heard by email that @evanleeturner won't have time to do this review after all. I am sorry about the delay in this. I will now recruit a new reviewer.

Hi @arbennett! Would you be able to step in as a reviewer on this submission for JOSS? The review process is already underway, but one reviewer has had to step out, and I am looking for a replacement. I know you've reviewed for JOSS before, but as a reminder you can see more information here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html

cheginit commented 3 years ago

@kthyng That would be great, thank you.

arbennett commented 3 years ago

@kthyng I would be happy to review this! I see the review process has been going for a while so I'll try to complete my review ASAP. I have one other pending JOSS review at the moment so as soon as I work through that I'll start here.

kthyng commented 3 years ago

@arbennett Thank you so much for your willingness to help. I am sorry that I asked when you have another JOSS review going — I thought I had checked to be sure not to overload you. If you are still willing, I will add you as a reviewer and get you set up for when you are ready.

arbennett commented 3 years ago

Yep - I am still willing

kthyng commented 3 years ago

@whedon add @arbennett as reviewer

whedon commented 3 years ago

OK, @arbennett is now a reviewer

kthyng commented 3 years ago

@whedon remove @evanleeturner as reviewer

whedon commented 3 years ago

OK, @evanleeturner is no longer a reviewer

kthyng commented 3 years ago

@arbennett ok you should be good to go when you can do this.

raoulcollenteur commented 3 years ago

@kthyng is there anything I should still do for this review? @cheginit made all requested changes (nice work, I like the Binder option added!!), I checked everything in the list and it is all good to go now I think.

cheginit commented 3 years ago

@raoulcollenteur Thanks for taking the time to review my submission and your helpful comments.

kthyng commented 3 years ago

Thank you @raoulcollenteur for your review and letting me know you are finished. We don't need anything else from you!

cheginit commented 2 years ago

@arbennett Please let me know if you have any questions or difficulty testing the packages.

arbennett commented 2 years ago

@cheginit thank you! I just finished up my other review and will put it on my calendar to review this, this week. Looking forward to checking this out, it seems really relevant!

cheginit commented 2 years ago

@arbennett Thanks, appreciate it! Looking forward to seeing your comments.

kthyng commented 2 years ago

Hi @arbennett and thank you so much for agreeing to take over this review (especially when you already had one to do!). When will you be able to start your review?

kthyng commented 2 years ago

Update! @arbennett plans to finish their review by the end of the week. Thank you!

arbennett commented 2 years ago

Summary

HyRiver is the high level interface to a number of APIs for accessing, analyzing, and visualizing datasets related to hydrologic modeling. It provides a unified interface and is implemented in modern python in a way that is easily scriptable and extensible into larger workflows. HyRiver mainly addresses problems of either setting up large/complex hydrologic models or doing further analysis on primary datasets, and provides ways to access information about a number of important areas from watershed delineations, river networks, and DEMS to metorologic forcing data and land use/cover. I've started this review a few times now and each time get a little distracted while playing with the package itself. I think this is an awesome contribution and can easily see myself using it in my own research.

The paper itself is well written and explains what/where everything is. The only real comment I have is whether it would be worth having a top level installer that installs all of the packages? I see the argument against this too, so it's not really important. I think you've also done a good job explaining the ecosystem of other similar packages, and explained the rationale behind HyRiver. I've also got to commend you on the documentation - it's really nice. I love how many examples you've put together and even made them runnable in binder. I was pretty easily able to go from just <shift>-<Entering> my way through the binder to installing locally on my laptop with the same notebooks, and then even to modifying them for a one-off in a new region of interest.

I would say based on all of this, I am happy to recommend accepting this for publication in JOSS! Really nice work here!

cheginit commented 2 years ago

@arbennett Thanks for taking the time and reviewing our submission.

Regarding your comment, I actually started from having a single package called hydrodata that included all the functions then I decided to break it down to several standalone packages, each targeting a specific type of datasets. Moreover, if you only install pygeohydro and pydaymet all the other packages will be installed since they're dependencies of these two packages.

I am glad that you found HyRiver useful for your own research, hopefully it'll be a helpful tool for other researchers in the community as well.

cheginit commented 2 years ago

@kthyng Since @arbennett finished his review, is there anything else that I need to do?

kthyng commented 2 years ago

@cheginit Everything is looking really good! Thanks so much @arbennett for reviewing mid-way here and @raoulcollenteur for reviewing in the first place long ago. Let us proceed and get this published! Just a few more steps.