openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
721 stars 38 forks source link

[REVIEW]: mosartwmpy: A Python implementation of the MOSART-WM coupled hydrologic routing and water management model #3221

Closed whedon closed 3 years ago

whedon commented 3 years ago

Submitting author: @thurber (Travis Thurber) Repository: https://github.com/IMMM-SFA/mosartwmpy Version: v0.0.8 Editor: @kthyng Reviewer: @JannisHoch, @cheginit Archive: 10.5281/zenodo.4976463

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/3b18ac34dca39129a4fe454ad8ab02e1"><img src="https://joss.theoj.org/papers/3b18ac34dca39129a4fe454ad8ab02e1/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/3b18ac34dca39129a4fe454ad8ab02e1/status.svg)](https://joss.theoj.org/papers/3b18ac34dca39129a4fe454ad8ab02e1)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@JannisHoch & @cheginit, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @kthyng know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest

Review checklist for @JannisHoch

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @cheginit

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 3 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @JannisHoch, @cheginit it looks like you're currently assigned to review this paper :tada:.

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

  1. You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.5194/hess-17-3605-2013 is OK
- 10.1175/JHM-D-12-015.1 is OK
- 10.11578/E3SM/dc.20180418.36 is OK
- 10.1016/j.cageo.2012.04.002 is OK
- 10.21105/joss.02317 is OK
- 10.1029/2020WR027902 is OK
- 10.5194/hess-23-2261-2019 is OK
- 10.1073/pnas.2020431118 is OK
- 10.5194/hess-24-1275-2020 is OK
- 10.1002/2016WR019767 is OK
- 10.5194/hess-19-3239-2015 is OK

MISSING DOIs

- None

INVALID DOIs

- None
whedon commented 3 years ago
Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=0.69 s (82.0 files/s, 7358.3 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          36            577            785           2688
YAML                             7             42            160            280
Markdown                         6             64              0            273
TeX                              1             12              0            133
DOS Batch                        1              8              1             26
reStructuredText                 2              6             10             10
make                             1              4              7              9
TOML                             1              0              0              6
Bourne Shell                     2              3              7              3
-------------------------------------------------------------------------------
SUM:                            57            716            970           3428
-------------------------------------------------------------------------------

Statistical information for the repository '53f4002b6a95f04d072abcd9' was
gathered on 2021/04/28.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
crvernon                         6           172              5            1.11
travis                          51          9810           5927           98.89

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
crvernon                    105           61.0          2.5               15.24
travis                     3945           40.2          2.2               15.59
whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

whedon commented 3 years ago

:wave: @cheginit, please update us on how your review is going (this is an automated reminder).

whedon commented 3 years ago

:wave: @JannisHoch, please update us on how your review is going (this is an automated reminder).

crvernon commented 3 years ago

@kthyng @cheginit Are there any questions that we could answer to help get you started with your mosartwmpy? Please don't hesitate to reach out. Thanks!

kthyng commented 3 years ago

Just to clarify — I'm the editor for your submission @crvernon but am not reviewing your software.

crvernon commented 3 years ago

Apologies @kthyng !

cheginit commented 3 years ago

@crvernon Hi Chris, sorry for the delay. I will post my questions by the end of this week.

cheginit commented 3 years ago

Overall, I find this submission a useful contribution to the hydrology community and appreciate the effort of the developers. Here are my initial thoughts on the submission.

Paper

Overall, the paper reads well and states the purpose of the project clearly. Some of my concerns have already been addressed by the new commits. However, I am not quite sure if I understand the purpose of this sentence:

As a sample benchmark, a one-year simulation using MOSART-WM on a 24-CPU node of an institutional supercomputer can be performed in about half an hour; whereas the same simulation using mosartwmpy on a personal 6-CPU laptop can be performed in about five hours.

This is not a fair and informative comparison: Running a Fortran code on a cluster with 24 CPUs vs. running a Python code on a latop with 6 CPUs. There's no common ground between the two; Code languages are different, system specifications of the testing environments are different, and number of CPUs are different. I know setting up E3SM on a laptop is not an easy task, so I would suggest to at least run the Python code on the same cluster with 8 CPUs and run the Fortran version on the same cluster and with 8 CPUs. Overall, I am not sure if this comparison is really necessary since, as you mentioned in the paper, the main purpose of mosartwmpy is prototyping and educational.

Code

The authors opted to use numexpr as the engine, so to speak, to carry out the heavy lifting in their implementation of MOSART-WM. This choice makes the code difficult to read and hinders further development by the "the domain scientist". There are much better approaches that even scale much better (more than 8 CPUs) than numexpr. I would like to know why the developers decided to use numexpr over numba?

I understand that this might be a personal preference but I believe in the future versions, the authors might want to consider refactoring their code to use numba as the engine. It can improve the performance and scaling, as well as, readability. I developed a hydrological model (not public yet) that is based on numba and have tested it on a cluster with 28 cores and it scales nicely.

Regarding the documentation, the website provides examples but I think it can be improved by providing an example for working with restart files as well as runoff outputs from CLM or E3SM outputs since this is the primary use case.

Additionally, I opened a couple of issues/suggestions in the repository.

thurber commented 3 years ago

Thanks @cheginit, this is good feedback!

I just merged a PR that removes most of the Functionality and limitations section based on @JannisHoch's feedback, so the section with the awkward CPU comparison is gone.

Regarding the speed and the use of numexpr -- I agree it's semantically burdensome, but at least for my own laptop and cluster, the numexpr implementation was faster than a numba implementation for doing the same vectorized calculations. That said, I'm still not satisfied with the execution speed and am planning to do an overhaul in the medium-term future. I'd be very curious to learn from your experiences with numba. In your implementation, do you solve in a vectorized fashion or cell-by-cell? Despite the traditional advice to go vectorized in Python, I fear it may have led to the current speed challenges we are facing in mosartwmpy.

Good idea about providing examples with restart files and dynamic runoff inputs -- I'll open an issue for that and get to work on it.

cheginit commented 3 years ago

@thurber The recommended way of implementing numba is to avoid vectorization and unroll all the loops. Adding signatures to functions can also speed up the code. You need to keep type casting during the runtime at the minimum. Another point is about the decision between using f8 or f4 that also affects the performance and accuracy.

cheginit commented 3 years ago

@thurber @crvernon The submission is almost ready for publication. In my opinion, there are only two things left to be addressed:

thurber commented 3 years ago

thanks @cheginit, I believe I have addressed both these points in the latest PR, which adds a link to a Jupyter notebook with a tutorial that includes steps to use restart files and steps to drive with dynamic input: https://github.com/IMMM-SFA/mosartwmpy/blob/main/notebooks/tutorial.ipynb

cheginit commented 3 years ago

@kthyng All my concerns have been addressed and, in my opinion, the submission is ready for publication.

kthyng commented 3 years ago

@cheginit Thanks for your review! Rather than assuming, can I verify the meaning of crossing out the performance checkbox above? What does that mean in your review?

kthyng commented 3 years ago

@JannisHoch What is your expected timeline for working on your review again?

cheginit commented 3 years ago

@kthyng I meant that there was no claim to be checked. In the parenthesis, it says to check it off, I thought that's what it meant. If you prefer, I can just check the item.

JannisHoch commented 3 years ago

@JannisHoch What is your expected timeline for working on your review again?

@kthyng I hope to finish it tonight

kthyng commented 3 years ago

@kthyng I meant that there was no claim to be checked. In the parenthesis, it says to check it off, I thought that's what it meant. If you prefer, I can just check the item.

Ok that is what I figured, but yes please just check it off instead if you please. Thanks.

JannisHoch commented 3 years ago

@kthyng review is successfully wrapped up. Paper and software can be published now.

kthyng commented 3 years ago

@JannisHoch Can you check off the box about "Functionality" in your review checklist if it has been satisfied, or otherwise open an issue/post a comment about what you have a concern about?

JannisHoch commented 3 years ago

@JannisHoch Can you check off the box about "Functionality" in your review checklist if it has been satisfied, or otherwise open an issue/post a comment about what you have a concern about?

Done @kthyng

kthyng commented 3 years ago

Excellent! Thanks everyone! We'll move to the next steps now.

kthyng commented 3 years ago

@thurber what version would you like to associate with your JOSS submission?

kthyng commented 3 years ago

@thurber please make an archive of your software at a place like Zenodo, and make sure that the title and author list exactly match your JOSS paper.

kthyng commented 3 years ago

@whedon generate pdf

kthyng commented 3 years ago

@whedon check references

whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.5194/hess-17-3605-2013 is OK
- 10.1175/JHM-D-12-015.1 is OK
- 10.11578/E3SM/dc.20180418.36 is OK
- 10.5066/F76Q1VQV is OK
- 10.1016/j.cageo.2012.04.002 is OK
- 10.21105/joss.02317 is OK
- 10.1029/2020WR027902 is OK
- 10.5194/hess-23-2261-2019 is OK
- 10.1073/pnas.2020431118 is OK
- 10.5194/hess-24-1275-2020 is OK
- 10.5194/hess-19-3239-2015 is OK

MISSING DOIs

- None

INVALID DOIs

- https://doi.org/10.1029/2018MS001583 is INVALID because of 'https://doi.org/' prefix
whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

kthyng commented 3 years ago

@thurber Please check out that invalid DOI listing above. Other paper comments:

thurber commented 3 years ago

Thanks @kthyng! I've fixed the the invalid DOI, added "GitHub Inc." as the author for GitHub, fixed the spacing in Yoon et al, and escaped the capitalization for Earth and a few other words.

I've also updated the zenodo metadata such that new releases should receive the exact title and author list of the paper, and released v0.0.8 with all these changes. The DOI is http://doi.org/10.5281/zenodo.4976463.

Let me know any other issues that come up!

kthyng commented 3 years ago

@whedon set v0.0.8 as version

whedon commented 3 years ago

OK. v0.0.8 is the version.

kthyng commented 3 years ago

@whedon check references

whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.5194/hess-17-3605-2013 is OK
- 10.1175/JHM-D-12-015.1 is OK
- 10.11578/E3SM/dc.20180418.36 is OK
- 10.5066/F76Q1VQV is OK
- 10.1016/j.cageo.2012.04.002 is OK
- 10.21105/joss.02317 is OK
- 10.1029/2020WR027902 is OK
- 10.5194/hess-23-2261-2019 is OK
- 10.1073/pnas.2020431118 is OK
- 10.5194/hess-24-1275-2020 is OK
- 10.5194/hess-19-3239-2015 is OK
- 10.1029/2018MS001583 is OK

MISSING DOIs

- 10.5194/gmd-10-3913-2017 may be a valid DOI for title: GLOFRIM v1.0–A globally applicable computational framework for integrated hydrological–hydrodynamic modelling

INVALID DOIs

- None
kthyng commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

kthyng commented 3 years ago

@thurber Your changes look good! Does look like that is a valid missing DOI above — please add that to your Hoch reference and we should be good to go!

thurber commented 3 years ago

@kthyng done!

kthyng commented 3 years ago

@whedon check references

whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.5194/hess-17-3605-2013 is OK
- 10.1175/JHM-D-12-015.1 is OK
- 10.11578/E3SM/dc.20180418.36 is OK
- 10.5194/gmd-10-3913-2017 is OK
- 10.5066/F76Q1VQV is OK
- 10.1016/j.cageo.2012.04.002 is OK
- 10.21105/joss.02317 is OK
- 10.1029/2020WR027902 is OK
- 10.5194/hess-23-2261-2019 is OK
- 10.1073/pnas.2020431118 is OK
- 10.5194/hess-24-1275-2020 is OK
- 10.5194/hess-19-3239-2015 is OK
- 10.1029/2018MS001583 is OK

MISSING DOIs

- None

INVALID DOIs

- None
kthyng commented 3 years ago

@whedon recommend-accept

whedon commented 3 years ago

No archive DOI set. Exiting...

kthyng commented 3 years ago

@whedon set 10.5281/zenodo.4976463 as archive

whedon commented 3 years ago

OK. 10.5281/zenodo.4976463 is the archive.

kthyng commented 3 years ago

@whedon recommend-accept

whedon commented 3 years ago
Attempting dry run of processing paper acceptance...
whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.5194/hess-17-3605-2013 is OK
- 10.1175/JHM-D-12-015.1 is OK
- 10.11578/E3SM/dc.20180418.36 is OK
- 10.5194/gmd-10-3913-2017 is OK
- 10.5066/F76Q1VQV is OK
- 10.1016/j.cageo.2012.04.002 is OK
- 10.21105/joss.02317 is OK
- 10.1029/2020WR027902 is OK
- 10.5194/hess-23-2261-2019 is OK
- 10.1073/pnas.2020431118 is OK
- 10.5194/hess-24-1275-2020 is OK
- 10.5194/hess-19-3239-2015 is OK
- 10.1029/2018MS001583 is OK

MISSING DOIs

- None

INVALID DOIs

- None