openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
700 stars 36 forks source link

[REVIEW]: MetSim: A Python package for estimation anddisaggregation of meteorological data #2042

Closed whedon closed 4 years ago

whedon commented 4 years ago

Submitting author: @arbennett (Andrew Bennett) Repository: https://github.com/UW-Hydro/MetSim Version: v2.2.1 Editor: @sjpfenninger Reviewer: @Chilipp, @dsryberg Archive: 10.5281/zenodo.3728015

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/c232fbc73fc2ad0c3a2e8a4c9e318236"><img src="https://joss.theoj.org/papers/c232fbc73fc2ad0c3a2e8a4c9e318236/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/c232fbc73fc2ad0c3a2e8a4c9e318236/status.svg)](https://joss.theoj.org/papers/c232fbc73fc2ad0c3a2e8a4c9e318236)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@Chilipp & @dsryberg, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @sjpfenninger know.

Please try and complete your review in the next two weeks

Review checklist for @Chilipp

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @dsryberg

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 4 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @Chilipp, @dsryberg it looks like you're currently assigned to review this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

  1. You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
whedon commented 4 years ago
Reference check summary:

OK DOIs

- 10.1145/2833157.2833162 is OK
- 10.5334/jors.148 is OK
- 10.1016/0266-9838(86)90020-1 is OK
- 10.1016/S0168-1923(98)00126-9 is OK
- 10.1016/j.agrformet.2013.03.003 is OK
- 10.1175/JHM-D-18-0203.1 is OK
- 10.1029/94JD00483 is OK

MISSING DOIs

- https://doi.org/10.25080/majora-7b98e3ed-013 may be missing for title:  Dask: Parallel Computation with Blocked algorithms and Task Scheduling 

INVALID DOIs

- None
whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

Chilipp commented 4 years ago

Thanks @sjpfenninger! I am busy this week with other appointments, but I will do the review on monday in a week.

Chilipp commented 4 years ago

Hey everyone! My apologies for the delay! I am done with the review and I highly appreciate your work here! MetSim seems to be a very valuable tool for downscaling meterology and I hope that you will provide a methodologic paper (i.e. where you describe the scientific details of your method) at some point as well!

The code-base and the documentation of this package is in a good condition and I could install and run the software on my local laptop (Linux). I did not try it on OS X and Ubuntu, and the authors do not state anything about OS-specific requirements (which is something that could be added to the installation instructions). I came across https://github.com/UW-Hydro/MetSim/issues/200 and https://github.com/UW-Hydro/MetSim/pull/201 during the review, and besides that, I have the following comments with reference to the checklist above:


In the paper you state:

building upon previous tools by improving performance

In what sense does MetSim improve the performance compared to the previous tools?


The documentation of the metsim.methods.mtclim module is missing (see https://github.com/UW-Hydro/MetSim/issues/202). The same is true for the metsim.physics.solar_geom function.


There are tests but it is not described in the documentation, how to run them. This should be mentioned somewhere (e.g. the README file, in the installation instructions or in the Contribution guidelines). I also strongly recommend to measure the coverage of the tests (e.g. using pytest-cov and some free service, such as codecov.io).



We have based MetSim on methods from the Mountain Microclimate Simulator (MTCLIM) and the forcing preprocessor that was built into the Variable Infiltration Capacity (VIC) hydrological model version 4. MetSim provides a modern workflow, building upon previous tools by improving performance, adding new IO routines, allowing for exact restarts, and providing an extensible architecture which can incorporate new features. We have designed MetSim to fit into the broader scientific Python ecosystem, building on popular packages such as xarray, dask, pandas, and numba.

Is this enough for a review here? This paragraph does not describe the state of the field, but at least it highlights what is special about MetSim compared to other tools (modern workflow, improved performance, new IO routines, etc., although the statement about improving perfomance has to be revised, see Performance above).


arbennett commented 4 years ago

Hi @Chilipp - thank you for your review! I have addressed your comments above on the master branch. Thanks for catching that timestamp bug!

Regarding tests: we do show how to run the tests when installing from source. I'm not 100% sure if that's enough, but because our tests rely on sample datasets, it seems like the easiest way to recommend running them. Let me know if you have other ideas.

Regarding performance/state of the field: I added a bit to the performance claim, but the main takeaway is that we have implemented scalable parallelism, which is why we included a section on it. For the section on the state of the field, we hope it is clear that MetSim really does not aim to implement any new methods, but simply improve the infrastructure of some of the common methods that already existed in ad-hoc frameworks.

Chilipp commented 4 years ago

Hey @arbennett! Thanks for your quick actions!

Regarding tests: we do show how to run the tests when installing from source

Can you point me to the URL? I could not find it, neither in the installation instructions in the docs nor in the README

I added a bit to the performance claim, but the main takeaway is that we have implemented scalable parallelism

Thanks for the clarification!

Otherwise, only the part about State of the field is unclear for me (see my comment above @sjpfenninger)

arbennett commented 4 years ago

@Chilipp I updated the readme so that the line containing pytest --verbose is a little bit more clearly highlighted.

Chilipp commented 4 years ago

Ha, thanks @arbennett. Sorry, I must have overlooked that line :sweat_smile:

sjpfenninger commented 4 years ago

@Chillipp thanks for the review and @arbennett thanks for addressing them so rapidly! @arbennett can you make sure to address the point raised about covering State of the field -- i.e., referring to other published packages that MetSim is comparable, complements, or goes beyond? From your response above it sounds like this kind of code is usually set up on an ad-hoc basis and there are no published packages - if so, I would also point that out in the paper.

sjpfenninger commented 4 years ago

@dsryberg looking forward to your review as well!

arbennett commented 4 years ago

Thanks @sjpfenninger - I have updated the text to include a paragraph discussing the state of the field!

Chilipp commented 4 years ago

Thanks @arbennett! For me, this paper is ready for publication @sjpfenninger

sevberg commented 4 years ago

Hi all! Sorry for the delay/confusion on my part. I have now also completed the review, but I am unable to adjust the checklist since I cannot complete this step:

Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

I am told that the invitation has expired. Is this easy to fix?

arfon commented 4 years ago

I am told that the invitation has expired. Is this easy to fix?

I've just re-invited you. The link at https://github.com/openjournals/joss-reviews/invitations should work again now.

sjpfenninger commented 4 years ago

Thanks @arfon for the quick reaction and no worries @dsryberg, looking forward to your review!

sevberg commented 4 years ago

I am told that the invitation has expired. Is this easy to fix?

I've just re-invited you. The link at https://github.com/openjournals/joss-reviews/invitations should work again now.

Unfortunately I am still told "Sorry, we couldn't find that repository invitation. It is possible that the invitation was revoked or that you are not logged into the invited account." Although I have also made sure I am logging into the account by trying this on three different devices

arfon commented 4 years ago

Can you try again right now @dsryberg?

Screen Shot 2020-03-06 at 8 07 18 AM
sevberg commented 4 years ago

Everything seems to be working now. I will input my comments now...

sevberg commented 4 years ago

Overall: I agree that MetSim is a well structured, well documented, and sufficiently tested package, and I would definitely support it being published via JOSS. I nevertheless had a few hiccups during the review/testing process, and I also have trouble understanding who are the intended users of MetSim and what issue it is meant to resolve. I also suggest reworking some of the example scripts, as aspects of these failed for me. Finally, I think more detail should be given regarding MetSim's "Disaggregation of daily simulation values to sub-daily timesteps". I'll go into more detail below:

Functionality

  • [x] Installation: Does installation proceed as outlined in the documentation?

  • Installation was no issue for me on both a Linux and Windows PC. Although, as a minor issue, you may want to complete the same list of packages seen in the README as in the environment.yml file.

  • [x] Functionality: Have the functional claims of the software been confirmed?

    1. I successfully ran the built-in tests, and all 9 passed. This is great, but, executing these tests took 11 minutes, which is in my opinion long considering that they should be run often during future development processes. Especially the tests which run the examples "example_nc.conf" and "constant_vars_nc.conf" took the most time, so perhaps these could be simplified? In any case, 6 of the 9 tests are simply running the examples and they only ensure that an output file has been created. This will catch any raised errors, but it will miss the more subtle "sneaky" errors. Therefore I would suggest improving these by also checking some of the values which are produced to ensure that the code is working as intended.
  • Additionally, when I ran the tests I also evaluated the code coverage, which came out to a total of 75%, which is quite good! Nevertheless, the code within the file "physics.py" had the lowest specific score of 34%. Since the functions in this file are integral to the operation of MetSim, perhaps additional tests could be added to increase the coverage here.

  1. I then tried to follow the tutorials found at MetSim-tutorial
    1. Example 1 worked as expected
    2. Example 2 appeared to work but...
      • Interactive plots were not functional (likely an issue unrelated to MetSim)
      • Tried changing timestep to 15 minutes
        • MetSim appeared to work as intended
        • There seems to be a bug in the way timestamps are saved. This leads to an error when plotting in the example. It is unclear how to resolve this issue
    3. Example 3 failed due to an issue with pandas. Maybe a result of conda installing the newest pandas version (1.0)
  • [x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)
  • From my perspective, MetSim does not seem to consume an undue amount of resources and parallelizes well. This is also discussed sufficiently in the paper.

Documentation

  • [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?

  • (Good) Statements are given in the README which relate to what MetSim is capable of

  • (Missing) The existing problem which the software solves is not specifically stated in the README

  • (Missing) The intended audience is also not specified either

  • (Concern) I am not a part of the climate modelling community (so I may be missing context which is otherwise obvious), however I do not understand the need to a climate data disagregator which operates on daily data when sub-daily data is already readily available from many sources (see MERRA2, ERA5, ect...). It is not clear to me who is meant to benefit from this software.

  • [x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).

  • The examples are included and furthermore seem quite detailed but they are in my opinion cryptic. Instead of beginning the examples with "# This is an example of an input file for MetSim", I would suggest an introductory description of what the example is meant to accomplish.

  • Also, please see the above comment regarding providing more examples/tests which cover the functions in "physics.py"

  • [x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?

  • The documentation is very well done. Although I've noticed differences between the documentation available on readthedocs versus what I can build from the cloned repository. For example, when I build the docs from the repository (using simply "make html") I do not see any items under the "API Reference" section. Although I'm not sure if this issue was caused by the doc files in MetSim, or else by Sphinx.

  • [x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?

  • Yes, but see the notes above concerning Functionality

Software paper

  • [ ] State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • I agree with the comment from @Chilipp that the state of the field is not well addressed. I also agree with @sjpfenninger, that, if there are not other modules to compare, against then this can be pointed out instead.

Other thoughts

  1. It should be stated why using MetSim to disaggregate daily weather variables is preferable to simply downloading and using RCM or GCM products which are already hourly (for example, MERRA, ERA5, SARAH, ect...)
  2. When listing the disaggregation capabilities of MetSim in the README, a brief note should be given regarding the quality or source of the algorithms used. For example, there is a sophisticated algorithm behind the solar irradiance disaggregation, but the wind speed disaggregation is just a forward fill for each time step in the day (at least, as far as I can tell by snooping through the code). An unaware user might not understand this disparity.
  3. I also have not found any information regarding spatial resolution. Are the models implemented in MetSim intended to be applicable on all spatial scales?
arbennett commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

arbennett commented 4 years ago

@dsryberg thanks for the thorough review! I have updated the text of the paper as well as the documentation here to reflect the algorithms used to disaggregate. The more recent version of the article proof contains additional text about the state of the field, particularly with respect to MTCLIM and VIC.

Regarding your point number 1, I want to be clear that we aren't trying to say that MetSim is preferable to something like ERA5, but is complementary. MetSim can be used to generate sub-daily timeseries when something like ERA5 cannot be used (pre-1950, smaller spatial scales, etc). This actually brings up your point 3 as well. MetSim can be used at any spatial scale or mesh arrangement. I have also added a sentence to the paper to address this.

I have also updated the tutorial setup so that the Pandas error no longer occurs.

arbennett commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

sjpfenninger commented 4 years ago

@arbennet: Thanks for your quick work in addressing reviewer comments. Based on the reviews, I think your paper still needs a tiny bit of work.

Basically, the first paragraph in your paper should explain to a general scientifically literate audience whom your package is for and what problem(s) it solves. Right now, you delve into jargon almost immediately.

Note the requirements stated on https://joss.readthedocs.io/en/latest/submitting.html:

A summary describing the high-level functionality and purpose of the software for a diverse, non-specialist audience. A clear Statement of Need that illustrates the research purpose of the software.

sjpfenninger commented 4 years ago

@dsryberg thanks for the review! If you can briefly check whether you are happy with the changes made in response, that would be great!

arbennett commented 4 years ago

@sjpfenninger - I have updated the introduction to be more friendly to a general audience.

sevberg commented 4 years ago

@arbennett - The updated descriptions of the disaggregation methods looks a lot better now. Also the paper introduction is more readable as well. I am definitely satisfied with these two updates. However, have you taken into consideration the other topics I previously listed?

Namely:

  1. Test run time
  2. physics.py testing coverage
  3. Making the examples more digestible
  4. The potential bug with timesteps shorter than 1 hour
arbennett commented 4 years ago

@sjpfenninger - I just released a new version, 2.2.1, to fix the timestamp bug as well as another bug found by a user when simulating polar regions. Is is possible to update the version for this paper?

@dsryberg - Thanks for the followup. I have released a new version to deal with the bug you describe in 4, along with an additional bug.

On the topic of tests, we don't have much bandwidth to improve the tests at this time. Specifically, for the time it takes to run the tests, I agree it is longer than necessary, but only a minor detriment in our view. Improving the physics.py test coverage would be great as well, but we test this "by proxy" by ensuring that the output is close to a benchmark from the VIC4.2 preprocessor. While our testing isn't perfect, it is still orders of magnitude better than the previous test-free situation of the ad-hoc arrangements described in the paper. We have an open issue to this end (found here), and I have commented there to improve the coverage.

For the examples, this was the reason we came up with the tutorial in a separate repo with a Binder setup for interactive use, as well as a filmed walkthrough (which I finally just updated the tutorial repo to link to). It provides background as well as some further exposition about how/why things are set up.

sevberg commented 4 years ago

@arbennett Thanks for the detailed response. In my opinion, the bug I came across was the most crucial issue to fix, so thanks for dealing with that. I've also just checked out the video walkthrough, and it is definitely very informative and well made. Maybe eventually this information should be put into a text form (so that one doesn't need to search through a 1.2 hr video to find the information they need) but, for the time being, I think it is sufficient. For the other issues, I think opening an issue for future improvements to the testing and examples is a good solution for now. After all, continued development is within the spirit of opensource software :)

With that being said, @sjpfenninger, I am satisfied with the state of this software and I support it being published within JOSS. Please let me know if there is anything more you need from me?

arfon commented 4 years ago

Dear authors and reviewers

We wanted to notify you that in light of the current COVID-19 pandemic, JOSS has decided to suspend submission of new manuscripts and to handle existing manuscripts (such as this one) on a "best efforts basis". We understand that you may need to attend to more pressing issues than completing a review or updating a repository in response to a review. If this is the case, a quick note indicating that you need to put a "pause" on your involvement with a review would be appreciated but is not required.

Thanks in advance for your understanding.

Arfon Smith, Editor in Chief, on behalf of the JOSS editorial team.

arbennett commented 4 years ago

@dsryberg and @Chilipp thanks to both of you for the really constructive reviews! I definitely feel like this whole process has made MetSim more accessible and uncovered a couple of bugs that are now fixed.

@sjpfenninger - let me know if there's anything else to do whenever you get around to looking at this again.

@arfon (and really the rest of you) - I just wanted to say how nice this review format is, I really appreciate the workflows and how clearly laid out everything is. This is my first submission to JOSS and the experience has been really pleasant!

arfon commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

arfon commented 4 years ago

@arbennett - I found an issue with your figure labels (Pandoc numbers these automatically). https://github.com/UW-Hydro/MetSim/pull/214 fixes this.

I'm not sure if @sjpfenninger is online currently but I'd like to give him a little time to wrap make his final checks on this before handing it off to the EiCs for publishing.

sjpfenninger commented 4 years ago

@arbennett Apologies for the delay and thanks for your work addressing the reviewer comments. It looks like this is ready for acceptance. Just one quick question - you mentioned updating the version, but I'm not sure what you mean, can you clarify?

sjpfenninger commented 4 years ago

@Chilipp @dsryberg Thanks both for your work reviewing this submission!

arbennett commented 4 years ago

@sjpfenninger Up on the header of this review it has: Version: v2.2.0

I'm wondering if it's possible/necessary to change this to v2.2.1 since we've fixed some bugs since submission. It seems it's not really referenced in the paper or my submission on the JOSS website so maybe this isn't needed.

sjpfenninger commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

sjpfenninger commented 4 years ago

@whedon set version as v2.2.1

whedon commented 4 years ago

I'm sorry human, I don't understand that. You can see what commands I support by typing:

@whedon commands
sjpfenninger commented 4 years ago

@whedon set v2.2.1 as version

whedon commented 4 years ago

OK. v2.2.1 is the version.

sjpfenninger commented 4 years ago

@whedon check references

whedon commented 4 years ago
Reference check summary:

OK DOIs

- 10.1145/2833157.2833162 is OK
- 10.5334/jors.148 is OK
- 10.1016/0266-9838(86)90020-1 is OK
- 10.1016/S0168-1923(98)00126-9 is OK
- 10.1016/j.agrformet.2013.03.003 is OK
- 10.1175/JHM-D-18-0203.1 is OK
- 10.1029/94JD00483 is OK

MISSING DOIs

- https://doi.org/10.25080/majora-7b98e3ed-013 may be missing for title:  Dask: Parallel Computation with Blocked algorithms and Task Scheduling 
- https://doi.org/10.25080/majora-92bf1922-00a may be missing for title:  Data Structures for Statistical Computing in Python 

INVALID DOIs

- None
sjpfenninger commented 4 years ago

@arbennett Almost there, thanks for your patience. I have updated the version to v2.2.1. I've also noticed a couple small issues with the bibliography:

Then, please make an archive of the software in Zenodo/figshare/other service and update this thread with the DOI of the archive. For the Zenodo/figshare archive, please make sure that:

Once that's done, we can move forward with accepting this!

arbennett commented 4 years ago

@sjpfenninger - thanks again for your efforts in editing this paper. I've updated the DOIs for the dask and pandas citations. I've also uploaded to Zenodo. The DOI is 10.5281/zenodo.3728015

https://doi.org/10.5281/zenodo.3728015

sjpfenninger commented 4 years ago

@whedon set 10.5281/zenodo.3728015 as archive