openjournals / joss-reviews

Reviews for the Journal of Open Source Software

[PRE REVIEW]: pyveg: A Python package for analysing the time evolution of patterned vegetation using Google Earth Engine #2364

Closed (whedon closed this issue 4 years ago)

whedon commented 4 years ago

Submitting author: @samvanstroud (Samuel Van Stroud)
Repository: https://github.com/alan-turing-institute/monitoring-ecosystem-resilience
Version: v1.0.0
Editor: @usethedata
Reviewers: @arbennett
Managing EiC: Arfon Smith

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Author instructions

Thanks for submitting your paper to JOSS, @samvanstroud. Currently, there isn't a JOSS editor assigned to your paper.

@samvanstroud if you have any suggestions for potential reviewers then please mention them here in this thread (without tagging them with an @). In addition, the people on this list have already agreed to review for JOSS and may be suitable for this submission (please start at the bottom of the list).

Editor instructions

The JOSS submission bot @whedon is here to help you find and assign reviewers and start the main review. To find out what @whedon can do for you type:

@whedon commands
whedon commented 4 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks.

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper PDF after making changes to the paper's .md or .bib files, type:

@whedon generate pdf
whedon commented 4 years ago
Reference check summary:

OK DOIs

- 10.1098/rsos.160443 is OK
- 10.1111/gcb.14059 is OK
- 10.1073/pnas.0802430105 is OK
- 10.1103/PhysRevE.71.056103 is OK

MISSING DOIs

- None

INVALID DOIs

- None
whedon commented 4 years ago
Software report (experimental):

github.com/AlDanial/cloc v 1.84  T=26.45 s (3.1 files/s, 54004.7 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
JSON                              8              1              0        1411460
Python                           45           1908           2667           4982
Jupyter Notebook                 10              0           5114            960
Markdown                          7            148              0            448
R                                 5             55             90            248
JavaScript                        2             38             46            108
TeX                               1              7              0             68
YAML                              1              3              0             32
make                              1              0              0              2
Bourne Again Shell                1              1              0              2
--------------------------------------------------------------------------------
SUM:                             81           2161           7917        1418310
--------------------------------------------------------------------------------

Statistical information for the repository '2364' was gathered on 2020/06/18.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
Camila Rangel Smith              3            49             51            0.19
Nick Barlow                     61          5208           3697           17.07
Sam Van Stroud                 307          6523           4283           20.71
crangelsmith                   135          4603           1733           12.14
jabrams23                        1             3              2            0.01
nbarlowATI                     147         13607           9870           45.00
samvanstroud                    50          1616            930            4.88

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
Sam Van Stroud             3019           46.3          2.0               17.92
crangelsmith                874           19.0          2.2               10.18
jabrams23                     1           33.3          1.9                0.00
nbarlowATI                 5319           39.1          1.8               11.69
samvanstroud                535           33.1          3.7               14.77
whedon commented 4 years ago

Failed to discover a valid open source license.

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

danielskatz commented 4 years ago

👋 @samvanstroud - you will need to add an OSI-approved license before we can proceed with your submission. Sorry for the delay in saying this...

danielskatz commented 4 years ago

Additionally, thanks for your submission to JOSS. As described in our blog post announcing the reopening of JOSS, we're currently working in a "reduced service mode", limiting the number of papers assigned to any individual editor.

Since reopening JOSS, we've had a very large number of papers submitted, and as such yours has been placed in a backlog that we have been, and will continue, working through over the coming weeks and months.

Thanks in advance for your patience!

danielskatz commented 4 years ago

Please let us know when there is a license, then we can remove the paused label.

samvanstroud commented 4 years ago

Hi @danielskatz, thanks for writing. A license exists, though it is in the pyveg/ directory, as the submission is really only for the code in that directory (https://github.com/alan-turing-institute/monitoring-ecosystem-resilience/blob/master/pyveg/LICENSE.md). (We are developing R code in parallel in the rveg/ directory; however, it is not yet in a state of readiness to be submitted to JOSS.)

Perhaps this is why the bot fails to detect a license? Please advise if I should move the license to the top level of the repo. Another option if needed is to make a new repo, and strip out everything that is not strictly necessary for the core pyveg package we are submitting for review. Just let me know if this option would make more sense. Thank you!

danielskatz commented 4 years ago

Perhaps this is why the bot fails to detect a license? Please advise if I should move the license to the top level of the repo.

Yes. Please do move it unless there's some reason not to.

Another option if needed is to make a new repo, and strip out everything that is not strictly necessary for the core pyveg package we are submitting for review. Just let me know if this option would make more sense.

As long as the material to be reviewed is clear, a repo with additional content is ok, though note that at the end of the process, we will archive the repo along with publishing the paper, so I think this is up to you.

samvanstroud commented 4 years ago

Thanks for the advice @danielskatz, the license has been moved. For now I think we'll otherwise leave the repo as is, and will also tag package authors @crangelsmith @nbarlowATI in case they want to follow this thread or add any info.

danielskatz commented 4 years ago

👋 @usethedata - would you be willing to edit this submission?

usethedata commented 4 years ago

@whedon assign @usethedata as editor

whedon commented 4 years ago

OK, the editor is @usethedata

usethedata commented 4 years ago

@samvanstroud A couple of thoughts as I walk through this in preparation for finding reviewers. I think these are things to address prior to review, but this is a discussion about subjects where there are multiple points of view, so please do provide your perspective where you see things differently.

  1. There are multiple sources of NDVI in Google Earth Engine. I believe the paper itself should be clearer about which NDVI is being used. It appears you've preconfigured for Sentinel-2 and Landsat 4/5/7/8 (see the sketch after this list). My opinion (colored by the fact that I work on the NASA Earth Observing System) is that the primary sources for that NDVI data should also be cited, in addition to citing GEE and the services it provides. This is an interesting challenge in and around GEE: they make a copy of the data (with some lag relative to updates at the primary data source) and provide some very useful services on top of it, but the primary source(s) for that data are rarely credited.

  2. Likewise, I suggest citing the primary source for the ERA5 weather collection.

  3. I'm curious why you chose to store the coordinate locations in a Python file rather than in a separate data file (e.g. JSON), and to refer to the locations by a number rather than a location name. This seems likely to cause confusion and inadvertent error.

  4. The significance/rationale for the specific 33 locations preconfigured in the package is unclear to me, based on reading the paper and a 20-30 minute examination of the source code. A minor nit: location 13, listed as being in the USA, is actually in Mexico. I suggest that a sentence or two about the preconfigured locations, and any factors involved in extending the list, would be appropriate for the paper and/or the documentation.
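To make the point about data provenance concrete, here is a minimal sketch of how NDVI is typically derived from Sentinel-2 inside GEE with the earthengine-api. The collection ID, band pair, and coordinates below are standard public GEE values chosen for illustration; whether pyveg uses exactly this configuration is an assumption on my part:

```python
# Illustrative only: a minimal NDVI computation with the earthengine-api.
# The collection, bands, and coordinates are assumptions for illustration,
# not necessarily pyveg's configuration.
import ee

ee.Initialize()

point = ee.Geometry.Point([27.94, 11.58])  # invented example site (lon, lat)

s2 = (
    ee.ImageCollection("COPERNICUS/S2_SR")
    .filterBounds(point)
    .filterDate("2019-01-01", "2019-12-31")
    .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", 20))
)

# NDVI = (NIR - red) / (NIR + red); for Sentinel-2 that is (B8 - B4) / (B8 + B4).
ndvi = s2.map(lambda img: img.normalizedDifference(["B8", "B4"]).rename("NDVI"))

# Mean NDVI over the year at the point, sampled at Sentinel-2's 10 m resolution.
mean_ndvi = ndvi.mean().reduceRegion(
    reducer=ee.Reducer.mean(), geometry=point, scale=10
)
print(mean_ndvi.getInfo())
```

The provenance issue is that "COPERNICUS/S2_SR" is GEE's mirror of ESA's Sentinel-2 surface-reflectance product; the snippet never touches ESA's own archive, which is why the primary source so easily goes uncredited.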

usethedata commented 4 years ago

Disclaimer: I'm the principal investigator for the project behind the additional information I'm commenting on. I'm posting it here publicly to make it very clear that this is simply information provided colleague to colleague. It is something that I think is relevant to the science domain in which the authors are working and their potential future work. It is not directly relevant to this particular paper or software.

One of the very valuable things that GEE brings to the ecologist's toolbox is very simple subsetting. For NDVI from the MODIS/VIIRS platforms, my group provides REST web services that deliver subsets in easy-to-use formats. See https://modis.ornl.gov/data/modis_webservice.html for additional information. Those services also provide weather information (for North America only, unfortunately) from the Daymet data product.
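For anyone curious, a subset request might look roughly like the sketch below. The base URL, endpoint path, product, band name, and date format here are my from-memory assumptions for illustration; please rely on the documentation linked above rather than on this snippet:

```python
# Illustrative only: the endpoint path and parameter names below are
# assumptions, not the documented API; check the service documentation
# linked above before using.
import requests

BASE = "https://modis.ornl.gov/rst/api/v1"  # assumed base URL

resp = requests.get(
    f"{BASE}/MOD13Q1/subset",          # hypothetical product/endpoint
    params={
        "latitude": 11.58,             # invented example coordinates
        "longitude": 27.94,
        "band": "250m_16_days_NDVI",   # assumed band name
        "startDate": "A2019001",       # assumed AYYYYDDD date format
        "endDate": "A2019365",
        "kmAboveBelow": 1,             # subset extent around the point
        "kmLeftRight": 1,
    },
    headers={"Accept": "application/json"},
)
resp.raise_for_status()
print(resp.json())
```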

samvanstroud commented 4 years ago

Hello @usethedata. Firstly thank you for editing this submission and for taking a first look. Here are some replies to your comments:

1 & 2. Thanks for pointing this out; we agree entirely that the primary data sources should be acknowledged. I've just updated the paper with acknowledgements for Landsat, Sentinel-2, and ERA5. Let me know if something is still missing.

  3. This is a good point, and your intuition is correct: there is not really a good reason why we store this information in a Python file. However, I suspect we will keep some sort of numerical ID, as this is easier than defining a naming convention (we have multiple locations per country, for example, so country alone is not unique enough; we could use the lat/long coordinates themselves, but those are no more intelligible than a numerical ID, for me at least). Perhaps a good middle ground would be to follow the convention "CountryName_01", as sketched below?

  4. The coordinates file is currently justified in the paper with the sentence: "An evolving list of PVP locations, obtained through both literature and manual searches, is included in the package at pyveg/coordinates.py." This is really just meant to be an exhaustive list of all the interesting patterned-vegetation locations we have managed to find; the coordinates may be of use to anyone studying patterned vegetation ecosystems, and they also interface with the code. If that's not clear, let me know and I can add another sentence or two to the paper to clear things up. Good spot on location 13! I'll change that now.
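For illustration, here is roughly what that middle ground might look like if we moved the coordinates into a JSON data file keyed by human-readable IDs (file name and values below are invented, not our actual locations):

```python
# Sketch of a coordinates data file keyed by "CountryName_01"-style IDs.
import json

# What a coordinates.json might contain (values invented for illustration):
locations = {
    "Sudan_01": {"latitude": 11.58, "longitude": 27.94},
    "Sudan_02": {"latitude": 11.12, "longitude": 28.37},
    "Mexico_01": {"latitude": 28.52, "longitude": -110.92},
}

# Written once as a data file...
with open("coordinates.json", "w") as f:
    json.dump(locations, f, indent=2)

# ...and loaded wherever the package currently imports coordinates.py.
with open("coordinates.json") as f:
    coords = json.load(f)["Sudan_01"]
print(coords["latitude"], coords["longitude"])
```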

Thanks again for your great comments.

usethedata commented 4 years ago

Thanks @samvanstroud. Just a quick note to say I've seen your response. I've got a couple of deadlines pressing for the day job, and I will get on with looking for reviewers in the next couple of days.

usethedata commented 4 years ago

I'm working on getting reviewers. Requests sent out via email.

On the issue with the location identifiers, my particular background makes me aware of the kinds of issues that come up when people have to input information (like typing in a location number) and single-character errors produce valid but incorrect results. By this I mean that, for example, 08 and 09 are both valid, but different, entries. I've generally found it a best practice that any time a user needs to type something in off a list, or put something into a configuration file, I set things up so that single-character errors result in an invalid entry. I did a Six Sigma study of this several years ago, and the error rate from technicians typing in a 3-digit number they were reading off a container was astounding.

So, your mileage may vary, and you've already done a lot of work on this. But it's something you might wish to consider going forward. I realize it's difficult in this particular case: it's not like you've got something like LTER site names to work from, and there's no obvious naming scheme. So, no action needed, just something to keep in mind for future work.
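To illustrate the general idea (a hypothetical scheme invented here, not something I'm suggesting this package must adopt): derive a check character from the ID body, so that a mistyped digit produces an invalid ID rather than a valid-but-wrong one.

```python
# Hypothetical check-character scheme, invented for illustration: a valid
# ID carries a suffix derived from its body, so most single-character
# typos fail validation instead of silently selecting the wrong location.
import string

ALPHABET = string.ascii_uppercase

def check_char(body: str) -> str:
    # Position-weighted sum, so substitutions and most transpositions
    # change the check character.
    total = sum((i + 1) * ord(c) for i, c in enumerate(body))
    return ALPHABET[total % 26]

def make_id(body: str) -> str:
    return f"{body}-{check_char(body)}"

def is_valid(loc_id: str) -> bool:
    body, sep, suffix = loc_id.rpartition("-")
    return sep == "-" and suffix == check_char(body)

site = make_id("SUDAN08")
print(site)                   # SUDAN08-T
print(is_valid(site))         # True
print(is_valid("SUDAN09-T"))  # False: the single-digit typo is caught
```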

nbarlowATI commented 4 years ago

Hi @usethedata, thanks, yes, this is an excellent point, and I have fallen victim to this myself (changing the number in one place but not the corresponding output location...). For the production run for the paper we're working on, we plan to mitigate this by scripting the workflow, from the generation of the config files that define inputs and outputs, through to the upload of the results to an open-access repository (possibly Zenodo), so that this coordinates.py file (which may indeed become a CSV or JSON, as you suggest) is the definitive and unique link from ID to location.
We believe that this way we can at least ensure that the inputs and outputs of the code are consistent, and that any remaining risk of transcription errors is pushed further down the line into paper drafting, where there will be more sets of eyes to check.
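As a rough sketch of what we have in mind (assuming a coordinates file like the JSON sketched earlier in the thread; the file names and config fields here are placeholders, not the final design):

```python
# Hypothetical sketch: every run config is generated from the one
# definitive coordinates file, so location IDs are never typed by hand.
import json
from pathlib import Path

# Assumes a coordinates.json like the one sketched earlier in this thread.
with open("coordinates.json") as f:
    locations = json.load(f)

out_dir = Path("configs")
out_dir.mkdir(exist_ok=True)

# One config per location, with the output path derived from the same ID,
# so the ID is never re-typed between input and output.
for loc_id, coords in locations.items():
    config = {
        "location_id": loc_id,
        "latitude": coords["latitude"],
        "longitude": coords["longitude"],
        "output_dir": f"results/{loc_id}",
    }
    (out_dir / f"{loc_id}.json").write_text(json.dumps(config, indent=2))
```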

Thanks again for looking at this and the very useful feedback!

arbennett commented 4 years ago

Hi @usethedata - I would be happy to review this!

usethedata commented 4 years ago

@whedon assign @arbennett as reviewer

whedon commented 4 years ago

OK, @arbennett is now a reviewer

usethedata commented 4 years ago

@arbennett @samvanstroud -- I'm still working on getting a second reviewer. In the interests of letting @arbennett move forward with his review, I'm going to go ahead and open the review issue. I'll add the second reviewer when I get that volunteer.

usethedata commented 4 years ago

@whedon start review

whedon commented 4 years ago

OK, I've started the review over in https://github.com/openjournals/joss-reviews/issues/2483.