ropensci / software-review

rOpenSci Software Peer Review.
295 stars 104 forks source link

chirps: API Client for CHIRPS #357

Closed kauedesousa closed 4 years ago

kauedesousa commented 4 years ago

Submitting Author: Kauê de Sousa (@kauedesousa)
Repository: https://github.com/agrobioinfoservices/chirps
Version submitted: 0.0.4
Editor: @melvidoni Reviewer 1: @cvitolo Reviewer 2: @jzwart Archive: TBD
Version accepted: TBD


Package: chirps
Type: Package
Title: API Client for CHIRPS
Version: 0.0.4
Authors@R: c(person("Kaue", "de Sousa", 
        email = "kaue.desousa@inn.no", role = c("aut", "cre"),
        comment = c(ORCID = "0000-0002-7571-7845")), 
        person("Adam H.", "Sparks", role = c("aut"),
        comment = c(ORCID = "0000-0002-0061-8359")),
        person("William", "Ashmall", role = "aut",
        comment = c("API Client implementation")),
        person("Jacob", "van Etten", role = c("ths"),
        comment = c(ORCID = "0000-0001-7554-2558")),
        person("Svein", "Solberg", role = c("ths"),
        comment = c(ORCID = "0000-0002-4491-4483")))
Maintainer: Kaue de Sousa <kaue.desousa@inn.no>
URL: https://agrobioinfoservices.github.io/chirps/
BugReports: https://github.com/agrobioinfoservices/chirps/issues
Description: API Client for the Climate Hazards Group InfraRed Precipitation
  with Station Data 'CHIRPS'. The 'CHIRPS' data is a 35+ year quasi-global
  rainfall data set, which incorporates 0.05 arc-degrees resolution satellite
  imagery, and in-situ station data to create gridded rainfall time series for
  trend analysis and seasonal drought monitoring. For more details on 'CHIRPS'
  data please visit its official home page
  <https://www.chc.ucsb.edu/data/chirps>. 
License: GPL-3
Encoding: UTF-8
LazyData: true
Depends: R (>= 2.10)
Imports:
    crul,
    jsonlite,
    methods,
    sf,
    stats,
    tibble
Suggests:
    knitr,
    rmarkdown,
    testthat (>= 2.1.0)
Language: en-GB
RoxygenNote: 7.0.2
VignetteBuilder: knitr

Scope

Technical checks

Confirm each of the following by checking the box. This package:

Publication options

JOSS Options - [x] The package has an **obvious research application** according to [JOSS's definition](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements). - [x] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`. - [x] The package is deposited in a long-term repository with the DOI: - (*Do not submit your package separately to JOSS*)
MEE Options - [ ] The package is novel and will be of interest to the broad readership of the journal. - [ ] The manuscript describing the package is no longer than 3000 words. - [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

melvidoni commented 4 years ago

Editor checks:


Editor comments

Thanks for the submission @kauedesousa. This is the result from goodpractice::gp(). Please, correct this and let me know, so I can start searching for reviewers.

── GP chirps ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

It is good practice to

  ✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters wide. Try make your lines
    shorter than 80 characters

    R\GET.R:93:1
    R\internal_functions.R:386:1

  ✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and 1:NCOL(...) expressions. They are error prone and result 1:0 if the expression on
    the right hand side is zero. Use seq_len() or seq_along() instead.

    R\GET.R:107:23
    R\get_chirps.R:360:28
    R\get_chirps.R:380:29
    R\get_esi.R:364:28
    R\get_esi.R:391:29
    ... and 4 more lines

  ✖ not import packages as a whole, as this can cause name clashes between the imported packages. Instead, import only the specific functions you
    need.
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 

Reviewers: @cvitolo and @jzwart Due date: February 10th

kauedesousa commented 4 years ago

Dear @melvidoni thanks for your feedback. I've made the changes and pushed as chirps v0.0.5 https://github.com/agrobioinfoservices/chirps/commit/95d016fec74635290642454623f435ec6dbaa2c7

melvidoni commented 4 years ago

The first reviewer is @cvitolo. I'll fix the review deadline once we have both reviewers.

melvidoni commented 4 years ago

We got a second reviewer: @jzwart The review deadline is February 10th

cvitolo commented 4 years ago

Hello @kauedesousa!

This is a very useful package, I work in the field of weather forecasting and I greatly appreciate tools that provide an interface to data services. I found your package easy to use and well documented. All your hard work made my review very easy, thanks and great job!

I have some suggestions, which I listed below, I hope you find them useful :)

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [x] A short summary describing the high-level functionality of the software
  • [x] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [x] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Final approval (post-review)

Estimated hours spent reviewing:


Review Comments

kauedesousa commented 4 years ago

Dear @cvitolo thank you so much for your comments and suggestions. It helped a lot! We have worked in incorporating the comments to chirps v0.0.6 https://github.com/agrobioinfoservices/chirps. Here we have included a point-to-point response to your comments:

general comments

README, it seems the code in your README file is only visualised but not executed. In this case, you could keep the README.md and remove README.Rmd (as this is basically redundant).

The file README.Rmd was removed

Your R folder contains a file called sysdata.rda. Is there a reason to keep these data there? Usually data should be placed under the root folder in a subfolder called data. I would suggest to move sysdata under the chirps/data folder, document the datasets (documentation is currently missing for both tapajos and tapajos_geom) and load the data using data("sysdata") or chirps::tapajos. If you make this change, please change the call to chirps:::tapajo in get_esi() accordingly.

The sf polygon is exported as 'tapajos', the sf POINT object is not necessary and can be generated in the examples with sf. Also, functions dataframe_to_geojson and sf_to_geojson are exported since they had the same issue using ::: in the examples and could be useful for the users

the man folder contains a figure folder. Is this good practice? I thought the man folder should be used only for documentation. Would it make sense to move the figure folder under inst, for instance?

The /man structure is part of using the pre-built vignettes. We not aware of any issues with it. CRAN hasn't baulked at it and Adam (one of the co-authors) just submitted two package updates in the last month that use this method

tests folder:

each of your test files contain a call to library(chirps). This is superflous because you load the package in testthat.R and that should suffice. My suggestion is to load all the packages needed by the tests in testthat.R and remove the commands library(package_name) in the individual test files.

DONE

When you call library() sometimes you use library(package_name), other times library("package_name"). I would suggest to consistently use the latter.

DONE

test-get_chirps.R: you only test that you get the right object class but do not test the returned values, is there a reason for that? In my experience, it is very important to test the correctness of the actual data and I would suggest you to develop tests for that.

We have updated the tests so it checks if the functions return the correct values. For this we downloaded the data from ClimateSERV and compared it with the ones retrieved by get_chirps, get_esi and precip_indices to validate it. The tests still have a skip_on_cran option as a CRAN policy. But we have opened an issue in the package repo and will keep it there until we figure out how to make 'vcr' works with 'chirps' https://github.com/agrobioinfoservices/chirps/issues/7

test-get_esi.R: as above, you only test that you get the right object class but do not test the returned values. I would suggest to also test the data values.

Same as above

test-precip_indicesl.R: the file name seems to contain an extra "l", should that not be test-precip_indices.R? you only test the dimensions of the output data frame but not the values themselves. As, above, I would suggest to also test the data values.

DONE

vignettes:

The file Overview.Rmd.orig seems redundant and can be removed.

We use this file to speed up the vignette creation, here Jeroen Ooms shows how it works https://ropensci.org/technotes/2019/12/08/precompute-vignettes/

It's not necessary to run twice the command precip_indices(dat, timeseries = TRUE, intervals = 15), the second one can be removed.

DONE

When running the command get_chirps, I get the following warning message: In st_buffer.sfc(st_geometry(x), dist, nQuadSegs, endCapStyle = endCapStyle,: st_buffer does not correctly buffer longitude/latitude data. Can this warning be eliminated? Maybe adding a note in the documentation?

These warning messages comes from 'sf', but we don't know if is a good practice to suppress that. We added a note to the documentation

When running the command get_chirps, I get the following warning message: Warning messages: 1: In st_centroid.sfc(x$geometry) : st_centroid does not give correct centroids for longitude/latitude data. 2: In st_centroid.sfc(x$geometry) : st_centroid does not give correct centroids for longitude/latitude data. Can this warning be eliminated? Maybe adding a note in the documentation?

Same as above

more in-depth discussion of the functionalities included in the package will make it easier for the reader to understand if the chirps dataset is suitable for a given purpose. I would also mention that requests may take a long time to be executed. Is it feasible to use these functions to download large amount of data (for instance to perform a global scale analysis)? In general, a mention of the limitations of this package would be valuable.

We added a section for the package limitations. And a better explanation about CHIRPS application into the paper. Also, W. Ashmall says here https://github.com/agrobioinfoservices/chirps/issues/12 that they are planning to upgrade the API service which will make it better to request queries to ClimateSERV.

LICENSE = GPL-3

when you use a widely known license you should not need to add a copy of the license to your repo. The files LICENSE and LICENSE.md are redundant and can be removed.

DONE

When I got my packages reviewed I was made aware that GPL-3 is a strongly protective license and, if you want your package to be used widely (also commercially), MIT or Apache licenses are more suitable. I just wanted to pass on this very valuable suggestion I received.

Thank you, we changed to MIT as suggested

inst/paper folder:

it seems the code in your paper is only visualised but not executed. In this case, you could keep the paper.md and remove paper.Rmd. Also paper.pdf could be removed.

DONE

Fig1.svg is redundant (Fig1.png is used for rendering the paper).

DONE

in the paper, I would move the introduction to the CHIRPS data at the beginning as readers might not be familiar with these data.

DONE

in the paper you use the command chirps:::tapajos to load data in your sysdata.rda. This is not good practice. The ::: operator should not be used as it exposes non-exported functionalities. If you move sysdata under the chrips/data folder (as suggested above), the dataset can be loaded using data("sysdata") or chirps::tapajos.

DONE

Towards the end of your paper you state: Overall, these indices proved to be an excellent proxy to evaluate the climate variability using precipitation data [@DeSousa2018], the effects of climate change [@Aguilar2005], crop modelling [@Kehel2016] and to define strategies for climate adaptation [@vanEtten2019]. Maybe you could expand a bit, perhaps on the link with crop modelling?

We updated this section with more examples, and hopefully a better explanation on CHIRPS applications and how chirps can help

goodpractice::goodpractice():

write unit tests for all functions, and all package code in general. 34% of code lines are covered by test cases. This differs from what is stated on GitHub (codecv badge = ~73% code coverage). The reason might be due to the fact you skip most of your tests on cran, is this because tests take too long to run? If so, is there a way you could modify the tests so that they take less time?

Same as above in the tests section

fix this R CMD check WARNING: Missing link or links in documentation object 'precip_indices.Rd': ‘[tidyr]{pivot_wider}’ See section 'Cross-references' in the 'Writing R Extensions' manual. Maybe you could substitute \code{\link[tidyr]{pivot_wider}} with \code{tidyr::pivot_wider()}.

The code was removed from seealso in the documentation.

Again, thank you for your time reviewing this package. We hope you like its new version.

melvidoni commented 4 years ago

Hello @jzwart! Could you submit your review for this package?

jzwart commented 4 years ago

Yes, I'm sorry for the delay but is it possible to get a week extension until Feb. 17 to submit my review?

melvidoni commented 4 years ago

Hello @jzwart yes, go ahead. But please, do not delay more than that! Is that ok, @kauedesousa ?

kauedesousa commented 4 years ago

Hi @melvidoni and @jzwart, yes it is ok for me.

melvidoni commented 4 years ago

Hello @jzwart is everything okay?

jzwart commented 4 years ago

Yes I'm currently working on review.

jzwart commented 4 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [x] A short summary describing the high-level functionality of the software
  • [x] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [x] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Final approval (post-review)

Estimated hours spent reviewing:


Review Comments

I installed and reviewed chirps v0.0.6. See the comments above addressing the previous review that went into v0.0.6. The package is well-documented and easy to read. The authors have thoughtfully addressed all previous reviewer comments.

lonlat <- data.frame(lon = c(-55.0281,-54.9857, -55.0714), lat = c(-2.8094, -2.8756, -3.5279))

dates <- c("2017-01-01", "2017-01-03")

dat <- get_chirps(lonlat, dates) Error: Something went wrong with the query, no data were returned. Please see https://climateserv.servirglobal.net/ for potential server issues.

I visited the linked url and tried downloading some precip data using a custom polygon and there was no response for the ~20 minutes I waited for data to return. I'm not sure if this is a common issue with this server, but I don't feel like I can finish the review until the server is working and returns data. 
I'm also not sure how I should know if there are potential server issues when I visit https://climateserv.servirglobal.net . Providing more detailed error message would help with debugging or knowing what is going on. 
--- Updated 2020-02-18 ---- 
- See comments below for improvements to the server which was causing the error I was getting yesterday. `get_chirps()` worked as expected earlier this morning and I could calculate the precipitation indices outlined in the example script in the paper.  
- Is there a recommended timeout for the data request? Although `get_chirps()` worked earlier this morning, it seems like the server is down again, and I get the following output with the paper's example script: 

dt <- get_chirps(tp, dates = c("2008-01-01","2018-01-31")) dist is assumed to be in decimal degrees (arc_degrees). Getting your request...

0%... 0%... 0%... 0%... 0%... 0%... 0%... 0%... 0%... 0%... 0%...


It's pretty clear that the server is down, but if a user does not know that the server is down, would you expect multiple `0%...` data progress from big data calls? Would this data call timeout if I left it going when the server is down? A little more guidance on the call request time would help the user debug in these instances. 

As long as the server is up, the package seems to work well. I just have a few comments to help the user in debugging instances above ^. 
jzwart commented 4 years ago

The data call request seems to be working this morning, I'll finish my review and update my comment above

kauedesousa commented 4 years ago

Hi @jzwart I was informed by the developers of ClimateSERV that they are doing some improvement in the server. I just tried using both chirps and browser and it is still down for me right now. Let me know if you manage to review the package.

Otherwise what should we do @melvidoni ?

jzwart commented 4 years ago

Thanks @kauedesousa , get_chirps was working for me a couple mins ago (but now down when I tried again). I'll try to finish the rest of the review now.

jzwart commented 4 years ago

@kauedesousa and @melvidoni, I've updated my review above. Although I couldn't finish all instances of the tests, I'm confident that they'd work if the server was working. I just have some comments on improvements for debugging and description of best uses / limitations for the package.

kauedesousa commented 4 years ago

Hi @melvidoni I am a bit busy at work and I think I will not be able to send a revised version of chirps before March-10. Is that ok?

melvidoni commented 4 years ago

Hello @kauedesousa. That is perfectly fine, take your time. Please advise if you are going to need more time, so I can apply the correct labels.

noamross commented 4 years ago

⚠️⚠️⚠️⚠️⚠️

In the interest of reducing load on reviewers and editors as we manage the COVID-19 crisis, rOpenSci is temporarily pausing new submissions for software peer review for 30 days (and possibly longer). Please check back here again after 17 April for updates.

In this period new submissions will not be handled, nor new reviewers assigned. Reviews and responses to reviews will be handled on a 'best effort' basis, but no follow-up reminders will be sent.

Other rOpenSci community activities continue. We express our continued great appreciation for the work of our authors and reviewers. Stay healthy and take care of one other.

The rOpenSci Editorial Board

⚠️⚠️⚠️⚠️⚠️

annakrystalli commented 4 years ago

⚠️⚠️⚠️⚠️⚠️ In the interest of reducing load on reviewers and editors as we manage the
COVID-19 crisis, rOpenSci new submissions for software peer review are paused.

In this period new submissions will not be handled, nor new reviewers assigned.
Reviews and responses to reviews will be handled on a 'best effort' basis, but
no follow-up reminders will be sent. Other rOpenSci community activities continue.

Please check back here again after 25 May when we will be announcing plans to slowly start back up.

We express our continued great
appreciation for the work of our authors and reviewers. Stay healthy and take
care of one other.

The rOpenSci Editorial Board ⚠️⚠️⚠️⚠️⚠️

kauedesousa commented 4 years ago

Dear all, I hope you are doing fine during this crisis. We managed to address the comments from our reviewers and hereby we submit the revised version of chirps in its version 0.0.8.

All the best

general comments

Could you clarify what 'near-present' means in the Overview section of the README.md?

Is the present date. But normally there is a 45 day lag. We added this at the Overview and paper. "... and ranging from 1981 to near-present (normally with a 45 day lag)"

Can you add more details for data return performance and what the user should expect for big data calls in the README? Is the package most useful for a few point data returns or can the user get a continental data call returned in a reasonable amount of time? I see some description of limitations in the Overview vignette, but I think the best uses for the chirps dataset, package, and any limitations should be upfront so a potential user can decide to install the package before finding a limitation description in the vignette. I could not run the example code successfully. I let the example code in the README run for about 20 min before it timed out and returned an error.

The example in the README should go fast (~ 45 s), but the server was experiencing some issues. In any case, if the number of points is scaled (e.g. globaly) this may increase the time for data retrieval significantly. We added this message in the pkg DESCRIPTION. "Requests from large time series (> 10 years) and large geographic coverage (global scale) may take several minutes."

Is there a recommended timeout for the data request? Although get_chirps() worked earlier this morning, it seems like the server is down again, and I get the following output with the paper's example script:

By the time of your review the developers of ClimaSERV were working in some improvements in the server. The requests should work fine now, but with the limitations stated in the DESCRIPTION file.

It's pretty clear that the server is down, but if a user does not know that the server is down, would you expect multiple 0%... data progress from big data calls? Would this data call timeout if I left it going when the server is down? A little more guidance on the call request time would help the user debug in these instances.

This message was removed from the .GET() function. Now the function tries to get the data 6 times (six tries), if no data is returned than an error message is provided saying that the server is down.

melvidoni commented 4 years ago

Hello @kauedesousa, thanks for the updates!

@cvitolo @jzwart I hope you are safe. Can you please review the changes made by the author?

jzwart commented 4 years ago

Hi @melvidoni and @kauedesousa, Yes, I can review the changes today.

cvitolo commented 4 years ago

Hi @melvidoni, I’m very happy with the revised version. All my suggestions have been addressed.

@kauedesousa great job!

jzwart commented 4 years ago

Nice job on this package, @melvidoni, and thank you for the careful consideration of all the reviews. I find the revised package very easy to use and the descriptions in the vignettes & README useful. Thank you for adding in the time out for .GET() - I know that's probably a rare case with server issues, but I think adds to the functionality of the package. All my suggestions have been addressed.

melvidoni commented 4 years ago

Approved! Thanks @kauedesousa for submitting and @cvitolo @jzwart for your reviews!

To-dos:

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here.

Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.

We've put together an online book with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved, the corresponding repo is here.

kauedesousa commented 4 years ago

Thanks @melvidoni ! I will follow the to do list. Thanks @cvitolo and @jzwart for the helpful comments and suggestions. Would you like to be acknowledged in the pkg Description with the "rev" role?

melvidoni commented 4 years ago

Great! Please, @kauedesousa tag me if you need any assistance.

kauedesousa commented 4 years ago

I already need. Can you assign the repo to the team chirps that you set up? I skipped this step and now I can't go into the repo settings.

melvidoni commented 4 years ago

Hello there! I was just trying to add the team, please be patient. It should be done now, and @adamhsparks should also have access.

jzwart commented 4 years ago

@kauedesousa, no need to acknowledge me as a reviewer in the package description, but thank you for asking!

cvitolo commented 4 years ago

@kauedesousa, congratulations on getting your package approved! No need to acknowledge me as reviewer either but I would double check with @melvidoni if rOpenSci advises otherwise.

kauedesousa commented 4 years ago

Hi @melvidoni , the to-do list is completed. Let me know if there is something else to do. My next step is submit this to CRAN.

melvidoni commented 4 years ago

Good luck!