ropensci / software-review

rOpenSci Software Peer Review.
291 stars 104 forks source link

Submission: smapr #231

Closed mbjoseph closed 6 years ago

mbjoseph commented 6 years ago

Summary

The smapr package discovers, downloads, and extracts global soil moisture data from the NASA SMAP mission, producing raster objects from HDF5 files.

Package: smapr
Type: Package
Title: Acquisition and Processing of NASA Soil Moisture Active-Passive (SMAP) Data
Version: 0.1.2
Authors@R: c(
    person("Maxwell", "Joseph", email = "maxwell.b.joseph@colorado.edu",
            role = c("aut", "cre")),
    person("Matthew", "Oakley", email = "matthew.oakley@colorado.edu",
            role = "aut"),
    person("Zachary", "Schira", email = "zasc3143@colorado.edu",
            role = "aut")
    )
Depends:
    R (>= 3.2.5)
Imports:
    httr (>= 1.1.0),
    rappdirs (>= 0.3.1),
    raster (>= 2.5),
    rhdf5 (>= 2.14),
    rvest,
    xml2
Maintainer: Maxwell Joseph <maxwell.b.joseph@colorado.edu>
Description:
    Facilitates programmatic access to NASA Soil Moisture Active
    Passive (SMAP) data with R. It includes functions to search for, acquire,
    and extract SMAP data.
License: GPL-3
LazyData: TRUE
RoxygenNote: 6.0.1
Suggests:
    knitr,
    rgdal,
    rmarkdown,
    roxygen2,
    testthat,
    utils
VignetteBuilder: knitr

[e.g., "data extraction, because the package parses a scientific data file format"]

data retrieval and munging - smapr finds and downloads data, and extracts gridded spatial from HDF5 files with the correct projection information.

Soil scientists, ecologists, and people working in the remote sensing domain could use the package to quantify changes in soil moisture, detect droughts, generate covariate layers for species distribution models, and validate other soil moisture data products.

There is another smapr package, but it does not support the breadth of SMAP data products that this package does, and development seems to have ceased ~3 years ago: https://github.com/strongh/smapr. This smapr package has more thorough documentation and tests, allows extraction of multiple soil moisture data products, and has been on CRAN since 2016.

Requirements

Confirm each of the following by checking the box. This package:

Publication options

Detail

Anyone with experience in programmatically finding and getting data from a https server, and/or working with HDF5 raster data would be great - perhaps:

noamross commented 6 years ago

Thanks for you submission, @mbjoseph! This fits well within our scope.

Editor checks:


Editor comments

goodpractice::gp() checks are pretty clean! Please add relevant links to your description and check that long code lines are neccessary:

── GP smapr ────────────────────────────────────

It is good practice to

  ✖ write unit tests for all functions, and all package code in
    general. 6% of code lines are covered by test cases.

    R/download_smap.R:32:NA
    R/download_smap.R:33:NA
    R/download_smap.R:34:NA
    R/download_smap.R:35:NA
    R/download_smap.R:36:NA
    ... and 292 more lines

  ✖ add a "URL" field to DESCRIPTION. It helps users find information
    about your package online. If your package does not have a
    homepage, add an URL to GitHub, or the CRAN package package page.
  ✖ add a "BugReports" field to DESCRIPTION, and point it to a bug
    tracker. Many online code hosting services provide bug trackers for
    free, https://github.com, https://gitlab.com, etc.
  ✖ avoid long code lines, it is bad for readability. Also, many people
    prefer editor windows that are about 80 characters wide. Try make
    your lines shorter than 80 characters

    tests/testthat/test-extract_smap.R:117:1
    tests/testthat/test-find_smap.R:10:1
    tests/testthat/test-find_smap.R:20:1

───────────────────────────────────────────────

One real spelling error from spelling::spell_check_package()

 WORD                 FOUND IN
availble             smapr-intro.Rmd:36

A note: There's a poorly documented CRAN feature for including bioconductor packages using a biocViews: field in DESCRIPTION, which you should add to ease installation for users. See here.

I'll go ahead and seek reviewers.


Reviewers: @ldecicco-USGS @marcosci Due date: 2008-07-19

noamross commented 6 years ago

Reviewers @ldecicco-USGS and @marcosci assigned. Due date 2008-07-19. Note our review guidelines have recently been consolidated in our development guide.

noamross commented 6 years ago

@mbjoseph You can now add a review badge to your package README:

[![](https://badges.ropensci.org/231_status.svg)](https://github.com/ropensci/onboarding/issues/231)
mbjoseph commented 6 years ago

Great - thanks Noam!

noamross commented 6 years ago

Hi @marcosci and @ldecicco-USGS! Just a friendly reminder that your reviews are due in 1 week.

marcosci commented 6 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Final approval (post-review)

Estimated hours spent reviewing:

2.5-3 hrs


Review Comments

General comments

I really like smapr - great work! It was a very pleasant review, the package works like a charm and did exactly what it promised. I see clear value in the package and think it will be a great contribution to the geospatial ecoystem of rOpenSci.

The package documentation is comprehensive and well written, the code is neat and the authors provide rigorous unit testing. Overall, the package is in a good shape, so the following comments are addressing minor things that could be improved.

1. Installation

2. Documentation

General
README
pkgdown

Is there a reason why don't use it? It wouldn't be a necessity from my side, but I think it's worth the effort as navigating smapr would be way easier than through the pdf documentation.

Vignette

While you provide good examples on how to use the functions in smapr, I still think it would be worth to give some examples/links/... on how to further work with the data retrieved from your package.

I can just speak for the field I am working in, but I think a lot of people there would like to crop the resulting data in the end. So maybe start something in the line of:

us <- raster::getData("GADM", country="USA", level=1)
california <- subset(us, NAME_1 == "California")
california <- sp::spTransform(california, proj4string(sm_raster))
sm_crop <- raster::crop(sm_raster, raster::extent(california))
sm_mask <- raster::mask(sm_crop , california)
raster::plot(sm_mask)

I would consider this whole point a bonus, but when I think back starting with spatial data in R was a lot of googling how to reproject, crop, etc.

3. Code

.Rprofile

Something for lazy people like me:

set_credentials <- function(ed_un, ed_pw){
  write(paste0("Sys.setenv(ed_un = \"",ed_un,"\", ed_pw = \"", ed_pw, "\")"),
        file=file.path(Sys.getenv("HOME"), ".Rprofile"),append=TRUE)
}

An auxiliary function like that would help to simplify the need for setting the oauth credentials. Plus, new users don't have to locate .Rprofile and/or one can set it easily on remote machines.

4. Future proofing

Just to bring that on your radar: https://github.com/r-spatial/stars/. If it hits CRAN I think this would be a valuable replacement for rhdf5 and raster in your package, as it could streamline the whole handling process of SMAP data in general.

This also came up in our submission here and I only found an issue discussing sf over sp for packages onboarding rOpenSci. But in your case I think this is more relevant than it was for us.

5. Smaller ToDos

To save @noamross some work:

6. Conclusion

That's a rather short review :open_mouth: ... I would blame the quality of the package :wink: Going through the reviewing guidelines, there is nothing I haven't stated here that could be improved. The code style is very clear, there are no duplications and getting to know the package as a user was pleasant. Other than donating money to NASA for faster servers and hence faster downloading speed, I do not see any possibilities to improve the speed of the functions. The functions and variables all follow rOpenSci guidelines.

noamross commented 6 years ago

Thanks for your review, @marcosci! @mbjoseph, feel free to wait for a second review before addressing comments.

ldecicco-USGS commented 6 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

The description is clear of what the package does. It wouldn't hurt to add a sentence or two right at the beginning expanding on the type of data and type of processing.

My eyes swept over the Bioconductor instructions, so at first, it took me a bit to figure out how to install the rhdf5 package. Could you add the code directly?

source("https://bioconductor.org/biocLite.R")
biocLite("rhdf5")

Before Finding the data, it would probably be a good idea to have a short section on going to the NASA page to get a user name and password. The error message however was very useful, so I knew exactly what to do.

When I ran:

available_data <- find_smap(id = 'SPL4SMAU', dates = '2018-06-01', version = 3)
Error in rvest::html_table(nodes)[[1]] : subscript out of bounds

Debuging, I got to this line:

top_level_response <- GET(path, auth())

and it appeared to be a 401. I eventually realized I had typed the wrong password. I think you could check on that response and make the error message there more clear. Once I got that straighted up, it was off to the races...I thought.

The download_smap function took a long time. I'd add a quick sentence in the vignette to give the user an idea on how long it might take (or how big the files are). Any indicator messages would be nice here too (am I stuck? did it start? did it get through some of those available_data?).

The README explains what id to use, but the vignette does not. I'm not a big fan of copy/paste, but it's easier to access the vignette once I'm already in R (compared to the readme). Could you add that "Dataset id" table (with the id's) so I can think about looking for more data?

The vignette provided exactly the information needed to run the functions in the package. That being said...I was really curious what I could do next. Could you show how to filter the data into some region?

In the find_smap arguments, "id" gives one example. It would be useful to say something like "See Details for a description of all ids" (or something like that....)

The examples worked!

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 2


Review Comments

Nice package!

Do you have a connection to the data? I'm going to guess you'll get lots of questions on the data itself and why the services are down (if they go down!). Those requests usually are phrased "smapr is broken!!!!".

No matter what your relationship is, it would be a good idea to make sure you the data providers know this package is out there. They may be interested in having you add a "user agent" for example, so they (and you) can get an idea of how often data is downloaded from the smapr package.

Otherwise, the code looks nice, the style is consistent, the test coverage is excellent.

mbjoseph commented 6 years ago

Thanks @marcosci and @ldecicco-USGS for your reviews! This is all great feedback.

noamross commented 6 years ago

Thanks your review @ldecicco-USGS! @mbjoseph, if you use issues in your own repo to track how you address the reviews, please summarize the changes here when you are done.

mbjoseph commented 6 years ago

Will do @noamross - thanks!

mbjoseph commented 6 years ago

Thanks again @marcosci, @ldecicco-USGS, and @noamross for your time and reviews. Here are the point-by-point changes:

Review 1 from @marcosci

1. Installation

2. Documentation

3. Code

4. Future proofing

Thanks for pointing out the stars package! I'll keep my eye on it as a potential replacement for rhdf5 and raster.

5. Smaller ToDos

Review 2 from @ldecicco-USGS

Documentation

My eyes swept over the Bioconductor instructions, so at first, it took me a bit to figure out how to install the rhdf5 package. Could you add the code directly?

Shoot! Fortunately this shouldn't be needed anymore with the biocViews() edit that @noamross suggested.

Do you have a connection to the data?

Not really - other that we work in the same city! The National Snow and Ice Data Center (which hosts the data) is aware of the package, however.

mbjoseph commented 6 years ago

Sorry - just accidentally hit "Close and comment"! The above summarizes the changes that were made. I really appreciate everyone's help with this - the package is much improved now thanks to these detailed (and friendly :smiley:) reviews!

noamross commented 6 years ago

Thanks, @mbjoseph! @marcosci and @ldecicco-USGS, please let us know if Max's changes address your comments.

ldecicco-USGS commented 6 years ago

They sure do. Great job @mbjoseph !

marcosci commented 6 years ago

Superb @mbjoseph ! Just had a look at everything, looks perfect! Well done.

noamross commented 6 years ago

Quick note from a final set of package checks, @mbjoseph: Please add .github to your .Rbuildignore

Once you've done that, approved! Thanks for submitting and @marcosci and @ldecicco-USGS for your reviews!

To-dos:

Should you want to awknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here.

Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or long-form post with more narrative about its development. (https://ropensci.org/blog/). If you are interested, @stefaniebutland will be in touch about content and timing.

We've started putting together a gitbook with our best practices and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved, the corresponding repo is here.

mbjoseph commented 6 years ago

Great - thanks @noamross! I have updated the .Rbuildignore file (https://github.com/ropensci/smapr/commit/9a6e31c1774120dda193229ed94ef4dcbcac0257), transferred the repo to the ropensci org, added the footer (https://github.com/ropensci/smapr/commit/a8f9198ba70aac7a0e69ad12e096d702bdea2b5f), updated badges (https://github.com/ropensci/smapr/commit/acfe30056817bbb562d5e728dccc3aaab63e9b0b), and updated the Codemeta file (https://github.com/ropensci/smapr/commit/dbf3db6735300a516464022ab12ca23000bcfcad).

I'd love to add @ldecicco-USGS and @marcosci as reviewers if they are comfortable with it.

I'd also be happy to write a blog post about smapr, and I'll take a look at the onboarding section of the gitbook!

noamross commented 6 years ago

Excellent. @stefaniebutland will be in touch about the blog schedule, in the meantime instructions for putting together post are here. https://github.com/ropensci/roweb2/blob/master/readme.md. I'm closing the issue but feel free to carry on that conversation here.

I've made you admin of the repo over the the rOpenSci org. Your other collaborators have write permissions. Feel free to increase their permission levels if needed be.

stefaniebutland commented 6 years ago

Hello @mbjoseph. Glad to hear you're interested in contributing a post. Our hope is that it gets more eyes on your work!

This link will give you many examples of blog posts by authors of onboarded packages so you can get an idea of the style and length you prefer: https://ropensci.org/tags/review/. Technotes are here: https://ropensci.org/technotes/.

As Noam said, here are some technical and editorial guidelines for contributing a blog post: https://github.com/ropensci/roweb2#contributing-a-blog-post. We ask that you submit your draft post via pull request a week before the planned publication date so we can give you some feedback.

At this point I have open publication slots in September. Do you want to pick a date for draft submission?

Happy to answer any questions.

mbjoseph commented 6 years ago

Heya @stefaniebutland - September sounds great! How about Sep 15 deadline to submit a draft?

stefaniebutland commented 6 years ago

draft Sep 15 sounds great. I'll mark my calendar to ping you here then. Don't hesitate to ask questions here, or on Slack now that you're there :-)

stefaniebutland commented 6 years ago

@mbjoseph Are you still good with a Sep 15 draft for publication 2018-09-25?

mbjoseph commented 6 years ago

@stefaniebutland yes - thanks for checking!