Closed potterzot closed 5 years ago
@potterzot I will be the editor for this peer review.
Here are my editorial checks.
In running R CMD check I get the following:
❯ checking top-level files ... NOTE
Non-standard files/directories found at top level:
‘paper.bib’ ‘paper.md’
[ ] Please either add those files to .Rbuildignore or put them in inst/.
[ ] In paper.md
there are a few typos, including these. You might look over the paper one more time as you correct these.
"be one of the most difficult stages of research make reproducible" should be "to make"
"explicitly constructing html GET requests" should be "HTTP GET requests." (HTTP is the protocol that often, but not usually in the case of an API like this one, returns HTML.)
[ ] Could you clarify, please, what is tested when the user has not provided an API key? For tests that you are skipping if no API key is available, is it possible to stub or mock the API? The testthat package includes some mocking functions, as well as testthat::expect_equal_to_reference()
which can be used for that purpose.
[ ] After running spelling::spell_check_package()
I note a number of misspellings. Please run that on the package yourself and correct the actual misspellings.
[ ] After running goodpractice::gp()
there is this change needed to the DESCRIPTION:
✖ omit "Date" in DESCRIPTION. It is not required and it gets invalid quite
often. A build date will be added to the package when you perform `R CMD build` on
it.
I am in the process of looking for peer reviewers.
Reviewers: Due date:
@lmullen thank you for your comments. I've made a commit to address your comments.
Regarding this note:
Could you clarify, please, what is tested when the user has not provided an API key? For tests that you are skipping if no API key is available, is it possible to stub or mock the API? The testthat package includes some mocking functions, as well as testthat::expect_equal_to_reference() which can be used for that purpose.
Tests in tests/testthat/test-oncran.R beginning on line 55 include mock API calls. The tests make the request, specifying that the function return the GET request URL rather than actually make the request, and that request is compared to the correct URL. There are three API paths to test:
test_that("nass_GET forms a correct URL", ...)
test_that("nassqs_param_values forms a correct URL", ...)
test_that("nassqs_record_county forms a correct URL", ...)
There are additional mock API tests that follow those, but those are for convenience functions for making specific requests, e.g. nassqs_area and nassqs_yield, which wrap nassqs_GET.
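In rough outline, those URL-comparison tests work like this (build_qs_url() below is an illustrative stand-in, not the package's internal function):

```r
# Illustrative sketch only: assemble the Quick Stats query URL without
# touching the network, then compare it against an expected string, which
# is what the mocked tests in test-oncran.R do.
build_qs_url <- function(params, key = "API_KEY",
                         base = "https://quickstats.nass.usda.gov/api/api_GET") {
  query <- paste(
    c(paste0("key=", key),
      paste0(names(params), "=", vapply(params, as.character, character(1)))),
    collapse = "&"
  )
  paste0(base, "?", query)
}

params <- list(commodity_desc = "CORN", year = 2012, state_alpha = "WA")
url <- build_qs_url(params)
# stopifnot() stands in for a testthat expectation here:
stopifnot(identical(
  url,
  "https://quickstats.nass.usda.gov/api/api_GET?key=API_KEY&commodity_desc=CORN&year=2012&state_alpha=WA"
))
```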
Tests in tests/testthat/test-local.R make actual API calls using an API key, and are not possible on CRAN.
Is there a better way of organizing tests that makes it clear where the API mock tests are done and where the actual API call tests are done?
@potterzot That sounds fine to me. I just wanted to make sure the reviewers and I understood.
@potterzot Apologies for the delay in getting this review going. One person has agreed to review but a string of others have been unavailable at the start of the summer. Still looking for that second reviewer and then the review will begin.
Thanks to our reviewers for agreeing to take on this package.
Reviewer: @adamhsparks Reviewer: @nealrichardson Due date: 2019-07-11
You can find the guide for reviewers here. Please let me know if you have any questions.
I know I'm behind. Sorry, I've had a rather busy time lately. I'm starting the review today and will see how I go.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation: URL, BugReports and Maintainer (which may be auto-generated via Authors@R).

For packages co-submitting to JOSS:

- [ ] The package has an obvious research application according to JOSS's definition

The package contains a paper.md matching JOSS's requirements with:
- [ ] A short summary describing the high-level functionality of the software
- [ ] Authors: A list of authors with their affiliations
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
Estimated hours spent reviewing: 7
This package offers great functionality and defines very nicely why it's necessary. I'm happy to see this sort of package being written.
Following are my comments.
Spell check the package, in particular the DESCRIPTION file's Line 12, Agricultre
It's a bit of opsec, but I think the README and vignette should be more explicit about keeping API keys secret and not embedding them in scripts. Along with this, highlighting why it's useful to use the .Renviron file for this purpose would be nice (you don't have to worry about adding it to .gitignore, for one). For new users, just how to modify the .Renviron is likely to be confusing at best and frustrating at worst. Perhaps it would be nice to include an example that shows how to use usethis::edit_r_environ() for this purpose to streamline the illustrations and examples.
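For example, something along these lines (a sketch; NASSQS_TOKEN is the environment variable the package already reads):

```r
# One-time setup: open ~/.Renviron in an editor (requires the usethis package):
# usethis::edit_r_environ()
# ...then add a line such as:
# NASSQS_TOKEN=your-secret-key
# After restarting R, the key is available in every session without ever
# appearing in a script or a git history:
key <- Sys.getenv("NASSQS_TOKEN")
nchar(key) > 0  # TRUE once the key is set
```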
The DESCRIPTION file's description text might need " ' " around "API". I had to do this for the nasapower package to have it accepted on CRAN.
I always encourage including a proper CITATION file in /inst, see https://github.com/ropensci/nasapower/blob/master/inst/CITATION for an example
The paper.md file has Git conflicts in it that need to be resolved
There is no statement about contributions or a code-of-conduct present
I feel a bit like the functionality for the end-user is a bit overly complicated. This sort of package to me should just return data in a data.frame or list or some other R object that I can easily work with. In some cases I get a server response, in others I can ask for raw JSON or other file formats. I don't really see a good reason for this in R. I guess the functionality could be offered, but I think it needs to be hidden under a few layers for more advanced users that might want to use it. Rather it should just be a simple query for the data that I want and returning it in a format that I can readily use in R without extra steps. Just fetch the data and format it for me in one single step. I think it's possible with this package, but the documentation is a bit convoluted and not clear to me.
When I try using nassqs_token() I receive an error message.
> library(rnassqs)
> nassqs_token()
Error in nassqs_token() : could not find function "nassqs_token"
This example works as expected,
nassqs_fields()
[1] "agg_level_desc" "asd_code"
[3] "asd_desc" "begin_code"
[5] "class_desc" "commodity_desc"
[7] "congr_district_code" "country_code"
[9] "country_name" "county_ansi"
[11] "county_code" "county_name"
[13] "CV" "domaincat_desc"
[15] "domain_desc" "end_code"
[17] "freq_desc" "group_desc"
[19] "load_time" "location_desc"
[21] "prodn_practice_desc" "reference_period_desc"
[23] "region_desc" "sector_desc"
[25] "short_desc" "state_alpha"
[27] "state_ansi" "state_name"
[29] "state_fips_code" "statisticcat_desc"
[31] "source_desc" "unit_desc"
[33] "util_practice_desc" "Value"
[35] "watershed_code" "watershed_desc"
[37] "week_ending" "year"
[39] "zip_5"
however,
?nassqs_fields()
returns a help file that says,
Deprecated: Return list of NASS QS parameters.
Description
Deprecated. Use nassqs_params() instead.
Suggest updating vignette to match most recent functionality.
Another error occurs with this example from the vignette.
> rnassqs::nassqs_field_values(field = 'unit_desc')
Error: 'nassqs_field_values' is not an exported object from 'namespace:rnassqs'
The "All together" section script is not functional.
fields <- nassq_fields()
Error in nassq_fields() : could not find function "nassq_fields"
Function names are not consistent with GET being all caps but parse and check not. In the vignette text nassqs_parse()
is referred to as PARSE in all caps. Consistency in function naming will help the end user.
It is entirely up to the package authors how to organise the functions, but I find the current structure confusing at best. Typically when I see a file with just the package name, it is just there to provide the help file for the package with author information, references and other basic info. The nassqs.R
file has several functions in it for the package along with release_questions()
, which I find to be extremely odd. This function is not something that should be in the package and exposed to end-users.
My suggestion is remove release_questions()
entirely and split out the functions into their own files with the function name being the file-name. This helps make it easier to keep the functions organised and updated. Splitting the functions out is entirely up to the authors if they wish to implement this structure, however I feel that removing release_questions()
is necessary.
Why is base_URL given as a parameter that the user can modify? The documentation even says it "probably" should not be changed. I would just hard-code it and not give the user any possibility of changing it. I can't see any good reason for doing this. If the URL changes, then the package should be updated to reflect the changes.

When packages are mentioned in the documentation, wrap them in \pkg{httr} for proper formatting to indicate that you are referring to a package. Likewise, R can be written as \R for special formatting.
The text \code{jsonlite::fromJSON} should be written as \code{\link[jsonlite]{fromJSON}} so that it links to the help file for this function.
Titles for function help files should be written in title case.
All exported functions should have examples in the documentation, nassqs_check()
does not have any examples.
The example for nassqs_param_values() is commented out, making it difficult to follow. Examples should not be commented out and should be clear and easy to follow.
# See all values available for the statisticcat_desc field. Values may not
# be available in the context of other parameters you set, for example
# a given state may not have any 'YIElD' in blueberries if they don't grow
# blueberries in that state.
# Requires an API key:
#nassqs_param_values("statisticcat_desc", key = "my api key")
Should appear as
# See all values available for the statisticcat_desc field. Values may not
# be available in the context of other parameters you set, for example
# a given state may not have any 'YIElD' in blueberries if they don't grow
# blueberries in that state.
# Requires an API key:
nassqs_param_values("statisticcat_desc", key = "my api key")
I find nassqs_GET() confusing. I don't understand the first example.
> params = list(commodity_name="CORN",
+ year=2012,
+ agg_level_desc = "STATE",
+ state_alpha = "WA",
+ statisticcat_desc = "YIELD")
> nassqs_GET(params)
Response [https://quickstats.nass.usda.gov/api/api_GET?key=XXXXXXXXXXXXXXXXXXXXXXX&commodity_name=CORN&year=2012&agg_level_desc=STATE&state_alpha=WA&statisticcat_desc=YIELD&format=JSON]
Date: 2019-07-13 04:48
Status: 200
Content-Type: application/json
Size: 148 kB
What do I do with this response value? How is this response yields for corn in 2012 in Washington?
Does the end-user even need to interface with this function or should it be hidden and used by the other functions in the package that return data in data.frames
or other R objects?
nassqs_params()
lacks examples
The example for nassqs_parse()
could be easier to follow.
# Set parameters and make the request
params = list(commodity_name="CORN",
year=2012,
agg_level_desc = "STATE",
state_alpha = "WA",
statisticcat_desc = "YIELD")
req <- nassqs_GET(params)
nassqs_parse(req, as = "data.frame")
would be more clear as
# Set parameters and make the request
params <- list(
commodity_name = "CORN",
year = 2012,
agg_level_desc = "STATE",
state_alpha = "WA",
statisticcat_desc = "YIELD"
)
req <- nassqs_GET(params)
corn <- nassqs_parse(req, as = "data.frame")
head(corn)
The documentation for the as argument of nassqs_parse() is unclear. It states that it indicates the data type returned, but doesn't list the data types aside from in the usage section, which indicates that a list is possible, but this doesn't appear to be documented. The @return section says a data frame or raw text of the content from the request.
Any functions that query an external server for data and may fail or take an extended period of time to run should have the examples wrapped in a \donttest{} for CRAN but still allowing for local testing. I see most examples are wrapped but not all that run an external query.
In most cases the code is clearly written; in some cases the style is inconsistent, with a lack of spaces around =, e.g. Line 73 of nassqs.R. This also applies in the examples in the documentation. Also, single and double quotes are used interchangeably. I find it easier to follow if only one style is used in all cases, as there are cases where only single quotes may be used, and so on.
The operator used to assign in the examples also switches between =
and <-
. Only one should be used consistently.
Wrap code at 80 characters for ease of reading and for those of us that don't have editor windows that expand beyond 80 chars wide.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation: URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Estimated hours spent reviewing: 5
Looks like I get to be the infamous "Reviewer 2" this time :)
Since Adam dutifully got his review in on time and I'm delayed a couple of days, I'll say at the start that I agree generally with all of his points (with one minor exception, noted below), and in the interest of brevity, I've tried to avoid reiterating comments he made.
I greatly appreciate packages like this and the effort that goes into making them. This is the kind of package I wish existed when I was doing my dissertation--would have made at least the data retrieval and management part of research a lot cleaner. So, thank you for your contribution.
Style and code-hygiene comments aside, my main suggestion to you is to think about how you can orient the package towards the R user and their needs. The purpose of this package should be to encapsulate your hard-earned knowledge of how this API works and enable data scientists to access that data as naturally as possible. However, the package currently seems to expose an R interface centered around the needs of the API and not the R user. For example, the functions that seem to make up the public interface for the package are mostly stuck in "helpers.R", as if they're ancillary when really they should be front and center.
Here's a more concrete example: reading the vignette discussion about the comparison operators, it looks like the way to get data on corn production in Virginia since 2012 is
nassqs(list(commodity_desc = "CORN", year__GE = 2012, state_alpha = "VA"))
But when I think about the R code that I want to type to get that data, it looks more like
nassqs(commodity == "corn", state == "va", year >= 2012)
That is, not case-sensitive, handles the different ways commodities and states (for example) could be referenced (integer code, abbreviation, long name), and more naturally handles the comparison operators (can translate year >= 2012
to year__GE=2012
). You could go even farther and have a dplyr
syntax where you can filter()
rows and select()
columns, so that the R user is essentially describing the data.frame they want to get, and your package figures out how to turn that into API requests, handle pagination, etc., and return the shape of data that an R user expects.
I'm not saying you need to do exactly this/all of this now (though some things, like case insensitivity, would be easy enough), but more to suggest ideas for the future and to show what I mean in terms of thinking of the R interface and not just what the API demands. When I wrap APIs (and in general write packages), I like to start by thinking about the R code I want (users) to type, and that usually means doing more work in the package to mask the awkwardness of the underlying HTTP API.
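To illustrate the kind of translation I mean, here is a rough sketch (translate_filter() is hypothetical, not something the package provides) of turning an unevaluated R comparison into the API's double-underscore operator suffix:

```r
# Hypothetical sketch: map an R comparison like year >= 2012 onto the
# Quick Stats "__GE"-style parameter names.
translate_filter <- function(expr) {
  stopifnot(is.call(expr))
  ops <- c(">=" = "__GE", "<=" = "__LE", ">" = "__GT", "<" = "__LT", "==" = "")
  op <- as.character(expr[[1]])
  stopifnot(op %in% names(ops))
  field <- as.character(expr[[2]])
  stats::setNames(list(expr[[3]]), paste0(field, ops[[op]]))
}

translate_filter(quote(year >= 2012))        # -> list(year__GE = 2012)
translate_filter(quote(state_alpha == "VA")) # -> list(state_alpha = "VA")
```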
Another way to get more R-user-centric would be to improve the documentation, which I found to be thin, often tautological (@param api_path the API path), and not helpful for an inexperienced user. Documentation is a bit of work, but it's worth doing, especially for how it can help focus you on what the intended user needs to know in order to get value from the package--and how you can make what details they need to know as minimal as possible.
A good way to see what the docs will look like to your users is to use pkgdown
to build the package website. You don't need to publish this (though it's nice if you do), but I've found that even doing it locally makes it really visible to me where my documentation is not the most helpful or beautiful.
I got a token and did the example on the readme (corn yields in Virginia and Pennsylvania since 2000).
> library(rnassqs)
> Sys.setenv(NASSQS_TOKEN="REDACTED")
> params <- list("commodity_desc"="CORN",
+ "year__GE"=2000,
+ "state_alpha"=c("VA", "PA"),
+ "statisticcat_desc"="YIELD")
> df <- nassqs(params)
That all worked as expected. That said:

- The request seemed slow given that the returned data was 4752 x 39, so not huge. I'm not sure what was slow, but it might be worth considering how to show progress. I believe httr has some tools for showing progress if it's slow because of the network, for example. But maybe network isn't the constraint. Granted, I made one request, so I'm not sure how widespread the issue is, but it was distracting.
- year is numeric but everything else in the data.frame is character, even when there are very obviously numeric columns in there. If I'm going to do anything useful with this crop yield data I just downloaded, the first thing I'm going to have to do is df$Value <- as.numeric(df$Value), so it seems worthwhile to return properly formatted data.
- One column name has an irregular format ("CV (%)"), while the other columns follow a standard, well-behaved format.
- Prefer list(param=param) over list("param"=param).
- It's fine to keep GET in nassqs_GET() capitalized: HTTP request methods are supposed to be capitalized (they're case sensitive, per the RFC). FWIW in httr, the request methods are capitalized while other acronyms (JSON etc.) are not.
- Use #' @importFrom httr function_name rather than httr::function_name in package code.
- Use \code{\link{}}s in the documentation.
- if (something) { result } inline is bad form: either drop {} or make it multiline. See https://style.tidyverse.org/syntax.html#indenting
- To alias, nassqs_fields <- nassqs_params is enough; you don't need a function() that calls the other one.
- release_questions() doesn't belong in the package.
- In nassqs, "raw" is not raw in the R sense; it's "character" (or as="text" if you want to be like httr).
- Use inherits() rather than class ==. Also you can return() early rather than if/else.
- The auth function could be just function (key) Sys.setenv(NASSQS_TOKEN = key). What's the value of the interactive mode? It's reasonable to expect the user to set auth at the beginning of the session.
- Use @inheritParams to document the function args in only one place and pull them in in the other functions that have the same arguments.
- Consider dropping key from the function signature and just using Sys.getenv(); L75-78 is unnecessary complexity.
- Consider dropping base_url from the signature (since it "should probably never be changed"). If you want it configurable for some reason, allow it to be set by an option and in the function get it like base_url <- getOption("rnasssqs.base_url", "https://quickstats.nass.usda.gov/api/")
- Consider dropping url_only from the signature. It's confusing to have a function that can return very different things based on an argument. If that's functionality you need, I'd factor out the URL query assembly, ending in build_url(), into its own function, and have rnassqs_GET() call that and then httr::GET() the result. If you only have this argument for unit testing, I have an alternative proposal (see below).
- rnassqs_parse errors if it gets XML.
- expand_list() is probably the only function there that fits with what I'd expect helpers to be (i.e. internal functions). I'd give it @keywords internal so it doesn't list in the help index, and possibly just document it with comments and not formal documentation. (And if you're going to document it, explain what it actually does, which is not obvious.)
- The .secret file is used in tests and supporting code, and it's discussed in the vignette, but it's not actually supported in the package (i.e. nassqs_auth() doesn't look for a .secret file). And IIUC the vignette recommendation is incomplete: you'd still have to set or pass it in even if you read it in with readLines(). IMO I'd just drop that discussion and rely on the environment variable, or say that if you wanted to store it in some file (called whatever), you could read it and set it in your session like Sys.setenv(NASSQS_TOKEN = readLines(file)).
- No need to rnassqs:: your own functions.
- Suggest assertthat since your vignette uses it.
- You may find the httptest package useful for these tests. Rather than using the url_only argument (which I suspect you only have for tests), you could use expect_GET(nassqs(...), url)
- httptest would also let you supply mock responses so you could test the full request/response/parse flow naturally. What you have with the .rds testdata files works, of course, but httptest could make those tests more readable and maintainable.
- Use testthat::skip_if(), and you could of course define your own skip_if_interactive() (but if you take my suggestion to drop the interactive behavior of the auth function, then you can just delete these tests).
- withr::with_envvar can help manipulate environment variables within tests (withr is depended on by testthat, so it's a "free" dependency, in case that's a concern).
- httr::content() works.
- (here dependency)
- is.numeric(as.numeric(...)) is always true, so this test doesn't do anything; also for future reference, testthat::expect_true may help in some places.
- httptest would let you test handling of error responses without a network connection.
- Add stats and utils to Imports in DESCRIPTION.
- R CMD check shows this NOTE--maybe because the file is huge? Consider (re)moving the link, obfuscating it so that CRAN doesn't try to download it, or something.
Found the following (possibly) invalid URLs:
URL: ftp://ftp.nass.usda.gov/quickstats
From: README.md
Status: Error
Message: libcurl error code 6:
Could not resolve host: ftp.nass.usda.gov
- Check that artifacts from running R CMD build and check locally don't get checked in.
- Check spelling with the spelling package.
package.@adamhsparks and @nealrichardson: Thanks to both of you for these thorough and detailed reviews.
@potterzot Please take a look at these comments from the reviewers. There is lots of good advice in here. It does seem like some of it will require some pretty fundamental reassessment of the package's user-facing interface. Do you think you can make revisions within two weeks, our typical deadline? i.e., by July 29?
@lmullen, @adamhsparks, @nealrichardson first let me say a big thanks, these are some really good suggestions and you clearly put in a lot of time to give some great feedback. While some of the changes are substantial, I hope that the underlying framework will make them relatively easy to implement. I think I can probably submit a revision by July 29th. Is it possible to ask for an extension if that becomes necessary?
@potterzot Sure. If you can plan on a July 29 deadline, that would be best, but an extension would be fine if it becomes necessary.
@potterzot, I hope you find my review useful and not being negative. This is an extremely valuable package, my interest is in helping you improve it. Feel free to ask for help or guidance along the way. I'm happy to contribute.
Hi @adamhsparks, thank you for your time and willingness to help. I've finished making the straightforward changes you suggested, and am thinking about the larger issues, which seem to boil down to two related issues:
The main function of the package is nassqs()
, which fetches and returns parsed data. But I exposed nassqs_GET()
and nassqs_parse()
because I wanted to make it possible for advanced users to deal with any edge cases that might come up. Perhaps that's not really a concern here, and it would make sense to hide both of those functions. Working with output from nassqs_GET()
requires some knowledge of the httr
package and how requests work, so it would only be useful to someone who wants to see the raw results and knows what to do with them. nassqs_parse()
on the other hand doesn't do much more than what jsonlite::fromJSON
does, so I think it could be hidden without really removing any control from the user. My goal was to make it easy to use, but also to allow a user to dig into the deeper mechanism if necessary. This may be born out of personal frustration from when I've not understood what a hidden function is doing and had to root around in the source code to figure it out.
I propose hiding nassqs_parse()
, expanding the documentation for nassqs_GET()
so that it clearly states that the main function is nassqs()
and that in general that function should not be needed. What do you think about that and about the larger issue of usability?
I think that sounds reasonable. The documentation can always be structured with the meat in the main vignette and then more advanced usage in another or farther down the page of a single vignette under an "Advanced Usage" header or some such.
Question: Since both reviewers recommend removing release_questions()
, where is it recommended that it go? These are helpful (for me) pre-release questions and the function is suggested by the R Packages book (here), but it doesn't suggest where to put it.
The reviewers raise some excellent points to consider about usability and organization. The package feels greatly improved by virtue of their comments and suggestions. Thank you again for your time and your invaluable suggestions. rnassqs
is much cleaner and much improved as a result.
Below I detail some more general thoughts that were raised by the reviewers and my response, and then detail specific response to each reviewer separately.
One reviewer suggestion was to improve the interface to make it more user friendly, i.e. that the package seemed to be built around the needs of the API rather than the needs of the user. To some extent this is a function of the inflexibility of the API itself. A data request to the Quick Stats API returns JSON that, when parsed to a data.frame, results in a data.frame that has 39 columns. Unfortunately there is no way to limit the number of columns returned. The idea of being able to select
a la dplyr
is great, but since all 39 columns must be returned, it seems best to leave it to the user to select
after the call is made.
In response to suggestions about parameters and user experience I've made two changes. The first allows for specifying either a list of parameters in nassqs
as was the previous case, or specifying each parameter as a separate argument to nassqs
, as was suggested by @nealrichardson. In addition, I've added links to parameter documentation in the API, and now nassqs_params()
returns a list of parameters, while nassqs_params("agg_level_desc")
returns a description of the "agg_level_desc" parameter. I've also updated the vignette to show both methods.
The question of pagination came up repeatedly. It's an interesting one in the context of this API. There is not a direct way to paginate results that the API supports. Typically I end up subsetting by year or geography to make the query small enough. rnassqs
could try to subset by year or geography automatically, perhaps with a series of rules that first subset by year and if only one year is requested or the request is too large, then also subsetting by a smaller geography. However, there are potential issues here. For example, a state-level request will not necessarily result in the same data as collecting all counties for that state for two reasons:
Automatically subsetting by year would be doable though. I have added an issue for a future release to do so. In the meantime I have also added information in the error message to suggest how to subset the query. I have also included information on iteration to subset queries in the vignette.
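A sketch of that subset-by-year iteration (fetch() below is a stub standing in for nassqs(), so the example runs without an API key):

```r
# Build one parameter list per year, query each, then row-bind the results.
base_params <- list(commodity_desc = "CORN", state_alpha = "VA",
                    statisticcat_desc = "YIELD")
years <- 2010:2012
param_sets <- lapply(years, function(yr) c(base_params, list(year = yr)))

# Stub in place of nassqs(params); substitute the real call in practice.
fetch <- function(params) data.frame(year = params$year, Value = NA)

corn <- do.call(rbind, lapply(param_sets, fetch))
nrow(corn)  # one row per year with the stub; many rows per year from the real API
```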
- Removed release_questions() (but see question at the top of this response)

@adamhsparks provided some excellent feedback on issues of usability and potential unnecessary complexity. In particular, this comment was helpful.
I feel a bit like the functionality for the end-user is a bit overly complicated. This sort of package to me should just return data in a data.frame or list or some other R object that I can easily work with. In some cases I get a server response, in others I can ask for raw JSON or other file formats. I don't really see a good reason for this in R. I guess the functionality could be offered, but I think it needs to be hidden under a few layers for more advanced users that might want to use it. Rather it should just be a simple query for the data that I want and returning it in a format that I can readily use in R without extra steps. Just fetch the data and format it for me in one single step. I think it's possible with this package, but the documentation is a bit convoluted and not clear to me.

There seem to be two major and related issues.
While it's true that the package contains nassqs
, which will just simply query and return a data.frame object without the user having to specify anything, I have reorganized and rewritten the documentation and vignette to emphasize nassqs
rather than the low-level functions nassqs_GET
and nassqs_parse
. I have added documentation and changed the vignette to focus on the ease of use aspect, rather than on building a query using the core functions.
The second issue concerned the organization of functions. I have reorganized functions into files by collective functionality, in an effort to meet the guidelines suggested in R Packages, which states
While you’re free to arrange functions into files as you wish, the two extremes are bad: don’t put all functions into one file and don’t put each function into its own separate file.
Now functions dealing with the request and parsing of the request are in request.R
. Authorization functions are in auth.R
. Helpers are in helpers.R
. Functions dealing with parameters and parameter values are in params.R
. Functions that make queries easier are in wrappers.R
. I think this strikes a good balance, but am certainly open to suggestions to make this clearer if needed.
- Added instructions for setting the API key via the .Renviron file in the README and vignette
- Updated references to nassqs_fields() in the vignette, which now refer to nassqs_param()
- Removed references to nassqs_field_values() in the vignette
- Kept GET capitalized in nassqs_GET, based on style guidelines from the httr package: Best practices for API packages
- Removed base_url as a parameter in nassqs_GET
- Added examples for nassqs_params()
- Uncommented the example for nassqs_param_values()
@nealrichardson brought up several excellent points about the API and especially about testing and ease of use and focusing on the needs of the user. I feel it is easier to define a list of parameters and submit that as a single argument to nassqs, especially for example when iterating over a collection of queries. However, I recognize both needs, and have made it possible to call nassqs in either of two ways:
# First method, a named list of parameters
params <- list(agg_level_desc = "STATE",
state_alpha = c("VA", "WA"))
nassqs(params)
# Second method, separate arguments
nassqs(agg_level_desc = "STATE", state_alpha = c("VA", "WA"))
# Or without capitalizing
nassqs(agg_level_desc = "state", state_alpha = c("va", "wa"))
I have expanded the vignette to demonstrate both methods and to emphasize the iteration and pagination of data available by iterating over a list of parameter lists.
Many of @nealrichardson's suggestions involve simplifying the interface, and I think the new function calls are much improved in this regard. These suggestions were a real gem. Authorization is simpler, functions have fewer and simpler arguments, and overall ease of use is improved. His suggestion of allowing year >= 2012 instead of (or in addition to) year__GE = 2012 is also a good one. I have not implemented it here because I suspect LIKE and NOT LIKE would be slightly more difficult. I have created an issue to implement this in a future release.
Another concern was that all data is returned in character format, even for columns that are numeric. The reason is that the Quick Stats data lists suppressed or unavailable information in a variety of character-based ways. As a result the `Value` field may contain "(D)", "(Z)", or "(S)" rather than numbers. Converting to numeric makes these values `NA`, which loses the specific information about why the data is missing. It is true that it is easy enough to convert to numeric format, but in my opinion keeping this information about why data is missing is important.
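For example, a minimal sketch of one way a user could keep both pieces of information when converting (the column handling here is an assumption for illustration, not the package's behavior):

```r
# 'd' is assumed to be a data.frame returned by nassqs().
# Keep the suppression codes ("(D)", "(Z)", "(S)") in their own column,
# then coerce the rest to numeric (Quick Stats values may contain commas).
d$suppression <- ifelse(grepl("^\\(", d$Value), d$Value, NA)
d$value_num <- suppressWarnings(as.numeric(gsub(",", "", d$Value)))
```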
The `httptest` package is a huge help and I wish I had known about it when I was asking on Twitter about API testing months ago. I've reorganized and updated the tests to use mock API calls, and also put tests in test files that correspond to the function file names in the `R` directory. For example, `test-requests.R` contains tests for functions in `R/request.R`.
Other changes:

- `pkgdown`
- Moved `nassqs` to top of file since it is the main function
- `nassqs_fields <- nassqs_params`
- In `nassqs_GET`, removed `url_only` and `format` as function parameters
- In `nassqs_parse`, used `inherits` rather than `class() ==`
- In `nassqs_parse`, RE: "L153: when is not a response object?": at times the API is not working, so in that case this returns the error message directly
- In `nassqs_parse`, removed unreachable warning
- In `nassqs_parse`, added else case and changed `read.table` to `read.csv`
- In `nassqs` and `nassqs_parse`, changed 'raw' to 'text' as an option for the `as` parameter
- In `nassqs_check`, RE: "what do I do if my request is too large? Give some recommendation for how to fix it. If it means that you need to make paginated requests, how would I do that? An even friendlier solution would be for you to handle the pagination for the user so they don't need to know/care about this API constraint."
- In `expand_list`, added `@keywords internal` and expanded with a description of what it does and why
- Regarding `nassqs_auth()`: now `nassqs_GET` checks for the environment variable `NASSQS_TOKEN`, and `nassqs_auth` simply sets that token
- Simplified `if() { }` by removing braces
- Removed `rnassqs::` from functions
- Removed `.secret` and references to it
- `.secret` for testing
- `httptest::with_mock_api`
- `is.numeric(as.numeric(...))`
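A rough sketch of the token flow described above (the function body and messages are illustrative, not the package's actual implementation):

```r
# nassqs_auth() just stores the key in an environment variable...
nassqs_auth <- function(key) {
  Sys.setenv(NASSQS_TOKEN = key)
}

# ...and nassqs_GET() reads it back before building the request:
api_key <- Sys.getenv("NASSQS_TOKEN")
if (identical(api_key, "")) {
  stop("No API key found. Set one with nassqs_auth().")
}
```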
This looks much improved as I've glanced over it. Thanks for thoughtfully responding to our reviews and comments, and for explaining the reasons why some of my suggestions weren't followed; I have no objections to any of them. Some of my suggestions have been based on CRAN's, umm, erratic(?) enforcement of rules from time to time, so ignoring some of my suggestions is probably fine, as I'm not sure that I use title case in all my documentation everywhere but think I was pulled up on it once before.
I think that the organisation of the functions is much more clear now and agree with Hadley on not all in one and not one only per .R file.
Regarding the question on `release_questions()`: `devtools::release()` actually asks most of those and more when you use it, so I'd suggest using that rather than including questions for yourself in the package NAMESPACE.
For `expand_list()` I'd use a `@noRd` tag since it does not need to be exposed to the end-user, that I can tell? Documentation is good, I do that for my internal functions so I know what they do, but it shouldn't clutter the user's experience having it documented unless it's used somewhere that I'm missing where an end-user actually calls it?
There's no need for the CITATION file to be incomplete as you've suggested. It should have two entries after acceptance to JOSS. One should just be for the package, the current version number and year it was released that will automatically update with new releases, which you can set up now. The second is the JOSS paper citation that will never change. The example I provided shows this.
I'm curious, how is it different from `usdarnass`, which you have mentioned in the README now? This isn't detailed in the original submission.
@adamhsparks thank you. I've updated the CITATION file and also added `@noRd` to `expand_list()`.
Regarding `usdarnass`, I added the reference to the README after I found out about it, which was after I had submitted for rOpenSci review. I'm fairly sure `rnassqs` was developed first, since my first git commit was in June 2015, while theirs was November 2018, and `rnassqs` was published on CRAN on May 03 while `usdarnass` was published on CRAN on June 21. I think they were actually developed unaware of each other. If you have any thoughts or suggestions about a course of action I'd be all ears. It seems we could basically continue to develop in parallel or we could merge packages. I haven't reached out to the authors other than to make a suggestion on an issue to let them know they could allow for multiple options, as I write below.
The differences are small as far as I can tell:

- `rnassqs` makes all of the query parameters available, while `usdarnass` only allows a subset.
- `rnassqs` allows multiple options like `state_alpha = c("VA", "WA")`, but after I commented on an issue on `usdarnass` to say it was possible to do that, `usdarnass` does that as well.
- `rnassqs` took a list of parameters like `nassqs(params)`, while `usdarnass` takes parameters like `nass_data(state_alpha = "VA", agg_level_desc = "state")`. Now, per the suggestion from @nealrichardson, `rnassqs` works either way.

I'll just briefly comment that this all sounds good in principle and I look forward to re-reviewing in detail, though I won't be able to get to that until early next week.
Thanks, @potterzot. As I said, it was a quick glance. Echoing what @nealrichardson said, I need to fully re-review everything. Those were just the few things I found quickly so I commented.
Thanks for the detailed changes and response, @potterzot.
@adamhsparks and @nealrichardson Thanks for looking over the changes. Could you please complete your re-review and either request additional changes or vote to approve the package by August 8?
Will do!
-- Dr Adam H. Sparks Associate Professor of Field Crops Pathology | Centre for Crop Health Institute for Life Sciences and the Environment (ILSE) | Research and Innovation Division University of Southern Queensland | Toowoomba, Queensland | 4350 | Australia
Nice work. This looks much improved. I found a few stylistic issues again, and I had some suggestions for how to improve the testing, but rather than write them here, I've made a pull request with them for you to review/merge. The other reason I implemented these suggestions myself was that I got a test failure locally because one of the tests required auth but did not have the appropriate `skip_if_no_auth()` check, so I was already in the code to debug that. There was also an `R CMD check` issue I encountered because the .Rbuildignore still wasn't correctly excluding previously built tarballs. All fixed by that PR.

Test coverage could be better, though my PR bumps it up to 91%. Happy to advise on covering the conditions that are currently missed if you want, though I won't withhold approval based on not reaching 100% line coverage.
One last followup point:
In nassqs_parse RE: "L153: when is not a response object?", at times the API is not working, so in that case this returns the error message directly.
`httr::GET()` will either return a response object (potentially with an error status, which you will already have handled before getting here because you pass through the `nassqs_check()` function) or `GET()` will itself error (like if your internet is down). I don't think it's possible for it to return anything different.
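A small sketch of those two failure modes using standard `httr` calls (the URL is illustrative):

```r
library(httr)

resp <- tryCatch(
  GET("https://quickstats.nass.usda.gov/api/api_GET/"),
  error = function(e) e  # transport failure: GET() itself throws
)

if (inherits(resp, "response")) {
  # HTTP-level failures still arrive as response objects;
  # stop_for_status() turns 4xx/5xx statuses into R errors.
  stop_for_status(resp)
}
```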
I have a few last minor points (nitpicks?) that if changed will improve the package. Overall it's greatly improved and I like how it works. Congrats!
1.* @lmullen already noted this much earlier in the process. Please remove the "Date" field from the DESCRIPTION file. CRAN will automatically assign this and it's prone to ending up out of date if you rely upon updating it manually.

2. There is no need to use `paste()` in a `stop()`, e.g.,

`stop(paste0("Your query parameters include 'format' as ", format, " but it should be one of 'json', 'xml', or 'csv'."))`

It should be written as:

`stop("Your query parameters include 'format' as ", format, " but it should be one of 'json', 'xml', or 'csv'.")`

There are several instances of this that I noted.

3.* "inst/examples/example_parameters.R" has an incomplete final line. Add a line return at the end of the file to fix this.
4. The documentation for `nassqs_GET()` is inconsistent in how it references functions in the description. Some of the other R functions discussed in that paragraph use the `function()` convention, while `nassqs_GET()` is referred to only as `nassqs_GET`, minus the `()`. As a user I find it clearer if the `()` is used in documentation to indicate that a function, not a parameter, is being discussed. Note I didn't check all documentation, I just noticed this here.

5.* The documentation example for `nassqs_param_values` has "YIElD" not "YIELD" in the comment section, is this correct?

6.* The `nassqs_parse()` documentation Description field is missing a "'" prior to (Z).

7.* I'm not sure that `here` needs to be listed in the DESCRIPTION Suggests field. It's only used in data-raw as far as I can tell? If so, that folder is not included in the R package so shouldn't need to be specified here.
8.* In the data-raw/get_test_data.R file, it might be good to set the version
to 2
for maximum compatibility in the near term with versions of R from 1.4.0 to current. If it's NULL
it will default to version 3.
The README code could be formatted a bit more nicely too using proper RMarkdown chunks, e.g.,

```{r eval=FALSE}
# Via devtools
library(devtools)
install_github('potterzot/rnassqs')

# Via CRAN
install.packages("rnassqs")
```
The README may not need to be a .Rmd? I can't see that you have any executed R code so you could simplify and just use a .md file.
Consider using `codemetar::write_codemeta()` to create and update a .json metadata file for the package?
Once these are addressed (at your discretion for many of them) I'm happy to recommend accepting. I've added a "*" after the number and before the comment for the items that I think must be fixed. Those without are at your discretion.
@adamhsparks Thank you for the incredible detail in this! Much appreciated. I've made all of the changes you suggest except for this one, which I'm unclear on:
8.* In the data-raw/get_test_data.R file, it might be good to set the version to 2 for maximum compatibility in the near term with versions of R from 1.4.0 to current. If it's NULL it will default to version 3.
What do you mean by setting the version? Do you mean setting the R version in DESCRIPTION?
@nealrichardson I've reviewed and merged your PR, thanks! There were two tests for error handling that were failing:
Because they were within the `with_mock_api()` block they were returning a `GET` object instead of the error. I moved these to the authorization block and they work.

Regarding `.Rbuildignore` excluding tarballs, I had included that at some point long ago, but removed it for a reason that I don't remember. Thank you for adding it.
@nealrichardson PS: if I have your okay, I've also added you as a contributor in DESCRIPTION.
Where did you see the failure? The PR merge commit passed on Travis. They don't require auth because they use the mock responses I added here: https://github.com/potterzot/rnassqs/pull/15/files#diff-7c5a672790a8227968bfd57c3a71faa0 Did you possibly make other changes that altered the querystring in the request? That would change the request URL and thus change the mock file path it was looking for. If so, you can rename those mock files to match and they'll be fine.
Sure, happy to be listed as contributor.
@nealrichardson Hmm, I checked out your commit again and am having no trouble. I must have changed something that started giving me those errors. I returned those files to their original state in a new commit. Also removed the response check based on your note about `nassqs_GET()` always returning a response object.
Hi @potterzot, The RDS version of the file is 1, 2 or (default) 3. This changed with R 3.6 to "3". So many users may not have the ability to read a version 3 RDS file yet if they've not upgraded to R >= 3.6.
See `?saveRDS` for more on `version`.
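Concretely, the argument in question (standard base R; the object and file name are made up for illustration):

```r
x <- list(a = 1, b = "two")

# version = 2 writes the older serialization format, readable back to
# R 1.4.0; version = NULL on R >= 3.6.0 defaults to version 3, which
# pre-3.6 versions of R cannot read.
saveRDS(x, file = "test_data.rds", version = 2)
```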
Ah I see, that script was generally outdated anyway, so thank you for pointing that out! I was also unaware of the breaking change and do a lot of my data storage in RDS so thank you.
@lmullen, I've updated my initial review with the suggestion to accept, ticked the rest of the boxes and updated time spent reviewing.
@adamhsparks Great, thanks so much.
@nealrichardson Is there anything else outstanding from your perspective?
All good, just checked the boxes.
Approved! Thanks @potterzot for submitting and making all the requested changes. And thanks, @adamhsparks and @nealrichardson for especially thorough reviews. Much appreciated.
@potterzot here are some to-dos to complete the onboarding process.
[ ] Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. Once you do, I'll give you admin access.
[ ] Add the rOpenSci footer to the bottom of your README: `[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)`
[ ] Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to the ropensci Appveyor account, so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be `[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname)`.
[ ] We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package; after installing the package it should be as easy as running `codemetar::write_codemeta()` in the root of your package.
Since you are also publishing to JOSS, we need you to do the following.
This submission has been accepted to rOpenSci. The review thread can be found at <URL TO THIS REVIEW>.
You can also release a new version to CRAN.
Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them `"rev"`-type contributors in the `Authors@R` field (with their consent). More info on this here.
Welcome aboard! We'd love to host a blog post about your package - either a short introduction to it with one example or a longer post with some narrative about its development or something you learned, and an example of its use. If you are interested, review the instructions, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.
We've started putting together a gitbook with our best practices and tips; this chapter starts the 3rd section, which is about guidance for after onboarding. Please tell us what could be improved; the corresponding repo is here.
Thank you! This is very exciting. Thank you all for your fantastic help and efforts. @adamhsparks, I am realizing I didn't specifically ask your permission to include you as a reviewer. @lmullen, I've switched the repository over to ropensci and made changes to the readme.
Ok, you should be an admin on the repository again, @potterzot.
@stefaniebutland I would be happy to do a blog post, probably realistically not possible until October. I think I could do a longer article discussing why I started developing the package and how it's been helpful in my research and what I've learned in the process, if that seems like a good fit. Happy to do a shorter one as well.
@potterzot Sounds good! Please submit a draft when you're ready and we can select a publication date at that time.
why I started developing the package and how it's been helpful in my research
If you include a "cool" example (that's not shown elsewhere) this is especially valuable as a way for readers to see how they might use the package.
what I've learned in the process
Always good to share this. Try to choose a couple of key points.
Thanks!
Hi @potterzot. I'm checking in to let you know I have a blog post slot open for Tues Oct 29 if you still wanted to do a long form post. A shorter tech note is quite appropriate and could be published any time, after my review.
@stefaniebutland I couldn't make the Oct 29 deadline but have a working draft now, do you have a good future date that would work? The draft is basically done but I can change the template date and file names to match the anticipated date.
Also, I'm not sure where to put images. The template links I can change, but I don't see the corresponding `img/blog-images` directory in the `roweb2` repository.
@stefaniebutland nevermind on the second part, I figured out the images. It helps if I read the instructions in full!
For now, please date 2019-11-26. That might change based on submission status of other posts that already have dates assigned.
I admit there are a LOT of instructions ;-)
Submitting Author: Nicholas A. Potter (@potterzot)
Repository: https://github.com/potterzot/rnassqs Version submitted: 0.4.0 Editor: @lmullen
Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
Version accepted: TBD
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
Data retrieval, because `rnassqs` allows access to the NASS 'Quick Stats' API to fetch data.
Target audience is those who want to automate or reproducibly fetch data from 'Quick Stats', including agronomists, economists, and others working with agricultural data. Scientific applications include analysis of agricultural data by administrative region (e.g. county, state, watershed), economic analysis of policies that affect agriculture, and sociological/demographic analysis of agricultural producers over time.
None that I have been able to find.
297, responded to by @noamross
Technical checks
Confirm each of the following by checking the box. This package:
Publication options
JOSS Options
JOSS Options

- [X] The package has an **obvious research application** according to [JOSS's definition](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements).
- [X] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`.
- [X] The package is deposited in a long-term repository with the DOI: 10.5281/zenodo.2662520
- (*Do not submit your package separately to JOSS*)

MEE Options

- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct