Closed potterzot closed 5 years ago
@potterzot I will be the editor for this peer review.
Here are my editorial checks.
In running R CMD check I get the following:
❯ checking top-level files ... NOTE
Non-standard files/directories found at top level:
‘paper.bib’ ‘paper.md’
[ ] Please either add those files to .Rbuildignore or put them in inst/.
[ ] In paper.md
there are a few typos, including these. You might look over the paper one more time as you correct these.
"be one of the most difficult stages of research make reproducible" should be "to make"
"explicitly constructing html GET requests" should be "HTTP GET requests." (HTTP is the protocol that often, but not usually in the case of an API like this one, returns HTML.)
[ ] Could you clarify, please, what is tested when the user has not provided an API key? For tests that you are skipping if no API key is available, is it possible to stub or mock the API? The testthat package includes some mocking functions, as well as testthat::expect_equal_to_reference()
which can be used for that purpose.
[ ] After running spelling::spell_check_package()
I note a number of misspellings. Please run that on the package yourself and correct the actual misspellings.
[ ] After running goodpractice::gp()
there is this change needed to the DESCRIPTION:
✖ omit "Date" in DESCRIPTION. It is not required and it gets invalid quite
often. A build date will be added to the package when you perform `R CMD build` on
it.
I am in the process of looking for peer reviewers.
Reviewers: Due date:
@lmullen thank you for your comments. I've made a commit to address your comments.
Regarding this note:
Could you clarify, please, what is tested when the user has not provided an API key? For tests that you are skipping if no API key is available, is it possible to stub or mock the API? The testthat package includes some mocking functions, as well as testthat::expect_equal_to_reference() which can be used for that purpose.
Tests in tests/testthat/test-oncran.R beginning on line 55 include mock API calls. The tests make the request, specifying that the function return the GET request URL rather than actually make the request, and that request is compared to the correct URL. There are three API paths to test:
test_that("nass_GET forms a correct URL", ...)
test_that("nassqs_param_values forms a correct URL", ...)
test_that("nassqs_record_county forms a correct URL", ...)
There are additional mock API tests that follow those, but those are for convenience functions for making specific requests, e.g. nassqs_area and nassqs_yield, which wrap nassqs_GET.
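In rough outline, those URL-comparison tests work like this (build_qs_url() below is an illustrative stand-in, not the package's internal function):

```r
# Illustrative sketch only: assemble the Quick Stats query URL without
# touching the network, then compare it against an expected string, which
# is what the mocked tests in test-oncran.R do.
build_qs_url <- function(params, key = "API_KEY",
                         base = "https://quickstats.nass.usda.gov/api/api_GET") {
  query <- paste(
    c(paste0("key=", key),
      paste0(names(params), "=", vapply(params, as.character, character(1)))),
    collapse = "&"
  )
  paste0(base, "?", query)
}

params <- list(commodity_desc = "CORN", year = 2012, state_alpha = "WA")
url <- build_qs_url(params)
# stopifnot() stands in for a testthat expectation here:
stopifnot(identical(
  url,
  "https://quickstats.nass.usda.gov/api/api_GET?key=API_KEY&commodity_desc=CORN&year=2012&state_alpha=WA"
))
```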
Tests in tests/testthat/test-local.R make actual API calls using an API key, and are not possible on CRAN.
Is there a better way of organizing tests that makes it clear where the API mock tests are done and where the actual API call tests are done?
@potterzot That sounds fine to me. I just wanted to make sure the reviewers and I understood.
@potterzot Apologies for the delay in getting this review going. One person has agreed to review but a string of others have been unavailable at the start of the summer. Still looking for that second reviewer and then the review will begin.
Thanks to our reviewers for agreeing to take on this package.
Reviewer: @adamhsparks Reviewer: @nealrichardson Due date: 2019-07-11
You can find the guide for reviewers here. Please let me know if you have any questions.
I know I'm behind. Sorry, I've had a rather busy time lately. I'm starting the review today and will see how I go.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation: URL, BugReports and Maintainer (which may be auto-generated via Authors@R).

For packages co-submitting to JOSS:

- [ ] The package has an obvious research application according to JOSS's definition

The package contains a paper.md matching JOSS's requirements with:
- [ ] A short summary describing the high-level functionality of the software
- [ ] Authors: A list of authors with their affiliations
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
Estimated hours spent reviewing: 7
This package offers great functionality and defines very nicely why it's necessary. I'm happy to see this sort of package being written.
Following are my comments.
Spell check the package, in particular the DESCRIPTION file's Line 12, Agricultre
It's a bit of opsec, but I think the README and vignette should be more explicit about keeping API keys secret and not embedding them in scripts. Along with this, highlighting why it's useful to use the .Renviron file for this purpose would be nice (you don't have to worry about adding it to .gitignore, for one). For new users, just how to modify the .Renviron is likely to be confusing at best and frustrating at worst. Perhaps it would be nice to include an example that shows how to use usethis::edit_r_environ() for this purpose to streamline the illustrations and examples.
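For example, something along these lines (a sketch; NASSQS_TOKEN is the environment variable the package already reads):

```r
# One-time setup: open ~/.Renviron in an editor (requires the usethis package):
# usethis::edit_r_environ()
# ...then add a line such as:
# NASSQS_TOKEN=your-secret-key
# After restarting R, the key is available in every session without ever
# appearing in a script or a git history:
key <- Sys.getenv("NASSQS_TOKEN")
nchar(key) > 0  # TRUE once the key is set
```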
The DESCRIPTION file's description text might need " ' " around "API". I had to do this for the nasapower package to have it accepted on CRAN.
I always encourage including a proper CITATION file in /inst, see https://github.com/ropensci/nasapower/blob/master/inst/CITATION for an example
The paper.md file has Git conflicts in it that need to be resolved
There is no statement about contributions or a code-of-conduct present
I feel a bit like the functionality for the end-user is a bit overly complicated. This sort of package to me should just return data in a data.frame or list or some other R object that I can easily work with. In some cases I get a server response, in others I can ask for raw JSON or other file formats. I don't really see a good reason for this in R. I guess the functionality could be offered, but I think it needs to be hidden under a few layers for more advanced users that might want to use it. Rather it should just be a simple query for the data that I want and returning it in a format that I can readily use in R without extra steps. Just fetch the data and format it for me in one single step. I think it's possible with this package, but the documentation is a bit convoluted and not clear to me.
When I try using nassqs_token() I receive an error message.
> library(rnassqs)
> nassqs_token()
Error in nassqs_token() : could not find function "nassqs_token"
This example works as expected,
nassqs_fields()
[1] "agg_level_desc" "asd_code"
[3] "asd_desc" "begin_code"
[5] "class_desc" "commodity_desc"
[7] "congr_district_code" "country_code"
[9] "country_name" "county_ansi"
[11] "county_code" "county_name"
[13] "CV" "domaincat_desc"
[15] "domain_desc" "end_code"
[17] "freq_desc" "group_desc"
[19] "load_time" "location_desc"
[21] "prodn_practice_desc" "reference_period_desc"
[23] "region_desc" "sector_desc"
[25] "short_desc" "state_alpha"
[27] "state_ansi" "state_name"
[29] "state_fips_code" "statisticcat_desc"
[31] "source_desc" "unit_desc"
[33] "util_practice_desc" "Value"
[35] "watershed_code" "watershed_desc"
[37] "week_ending" "year"
[39] "zip_5"
however,
?nassqs_fields()
returns a help file that says,
Deprecated: Return list of NASS QS parameters.
Description
Deprecated. Use nassqs_params() instead.
Suggest updating vignette to match most recent functionality.
Another error occurs with this example from the vignette.
> rnassqs::nassqs_field_values(field = 'unit_desc')
Error: 'nassqs_field_values' is not an exported object from 'namespace:rnassqs'
The "All together" section script is not functional.
fields <- nassq_fields()
Error in nassq_fields() : could not find function "nassq_fields"
Function names are not consistent with GET being all caps but parse and check not. In the vignette text nassqs_parse()
is referred to as PARSE in all caps. Consistency in function naming will help the end user.
It is entirely up to the package authors how to organise the functions, but I find the current structure confusing at best. Typically when I see a file with just the package name, it is just there to provide the help file for the package with author information, references and other basic info. The nassqs.R
file has several functions in it for the package along with release_questions()
, which I find to be extremely odd. This function is not something that should be in the package and exposed to end-users.
My suggestion is remove release_questions()
entirely and split out the functions into their own files with the function name being the file-name. This helps make it easier to keep the functions organised and updated. Splitting the functions out is entirely up to the authors if they wish to implement this structure, however I feel that removing release_questions()
is necessary.
Why is base_URL given as a parameter that the user can modify? The documentation even says it "probably" should not be changed. I would just hard-code it and not give the user any possibility of changing it. I can't see any good reason for doing this. If the URL changes, then the package should be updated to reflect the changes.

When packages are mentioned in the documentation, wrap them in \pkg{httr} for proper formatting to indicate that you are referring to a package. Likewise, R can be written as \R for special formatting.
The text \code{jsonlite::fromJSON} should be written as \code{\link[jsonlite]{fromJSON}} so that it links to the help file for this function.
Titles for function help files should be written in title case.
All exported functions should have examples in the documentation, nassqs_check()
does not have any examples.
The example for nassqs_param_values() is commented out, making it difficult to follow. Examples should not be commented out and should be clear and easy to follow.
# See all values available for the statisticcat_desc field. Values may not
# be available in the context of other parameters you set, for example
# a given state may not have any 'YIElD' in blueberries if they don't grow
# blueberries in that state.
# Requires an API key:
#nassqs_param_values("statisticcat_desc", key = "my api key")
Should appear as
# See all values available for the statisticcat_desc field. Values may not
# be available in the context of other parameters you set, for example
# a given state may not have any 'YIElD' in blueberries if they don't grow
# blueberries in that state.
# Requires an API key:
nassqs_param_values("statisticcat_desc", key = "my api key")
I find nassqs_GET() confusing. I don't understand the first example.
> params = list(commodity_name="CORN",
+ year=2012,
+ agg_level_desc = "STATE",
+ state_alpha = "WA",
+ statisticcat_desc = "YIELD")
> nassqs_GET(params)
Response [https://quickstats.nass.usda.gov/api/api_GET?key=XXXXXXXXXXXXXXXXXXXXXXX&commodity_name=CORN&year=2012&agg_level_desc=STATE&state_alpha=WA&statisticcat_desc=YIELD&format=JSON]
Date: 2019-07-13 04:48
Status: 200
Content-Type: application/json
Size: 148 kB
What do I do with this response value? How is this response yields for corn in 2012 in Washington?
Does the end-user even need to interface with this function or should it be hidden and used by the other functions in the package that return data in data.frames
or other R objects?
nassqs_params()
lacks examples
The example for nassqs_parse()
could be easier to follow.
# Set parameters and make the request
params = list(commodity_name="CORN",
year=2012,
agg_level_desc = "STATE",
state_alpha = "WA",
statisticcat_desc = "YIELD")
req <- nassqs_GET(params)
nassqs_parse(req, as = "data.frame")
would be more clear as
# Set parameters and make the request
params <- list(
commodity_name = "CORN",
year = 2012,
agg_level_desc = "STATE",
state_alpha = "WA",
statisticcat_desc = "YIELD"
)
req <- nassqs_GET(params)
corn <- nassqs_parse(req, as = "data.frame")
head(corn)
The documentation for the as argument of nassqs_parse() is unclear. It states that it indicates the data type returned, but doesn't list the data types aside from in the usage section, which indicates that a list is possible, but this doesn't appear to be documented. The @return section says a data frame or raw text of the content from the request.
Any functions that query an external server for data and may fail or take an extended period of time to run should have the examples wrapped in a \donttest{} for CRAN but still allowing for local testing. I see most examples are wrapped but not all that run an external query.
In most cases the code is clearly written; in some cases the style is inconsistent, with a lack of spaces around =, e.g. Line 73 of nassqs.R. This also applies in the examples in the documentation. Also, single and double quotes are used interchangeably. I find it easier to follow if only one style is used in all cases, as there are cases where only single quotes may be used, and so on.
The operator used to assign in the examples also switches between =
and <-
. Only one should be used consistently.
Wrap code at 80 characters for ease of reading and for those of us that don't have editor windows that expand beyond 80 chars wide.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation: URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Estimated hours spent reviewing: 5
Looks like I get to be the infamous "Reviewer 2" this time :)
Since Adam dutifully got his review in on time and I'm delayed a couple of days, I'll say at the start that I agree generally with all of his points (with one minor exception, noted below), and in the interest of brevity, I've tried to avoid reiterating comments he made.
I greatly appreciate packages like this and the effort that goes into making them. This is the kind of package I wish existed when I was doing my dissertation--would have made at least the data retrieval and management part of research a lot cleaner. So, thank you for your contribution.
Style and code-hygiene comments aside, my main suggestion to you is to think about how you can orient the package towards the R user and their needs. The purpose of this package should be to encapsulate your hard-earned knowledge of how this API works and enable data scientists to access that data as naturally as possible. However, the package currently seems to expose an R interface centered around the needs of the API and not the R user. For example, the functions that seem to make up the public interface for the package are mostly stuck in "helpers.R", as if they're ancillary when really they should be front and center.
Here's a more concrete example: reading the vignette discussion about the comparison operators, it looks like the way to get data on corn production in Virginia since 2012 is
nassqs(list(commodity_desc = "CORN", year__GE = 2012, state_alpha = "VA"))
But when I think about the R code that I want to type to get that data, it looks more like
nassqs(commodity == "corn", state == "va", year >= 2012)
That is, not case-sensitive, handles the different ways commodities and states (for example) could be referenced (integer code, abbreviation, long name), and more naturally handles the comparison operators (can translate year >= 2012
to year__GE=2012
). You could go even farther and have a dplyr
syntax where you can filter()
rows and select()
columns, so that the R user is essentially describing the data.frame they want to get, and your package figures out how to turn that into API requests, handle pagination, etc., and return the shape of data that an R user expects.
I'm not saying you need to do exactly this/all of this now (though some things, like case insensitivity, would be easy enough), but more to suggest ideas for the future and to show what I mean in terms of thinking of the R interface and not just what the API demands. When I wrap APIs (and in general write packages), I like to start by thinking about the R code I want (users) to type, and that usually means doing more work in the package to mask the awkwardness of the underlying HTTP API.
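To illustrate the kind of translation I mean, here is a rough sketch (translate_filter() is hypothetical, not something the package provides) of turning an unevaluated R comparison into the API's double-underscore operator suffix:

```r
# Hypothetical sketch: map an R comparison like year >= 2012 onto the
# Quick Stats "__GE"-style parameter names.
translate_filter <- function(expr) {
  stopifnot(is.call(expr))
  ops <- c(">=" = "__GE", "<=" = "__LE", ">" = "__GT", "<" = "__LT", "==" = "")
  op <- as.character(expr[[1]])
  stopifnot(op %in% names(ops))
  field <- as.character(expr[[2]])
  stats::setNames(list(expr[[3]]), paste0(field, ops[[op]]))
}

translate_filter(quote(year >= 2012))        # -> list(year__GE = 2012)
translate_filter(quote(state_alpha == "VA")) # -> list(state_alpha = "VA")
```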
Another way to get more R-user-centric would be to improve the documentation, which I found to be thin, often tautological (@param api_path the API path), and not helpful for an inexperienced user. Documentation is a bit of work, but it's worth doing, especially for how it can help focus you on what the intended user needs to know in order to get value from the package--and how you can make what details they need to know as minimal as possible.
A good way to see what the docs will look like to your users is to use pkgdown
to build the package website. You don't need to publish this (though it's nice if you do), but I've found that even doing it locally makes it really visible to me where my documentation is not the most helpful or beautiful.
I got a token and did the example on the readme (corn yields in Virginia and Pennsylvania since 2000).
> library(rnassqs)
> Sys.setenv(NASSQS_TOKEN="REDACTED")
> params <- list("commodity_desc"="CORN",
+ "year__GE"=2000,
+ "state_alpha"=c("VA", "PA"),
+ "statisticcat_desc"="YIELD")
> df <- nassqs(params)
That all worked as expected. That said:

- The request seemed slow given that the returned data was 4752 x 39, so not huge. I'm not sure what was slow, but it might be worth considering how to show progress. I believe httr has some tools for showing progress if it's slow because of the network, for example. But maybe network isn't the constraint. Granted, I made one request, so I'm not sure how widespread the issue is, but it was distracting.
- year is numeric but everything else in the data.frame is character, even when there are very obviously numeric columns in there. If I'm going to do anything useful with this crop yield data I just downloaded, the first thing I'm going to have to do is df$Value <- as.numeric(df$Value), so it seems worthwhile to return properly formatted data.
- One column name has an irregular format ("CV (%)"), while the other columns follow a standard, well-behaved format.
- Prefer list(param=param) over list("param"=param).
- It's fine to keep GET in nassqs_GET() capitalized: HTTP request methods are supposed to be capitalized (they're case sensitive, per the RFC). FWIW in httr, the request methods are capitalized while other acronyms (JSON etc.) are not.
- Use #' @importFrom httr function_name rather than httr::function_name in package code.
- Use \code{\link{}}s in the documentation.
- if (something) { result } inline is bad form: either drop {} or make it multiline. See https://style.tidyverse.org/syntax.html#indenting
- To alias, nassqs_fields <- nassqs_params is enough; you don't need a function() that calls the other one.
- release_questions() doesn't belong in the package.
- In nassqs, "raw" is not raw in the R sense; it's "character" (or as="text" if you want to be like httr).
- Use inherits() rather than class ==. Also you can return() early rather than if/else.
- The auth function could be just function (key) Sys.setenv(NASSQS_TOKEN = key). What's the value of the interactive mode? It's reasonable to expect the user to set auth at the beginning of the session.
- Use @inheritParams to document the function args in only one place and pull them in in the other functions that have the same arguments.
- Consider dropping key from the function signature and just using Sys.getenv(); L75-78 is unnecessary complexity.
- Consider dropping base_url from the signature (since it "should probably never be changed"). If you want it configurable for some reason, allow it to be set by an option and in the function get it like base_url <- getOption("rnasssqs.base_url", "https://quickstats.nass.usda.gov/api/")
- Consider dropping url_only from the signature. It's confusing to have a function that can return very different things based on an argument. If that's functionality you need, I'd factor out the URL query assembly, ending in build_url(), into its own function, and have rnassqs_GET() call that and then httr::GET() the result. If you only have this argument for unit testing, I have an alternative proposal (see below).
- rnassqs_parse errors if it gets XML.
- expand_list() is probably the only function there that fits with what I'd expect helpers to be (i.e. internal functions). I'd give it @keywords internal so it doesn't list in the help index, and possibly just document it with comments and not formal documentation. (And if you're going to document it, explain what it actually does, which is not obvious.)
- The .secret file is used in tests and supporting code, and it's discussed in the vignette, but it's not actually supported in the package (i.e. nassqs_auth() doesn't look for a .secret file). And IIUC the vignette recommendation is incomplete: you'd still have to set or pass it in even if you read it in with readLines(). IMO I'd just drop that discussion and rely on the environment variable, or say that if you wanted to store it in some file (called whatever), you could read it and set it in your session like Sys.setenv(NASSQS_TOKEN = readLines(file)).
- No need to rnassqs:: your own functions.
- Suggest assertthat since your vignette uses it.
- You may find the httptest package useful for these tests. Rather than using the url_only argument (which I suspect you only have for tests), you could use expect_GET(nassqs(...), url)
- httptest would also let you supply mock responses so you could test the full request/response/parse flow naturally. What you have with the .rds testdata files works, of course, but httptest could make those tests more readable and maintainable.
- Use testthat::skip_if(), and you could of course define your own skip_if_interactive() (but if you take my suggestion to drop the interactive behavior of the auth function, then you can just delete these tests).
- withr::with_envvar can help manipulate environment variables within tests (withr is depended on by testthat, so it's a "free" dependency, in case that's a concern).
- httr::content() works.
- (here dependency)
- is.numeric(as.numeric(...)) is always true, so this test doesn't do anything; also for future reference, testthat::expect_true may help in some places.
- httptest would let you test handling of error responses without a network connection.
- Add stats and utils to Imports in DESCRIPTION.
- R CMD check shows this NOTE--maybe because the file is huge? Consider (re)moving the link, obfuscating it so that CRAN doesn't try to download it, or something.
Found the following (possibly) invalid URLs:
URL: ftp://ftp.nass.usda.gov/quickstats
From: README.md
Status: Error
Message: libcurl error code 6:
Could not resolve host: ftp.nass.usda.gov
- Check that artifacts from running R CMD build and check locally don't get checked in.
- Check spelling with the spelling package.
package.@adamhsparks and @nealrichardson: Thanks to both of you for these thorough and detailed reviews.
@potterzot Please take a look at these comments from the reviewers. There is lots of good advice in here. It does seem like some of it will require some pretty fundamental reassessment of the package's user-facing interface. Do you think you can make revisions within two weeks, our typical deadline? i.e., by July 29?
@lmullen, @adamhsparks, @nealrichardson first let me say a big thanks, these are some really good suggestions and you clearly put in a lot of time to give some great feedback. While some of the changes are substantial, I hope that the underlying framework will make them relatively easy to implement. I think I can probably submit a revision by July 29th. Is it possible to ask for an extension if that becomes necessary?
@potterzot Sure. If you can plan on a July 29 deadline, that would be best, but an extension would be fine if it becomes necessary.
@potterzot, I hope you find my review useful and not being negative. This is an extremely valuable package, my interest is in helping you improve it. Feel free to ask for help or guidance along the way. I'm happy to contribute.
Hi @adamhsparks, thank you for your time and willingness to help. I've finished making the straightforward changes you suggested, and am thinking about the larger issues, which seem to boil down to two related issues:
The main function of the package is nassqs()
, which fetches and returns parsed data. But I exposed nassqs_GET()
and nassqs_parse()
because I wanted to make it possible for advanced users to deal with any edge cases that might come up. Perhaps that's not really a concern here, and it would make sense to hide both of those functions. Working with output from nassqs_GET()
requires some knowledge of the httr
package and how requests work, so it would only be useful to someone who wants to see the raw results and knows what to do with them. nassqs_parse()
on the other hand doesn't do much more than what jsonlite::fromJSON
does, so I think it could be hidden without really removing any control from the user. My goal was to make it easy to use, but also to allow a user to dig into the deeper mechanism if necessary. This may be born out of personal frustration from when I've not understood what a hidden function is doing and had to root around in the source code to figure it out.
I propose hiding nassqs_parse()
, expanding the documentation for nassqs_GET()
so that it clearly states that the main function is nassqs()
and that in general that function should not be needed. What do you think about that and about the larger issue of usability?
I think that sounds reasonable. The documentation can always be structured with the meat in the main vignette and then more advanced usage in another or farther down the page of a single vignette under an "Advanced Usage" header or some such.
Question: Since both reviewers recommend removing release_questions()
, where is it recommended that it go? These are helpful (for me) pre-release questions and the function is suggested by the R Packages book (here), but it doesn't suggest where to put it.
The reviewers raise some excellent points to consider about usability and organization. The package feels greatly improved by virtue of their comments and suggestions. Thank you again for your time and your invaluable suggestions. rnassqs
is much cleaner and much improved as a result.
Below I detail some more general thoughts that were raised by the reviewers and my response, and then detail specific response to each reviewer separately.
One reviewer suggestion was to improve the interface to make it more user friendly, i.e. that the package seemed to be built around the needs of the API rather than the needs of the user. To some extent this is a function of the inflexibility of the API itself. A data request to the Quick Stats API returns JSON that, when parsed to a data.frame, results in a data.frame that has 39 columns. Unfortunately there is no way to limit the number of columns returned. The idea of being able to select
a la dplyr
is great, but since all 39 columns must be returned, it seems best to leave it to the user to select
after the call is made.
In response to suggestions about parameters and user experience I've made two changes. The first allows for specifying either a list of parameters in nassqs
as was the previous case, or specifying each parameter as a separate argument to nassqs
, as was suggested by @nealrichardson. In addition, I've added links to parameter documentation in the API, and now nassqs_params()
returns a list of parameters, while nassqs_params("agg_level_desc")
returns a description of the "agg_level_desc" parameter. I've also updated the vignette to show both methods.
The question of pagination came up repeatedly. It's an interesting one in the context of this API. There is not a direct way to paginate results that the API supports. Typically I end up subsetting by year or geography to make the query small enough. rnassqs
could try to subset by year or geography automatically, perhaps with a series of rules that first subset by year and if only one year is requested or the request is too large, then also subsetting by a smaller geography. However, there are potential issues here. For example, a state-level request will not necessarily result in the same data as collecting all counties for that state for two reasons:
Automatically subsetting by year would be doable though. I have added an issue for a future release to do so. In the meantime I have also added information in the error message to suggest how to subset the query. I have also included information on iteration to subset queries in the vignette.
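A sketch of that subset-by-year iteration (fetch() below is a stub standing in for nassqs(), so the example runs without an API key):

```r
# Build one parameter list per year, query each, then row-bind the results.
base_params <- list(commodity_desc = "CORN", state_alpha = "VA",
                    statisticcat_desc = "YIELD")
years <- 2010:2012
param_sets <- lapply(years, function(yr) c(base_params, list(year = yr)))

# Stub in place of nassqs(params); substitute the real call in practice.
fetch <- function(params) data.frame(year = params$year, Value = NA)

corn <- do.call(rbind, lapply(param_sets, fetch))
nrow(corn)  # one row per year with the stub; many rows per year from the real API
```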
- Removed release_questions() (but see question at the top of this response)

@adamhsparks provided some excellent feedback on issues of usability and potential unnecessary complexity. In particular, this comment was helpful.
I feel a bit like the functionality for the end-user is a bit overly complicated. This sort of package to me should just return data in a data.frame or list or some other R object that I can easily work with. In some cases I get a server response, in others I can ask for raw JSON or other file formats. I don't really see a good reason for this in R. I guess the functionality could be offered, but I think it needs to be hidden under a few layers for more advanced users that might want to use it. Rather it should just be a simple query for the data that I want and returning it in a format that I can readily use in R without extra steps. Just fetch the data and format it for me in one single step. I think it's possible with this package, but the documentation is a bit convoluted and not clear to me.

There seem to be two major and related issues.
While it's true that the package contains nassqs
, which will just simply query and return a data.frame object without the user having to specify anything, I have reorganized and rewritten the documentation and vignette to emphasize nassqs
rather than the low-level functions nassqs_GET
and nassqs_parse
. I have added documentation and changed the vignette to focus on the ease of use aspect, rather than on building a query using the core functions.
The second issue concerned the organization of functions. I have reorganized functions into files by collective functionality, in an effort to meet the guidelines suggested in R Packages, which states
While you’re free to arrange functions into files as you wish, the two extremes are bad: don’t put all functions into one file and don’t put each function into its own separate file.
Now functions dealing with the request and parsing of the request are in request.R
. Authorization functions are in auth.R
. Helpers are in helpers.R
. Functions dealing with parameters and parameter values are in params.R
. Functions that make queries easier are in wrappers.R
. I think this strikes a good balance, but am certainly open to suggestions to make this clearer if needed.
- Added instructions for setting the API key via the .Renviron file in the README and vignette
- Updated references to nassqs_fields() in the vignette, which now refer to nassqs_param()
- Removed references to nassqs_field_values() in the vignette
- Kept GET capitalized in nassqs_GET, based on style guidelines from the httr package: Best practices for API packages
- Removed base_url as a parameter in nassqs_GET
- Added examples for nassqs_params()
- Uncommented the example for nassqs_param_values()
@nealrichardson brought up several excellent points about the API and especially about testing and ease of use and focusing on the needs of the user. I feel it is easier to define a list of parameters and submit that as a single argument to nassqs, especially for example when iterating over a collection of queries. However, I recognize both needs, and have made it possible to call nassqs in either of two ways:
# First method, a named list of parameters
params <- list(agg_level_desc = "STATE",
state_alpha = c("VA", "WA"))
nassqs(params)
# Second method, separate arguments
nassqs(agg_level_desc = "STATE", state_alpha = c("VA", "WA"))
# Or without capitalizing
nassqs(agg_level_desc = "state", state_alpha = c("va", "wa"))
I have expanded the vignette to demonstrate both methods and to emphasize the iteration and pagination of data available by iterating over a list of parameter lists.
Many of @nealrichardson's suggestions involve simplifying the interface, and I think the new function calls are much improved in this regard. These suggestions were a real gem. Authorization is simpler, functions have fewer and simpler arguments, and overall ease of use is improved. His suggestion of allowing year >= 2012 instead of (or in addition to) year__GE = 2012 is also a good one. I have not implemented it here because I suspect LIKE and NOT LIKE would be slightly more difficult. I have created an issue to implement this in a future release.
Another concern was that all data is returned in character format, even for columns that are numeric. The reason is that the Quick Stats data lists suppressed or unavailable information in a variety of character-based ways. As a result the `Value` field may contain "(D)", "(Z)", or "(S)" rather than numbers. Converting to numeric makes these values `NA`, which loses the specific information about why the data is missing. It is true that it is easy enough to convert to numeric format, but in my opinion keeping this information about why data is missing is important.
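For example, a minimal sketch of one way a user could keep both pieces of information when converting (the column handling here is an assumption for illustration, not the package's behavior):

```r
# 'd' is assumed to be a data.frame returned by nassqs().
# Keep the suppression codes ("(D)", "(Z)", "(S)") in their own column,
# then coerce the rest to numeric (Quick Stats values may contain commas).
d$suppression <- ifelse(grepl("^\\(", d$Value), d$Value, NA)
d$value_num <- suppressWarnings(as.numeric(gsub(",", "", d$Value)))
```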
The `httptest` package is a huge help and I wish I had known about it when I was asking on Twitter about API testing months ago. I've reorganized and updated the tests to use mock API calls, and also put tests in test files that correspond to the function file names in the `R` directory. For example, `test-requests.R` contains tests for functions in `R/request.R`.
Other changes:

- `pkgdown`
- Moved `nassqs` to top of file since it is the main function
- `nassqs_fields <- nassqs_params`
- In `nassqs_GET`, removed `url_only` and `format` as function parameters
- In `nassqs_parse`, used `inherits` rather than `class() ==`
- In `nassqs_parse`, RE: "L153: when is not a response object?": at times the API is not working, so in that case this returns the error message directly
- In `nassqs_parse`, removed unreachable warning
- In `nassqs_parse`, added else case and changed `read.table` to `read.csv`
- In `nassqs` and `nassqs_parse`, changed 'raw' to 'text' as an option for the `as` parameter
- In `nassqs_check`, RE: "what do I do if my request is too large? Give some recommendation for how to fix it. If it means that you need to make paginated requests, how would I do that? An even friendlier solution would be for you to handle the pagination for the user so they don't need to know/care about this API constraint."
- In `expand_list`, added `@keywords internal` and expanded with a description of what it does and why
- Regarding `nassqs_auth()`: now `nassqs_GET` checks for the environment variable `NASSQS_TOKEN`, and `nassqs_auth` simply sets that token
- Simplified `if() { }` by removing braces
- Removed `rnassqs::` from functions
- Removed `.secret` and references to it
- `.secret` for testing
- `httptest::with_mock_api`
- `is.numeric(as.numeric(...))`
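A rough sketch of the token flow described above (the function body and messages are illustrative, not the package's actual implementation):

```r
# nassqs_auth() just stores the key in an environment variable...
nassqs_auth <- function(key) {
  Sys.setenv(NASSQS_TOKEN = key)
}

# ...and nassqs_GET() reads it back before building the request:
api_key <- Sys.getenv("NASSQS_TOKEN")
if (identical(api_key, "")) {
  stop("No API key found. Set one with nassqs_auth().")
}
```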
This looks much improved as I've glanced over it. Thanks for thoughtfully responding to our reviews and comments, and for explaining the reasons why some of my suggestions weren't followed; I have no objections to any of them. Some of my suggestions have been based on CRAN's, umm, erratic(?) enforcement of rules from time to time, so ignoring some of my suggestions is probably fine, as I'm not sure that I use title case in all my documentation everywhere but think I was pulled up on it once before.
I think that the organisation of the functions is much more clear now and agree with Hadley on not all in one and not one only per .R file.
Regarding the question on `release_questions()`: `devtools::release()` actually asks most of those and more when you use it, so I'd suggest using that rather than including questions for yourself in the package NAMESPACE.
For `expand_list()` I'd use a `@noRd` tag since it does not need to be exposed to the end-user, that I can tell? Documentation is good, I do that for my internal functions so I know what they do, but it shouldn't clutter the user's experience having it documented unless it's used somewhere that I'm missing where an end-user actually calls it?
There's no need for the CITATION file to be incomplete as you've suggested. It should have two entries after acceptance to JOSS. One should just be for the package, the current version number and year it was released that will automatically update with new releases, which you can set up now. The second is the JOSS paper citation that will never change. The example I provided shows this.
I'm curious, how is it different from `usdarnass`, which you have mentioned in the README now? This isn't detailed in the original submission.
@adamhsparks thank you. I've updated the CITATION file and also added `@noRd` to `expand_list()`.
Regarding `usdarnass`, I added the reference to the README after I found out about it, which was after I had submitted for rOpenSci review. I'm fairly sure `rnassqs` was developed first, since my first git commit was in June 2015, while theirs was November 2018, and `rnassqs` was published on CRAN on May 03 while `usdarnass` was published on CRAN on June 21. I think they were actually developed unaware of each other. If you have any thoughts or suggestions about a course of action I'd be all ears. It seems we could basically continue to develop in parallel or we could merge packages. I haven't reached out to the authors other than to make a suggestion on an issue to let them know they could allow for multiple options, as I write below.
The differences are small as far as I can tell:

- `rnassqs` makes all of the query parameters available, while `usdarnass` only allows a subset.
- `rnassqs` allows multiple options like `state_alpha = c("VA", "WA")`, but after I commented on an issue on `usdarnass` to say it was possible to do that, `usdarnass` does that as well.
- `rnassqs` took a list of parameters like `nassqs(params)`, while `usdarnass` takes parameters like `nass_data(state_alpha = "VA", agg_level_desc = "state")`. Now, per the suggestion from @nealrichardson, `rnassqs` works either way.

I'll just briefly comment that this all sounds good in principle and I look forward to re-reviewing in detail, though I won't be able to get to that until early next week.
Thanks, @potterzot. As I said, it was a quick glance. Echoing what @nealrichardson said, I need to fully re-review everything. Those were just the few things I found quickly so I commented.
Thanks for the detailed changes and response, @potterzot.
@adamhsparks and @nealrichardson Thanks for looking over the changes. Could you please complete your re-review and either request additional changes or vote to approve the package by August 8?
Will do!
-- Dr Adam H. Sparks Associate Professor of Field Crops Pathology | Centre for Crop Health Institute for Life Sciences and the Environment (ILSE) | Research and Innovation Division University of Southern Queensland | Toowoomba, Queensland | 4350 | Australia
Nice work. This looks much improved. I found a few stylistic issues again, and I had some suggestions for how to improve the testing, but rather than write them here, I've made a pull request with them for you to review/merge. The other reason I implemented these suggestions myself was that I got a test failure locally because one of the tests required auth but did not have the appropriate `skip_if_no_auth()` check, so I was already in the code to debug that. There was also an `R CMD check` issue I encountered because the .Rbuildignore still wasn't correctly excluding previously built tarballs. All fixed by that PR.

Test coverage could be better, though my PR bumps it up to 91%. Happy to advise on covering the conditions that are currently missed if you want, though I won't withhold approval based on not reaching 100% line coverage.
One last followup point:
In nassqs_parse RE: "L153: when is not a response object?", at times the API is not working, so in that case this returns the error message directly.
`httr::GET()` will either return a response object (potentially with an error status, which you will already have handled before getting here because you pass through the `nassqs_check()` function) or `GET()` will itself error (like if your internet is down). I don't think it's possible for it to return anything different.
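A small sketch of those two failure modes using standard `httr` calls (the URL is illustrative):

```r
library(httr)

resp <- tryCatch(
  GET("https://quickstats.nass.usda.gov/api/api_GET/"),
  error = function(e) e  # transport failure: GET() itself throws
)

if (inherits(resp, "response")) {
  # HTTP-level failures still arrive as response objects;
  # stop_for_status() turns 4xx/5xx statuses into R errors.
  stop_for_status(resp)
}
```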
I have a few last minor points (nitpicks?) that if changed will improve the package. Overall it's greatly improved and I like how it works. Congrats!
1.* @lmullen already noted this much earlier in the process. Please remove the "Date" field from the DESCRIPTION file. CRAN will automatically assign this and it's prone to ending up out of date if you rely upon updating it manually.

2. There is no need to use `paste()` in a `stop()`, e.g.,

`stop(paste0("Your query parameters include 'format' as ", format, " but it should be one of 'json', 'xml', or 'csv'."))`

It should be written as:

`stop("Your query parameters include 'format' as ", format, " but it should be one of 'json', 'xml', or 'csv'.")`

There are several instances of this that I noted.

3.* "inst/examples/example_parameters.R" has an incomplete final line. Add a line return at the end of the file to fix this.
4. The documentation for `nassqs_GET()` is inconsistent in how it references functions in the description. Some of the other R functions discussed in that paragraph use the `function()` convention, while `nassqs_GET()` is referred to only as `nassqs_GET`, minus the `()`. As a user I find it clearer if the `()` is used in documentation to indicate that a function, not a parameter, is being discussed. Note I didn't check all documentation, I just noticed this here.

5.* The documentation example for `nassqs_param_values` has "YIElD" not "YIELD" in the comment section, is this correct?

6.* The `nassqs_parse()` documentation Description field is missing a "'" prior to (Z).

7.* I'm not sure that `here` needs to be listed in the DESCRIPTION Suggests field. It's only used in data-raw as far as I can tell? If so, that folder is not included in the R package so shouldn't need to be specified here.
8.* In the data-raw/get_test_data.R file, it might be good to set the version
to 2
for maximum compatibility in the near term with versions of R from 1.4.0 to current. If it's NULL
it will default to version 3.
The README code could be formatted a bit more nicely too using proper RMarkdown chunks, e.g.,

```{r eval=FALSE}
# Via devtools
library(devtools)
install_github('potterzot/rnassqs')

# Via CRAN
install.packages("rnassqs")
```
The README may not need to be a .Rmd? I can't see that you have any executed R code so you could simplify and just use a .md file.
Consider using `codemetar::write_codemeta()` to create and update a .json metadata file for the package?
Once these are addressed (at your discretion for many of them) I'm happy to recommend accepting. I've added a "*" after the number and before the comment for the items that I think must be fixed. Those without are at your discretion.
@adamhsparks Thank you for the incredible detail in this! Much appreciated. I've made all of the changes you suggest except for this one, which I'm unclear on:
8.* In the data-raw/get_test_data.R file, it might be good to set the version to 2 for maximum compatibility in the near term with versions of R from 1.4.0 to current. If it's NULL it will default to version 3.
What do you mean by setting the version? Do you mean setting the R version in DESCRIPTION?
@nealrichardson I've reviewed and merged your PR, thanks! There were two tests for error handling that were failing:
Because they were within the `with_mock_api()` block they were returning a `GET` object instead of the error. I moved these to the authorization block and they work.

Regarding `.Rbuildignore` excluding tarballs, I had included that at some point long ago, but removed it for a reason that I don't remember. Thank you for adding it.
@nealrichardson PS: if I have your okay, I've also added you as a contributor in DESCRIPTION.
Where did you see the failure? The PR merge commit passed on Travis. They don't require auth because they use the mock responses I added here: https://github.com/potterzot/rnassqs/pull/15/files#diff-7c5a672790a8227968bfd57c3a71faa0 Did you possibly make other changes that altered the querystring in the request? That would change the request URL and thus change the mock file path it was looking for. If so, you can rename those mock files to match and they'll be fine.
Sure, happy to be listed as contributor.
@nealrichardson Hmm, I checked out your commit again and am having no trouble. I must have changed something that started giving me those errors. I returned those files to their original state in a new commit. Also removed the response check based on your note about `nassqs_GET()` always returning a response object.
Hi @potterzot, The RDS version of the file is 1, 2 or (default) 3. This changed with R 3.6 to "3". So many users may not have the ability to read a version 3 RDS file yet if they've not upgraded to R >= 3.6.
See `?saveRDS` for more on `version`.
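Concretely, the argument in question (standard base R; the object and file name are made up for illustration):

```r
x <- list(a = 1, b = "two")

# version = 2 writes the older serialization format, readable back to
# R 1.4.0; version = NULL on R >= 3.6.0 defaults to version 3, which
# pre-3.6 versions of R cannot read.
saveRDS(x, file = "test_data.rds", version = 2)
```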
Ah I see, that script was generally outdated anyway, so thank you for pointing that out! I was also unaware of the breaking change and do a lot of my data storage in RDS so thank you.
@lmullen, I've updated my initial review with the suggestion to accept, ticked the rest of the boxes and updated time spent reviewing.
@adamhsparks Great, thanks so much.
@nealrichardson Is there anything else outstanding from your perspective?
All good, just checked the boxes.
Approved! Thanks @potterzot for submitting and making all the requested changes. And thanks, @adamhsparks and @nealrichardson for especially thorough reviews. Much appreciated.
@potterzot here are some to-dos to complete the onboarding process.
[ ] Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. Once you do, I'll give you admin access.
[ ] Add the rOpenSci footer to the bottom of your README: `[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)`
[ ] Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to the ropensci Appveyor account, so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be `[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname)`.
[ ] We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package; after installing the package it should be as easy as running `codemetar::write_codemeta()` in the root of your package.
Since you are also publishing to JOSS, we need you to do the following.
This submission has been accepted to rOpenSci. The review thread can be found at <URL TO THIS REVIEW>.
You can also release a new version to CRAN.
Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them `"rev"`-type contributors in the `Authors@R` field (with their consent). More info on this here.
Welcome aboard! We'd love to host a blog post about your package - either a short introduction to it with one example or a longer post with some narrative about its development or something you learned, and an example of its use. If you are interested, review the instructions, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.
We've started putting together a gitbook with our best practices and tips; this chapter starts the 3rd section, which is about guidance for after onboarding. Please tell us what could be improved; the corresponding repo is here.
Thank you! This is very exciting. Thank you all for your fantastic help and efforts. @adamhsparks, I am realizing I didn't specifically ask your permission to include you as a reviewer. @lmullen, I've switched the repository over to ropensci and made changes to the readme.
Ok, you should be an admin on the repository again, @potterzot.
@stefaniebutland I would be happy to do a blog post, probably realistically not possible until October. I think I could do a longer article discussing why I started developing the package and how it's been helpful in my research and what I've learned in the process, if that seems like a good fit. Happy to do a shorter one as well.
@potterzot Sounds good! Please submit a draft when you're ready and we can select a publication date at that time.
why I started developing the package and how it's been helpful in my research
If you include a "cool" example (that's not shown elsewhere) this is especially valuable as a way for readers to see how they might use the package.
what I've learned in the process
Always good to share this. Try to choose a couple of key points.
Thanks!
Hi @potterzot. I'm checking in to let you know I have a blog post slot open for Tues Oct 29 if you still wanted to do a long form post. A shorter tech note is quite appropriate and could be published any time, after my review.
@stefaniebutland I couldn't make the Oct 29 deadline but have a working draft now, do you have a good future date that would work? The draft is basically done but I can change the template date and file names to match the anticipated date.
Also, I'm not sure where to put images. The template links I can change, but I don't see the corresponding `img/blog-images` directory in the `roweb2` repository.
@stefaniebutland nevermind on the second part, I figured out the images. It helps if I read the instructions in full!
For now, please date 2019-11-26. That might change based on submission status of other posts that already have dates assigned.
I admit there are a LOT of instructions ;-)
Submitting Author: Nicholas A. Potter (@potterzot)
Repository: https://github.com/potterzot/rnassqs Version submitted: 0.4.0 Editor: @lmullen
Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
Version accepted: TBD
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
Data retrieval, because `rnassqs` allows access to the NASS 'Quick Stats' API to fetch data.
Target audience is those who want to automate or reproducibly fetch data from 'Quick Stats', including agronomists, economists, and others working with agricultural data. Scientific applications include analysis of agricultural data by administrative region (e.g. county, state, watershed), economic analysis of policies that affect agriculture, and sociological/demographic analysis of agricultural producers over time.
None that I have been able to find.
297, responded to by @noamross
Technical checks
Confirm each of the following by checking the box. This package:
Publication options
JOSS Options
JOSS Options

- [X] The package has an **obvious research application** according to [JOSS's definition](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements).
- [X] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`.
- [X] The package is deposited in a long-term repository with the DOI: 10.5281/zenodo.2662520
- (*Do not submit your package separately to JOSS*)

MEE Options

- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct