Open MatthieuStigler opened 5 years ago
As I recall, this package is built off of https://github.com/rdinter/nassR but with better documentation and a tad more functionality. The nassR project used a lot of the https://github.com/emraher/rnass syntax and functionality for querying the API. The biggest reasons for upgrading rnass to nassR were better documentation and cleaning up how the API key is stored on a local machine. I only came across the https://github.com/potterzot/rnassqs package after finishing the first pkgdown version of usdarnass, so none of the functionality from that package has been incorporated yet.
For the potential comparisons:
`nass_data` and it checks the query size before querying the Quick Stats API (`nass_count` function checks limit). If the 50,000 limit is exceeded, then an error is thrown indicating that the query needs to be subset to fit. I am not sure there is a splitting method that could be automated because of the fairly large number of parameters.
`map` or `apply` command before calling the `nass_data` function which serves the same purpose.
`numeric_vals` which is defaulted to false and returns values as characters. But, if it is set to true then the values are parsed to numeric and the non-disclosed values will be returned as NA.
`simplify` argument, but I think it would be a bit tedious to figure out which columns can be deemed unnecessary for a query.
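The numeric conversion described for `numeric_vals` can be illustrated in base R. This is only a minimal sketch, not the package's actual implementation: `parse_value()` is a hypothetical helper, and it assumes values arrive as character strings with thousands separators and with non-disclosed entries coded as `(D)`.

```r
# Hypothetical helper, not part of usdarnass or rnassqs.
# Assumes values are character strings such as "1,234" and that
# non-disclosed entries appear as codes like "(D)".
parse_value <- function(x) {
  # Strip surrounding whitespace and thousands separators.
  cleaned <- gsub(",", "", trimws(x), fixed = TRUE)
  # Anything that is not a plain number, such as "(D)", becomes NA.
  suppressWarnings(as.numeric(cleaned))
}

parse_value(c("1,234", " (D)", "12.5"))
```

In this sketch the `(D)` entry comes back as `NA` while the other two entries become numbers.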
rnassqs can handle multiple values for a given parameter. For example, if you want multiple states you can add `state_alpha = c('WA', 'OR')`. The API handles multiple values by listing the parameter and value multiple times in the API call.
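To make the repeated-parameter idea concrete, here is a rough base R sketch of how a list of parameters could be expanded into a query string. `build_query()` is a hypothetical helper written for illustration; it is not how rnassqs necessarily builds its requests.

```r
# Hypothetical helper: turn a named list into a query string, repeating
# the parameter name once for every value it holds.
build_query <- function(params) {
  pairs <- unlist(lapply(names(params), function(name) {
    values <- vapply(as.character(params[[name]]), utils::URLencode,
                     character(1), reserved = TRUE, USE.NAMES = FALSE)
    paste0(name, "=", values)
  }))
  paste(pairs, collapse = "&")
}

query <- build_query(list(
  commodity_desc = "CORN",
  year__GE = 2012,
  state_alpha = c("WA", "OR")
))
```

Here `state_alpha` appears twice in `query`, once per state, which is the pattern described above.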
The biggest deficiency I see in rnassqs right now is sorting through the selection of valid parameters and values. It'd be interesting to talk about the best way of doing that, one that would make it easier for users to get at the data they want without the pitfalls of entering an invalid combination of parameter values.
Hi @potterzot, that is a great catch that the API allows multiple states in a single call! I had not considered that the API would accept multiple values for `state_alpha`. So instead of making two API calls, rnassqs would make one of the form:
http://quickstats.nass.usda.gov/api/get_counts/?key=api_key&commodity_desc=CORN&year__GE=2012&state_alpha=WA&state_alpha=OR
That is a clever approach and partially avoids the `map` or `apply` solution that I have been using. It also mimics selecting multiple entries in a drop-down menu on the Quick Stats web interface, which I like. The 50,000 limit still applies to that call, and as of right now `map` or `apply` is my preferred method to avoid the limit. I'm still not sure there is a systematic way to automate splitting a call that exceeds 50,000, because of all the different parameters.
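The `apply`-style workaround can be sketched roughly as follows. This is an illustration under assumptions, not either package's code: `fetch_in_chunks()`, `count_fun`, and `fetch_fun` are hypothetical names, and the stand-in functions exist only so the example runs without an API key. The user still has to choose which parameter to split on, so it does not solve the general automation problem.

```r
# Hypothetical sketch: `count_fun` and `fetch_fun` stand in for whatever
# count and data functions a package provides.
fetch_in_chunks <- function(params, split_by, count_fun, fetch_fun,
                            limit = 50000) {
  # One request is enough when the whole query fits under the limit.
  if (count_fun(params) <= limit) {
    return(list(fetch_fun(params)))
  }
  # Otherwise issue one request per value of the chosen parameter.
  lapply(params[[split_by]], function(value) {
    piece <- params
    piece[[split_by]] <- value
    if (count_fun(piece) > limit) {
      stop("Query still exceeds the limit for ", split_by, " = ", value)
    }
    fetch_fun(piece)
  })
}

# Stand-in functions so the sketch runs offline (made-up record counts).
fake_count <- function(p) 30000 * length(p$state_alpha)
fake_fetch <- function(p) p$state_alpha

chunks <- fetch_in_chunks(
  list(commodity_desc = "CORN", state_alpha = c("WA", "OR")),
  split_by = "state_alpha",
  count_fun = fake_count,
  fetch_fun = fake_fetch
)
```

With the made-up counts, the combined query is over the limit, so `chunks` holds one result per state.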
When I last looked at rnassqs, it appeared that the biggest difference in functionality was that rnassqs handles the parameters as one argument instead of each parameter being an argument in the function. Passing the parameters as a list makes rnassqs more flexible for API calls, but the user needs to know both the parameter name and the value they want to pass through. usdarnass, by contrast, has each parameter in a call as a potential argument, but I only included the parameters that are available as drop-down menus in Quick Stats, so usdarnass could not handle `state_alpha = "WA"` because I omitted that parameter. Part of this was to mimic the functionality of the web interface, but making each parameter an argument also cuts down on the number of errors from typos, since it is difficult to mistype a parameter name in usdarnass, especially with how RStudio lets you tab through the arguments of a function. It is still susceptible to typos that lead to incorrect parameter values, though; I cannot think of a good way to correct `commodity_desc = "CRON"`.
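One possible direction for value typos, offered as an assumption rather than anything either package implements, is to compare the supplied value against a list of accepted values by edit distance. In this sketch `suggest_value()` is a hypothetical helper, and the `valid` vector is a made-up short list; a real one would have to be obtained separately.

```r
# Hypothetical helper, not part of usdarnass or rnassqs.
# `valid` is assumed to be a character vector of accepted values.
suggest_value <- function(value, valid, max_dist = 2) {
  # Edit distance between the input and every accepted value.
  dists <- drop(utils::adist(toupper(value), toupper(valid)))
  best <- which.min(dists)
  # Only suggest a value when it is close enough to be a likely typo.
  if (dists[best] <= max_dist) valid[best] else NA_character_
}

suggest_value("CRON", c("CORN", "COTTON", "WHEAT"))
```

Whether a suggestion like this should be applied silently or only shown in an error message is a separate design question.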
My hunch is that a user who already knows the parameter and value for an API call to Quick Stats likely knows the API documentation, could create the GET request themselves, and would get limited use from a package. But admittedly, I might not know the full benefits of passing parameters as a list the way rnassqs does, and I'd be interested to know what they might be.
This package seems interesting! As there are 2 alternative packages, could you make a small comparison saying what the advantage of `usdarnass` is?
Potential comparison points:
`Value` converted to numeric? How are (D) values handled?
Thanks!!