rdinter / usdarnass

An alternative for downloading various USDA data from Quick Stats through R.
GNU General Public License v3.0

Question: compare to other nass R packages? #1

Open MatthieuStigler opened 5 years ago

MatthieuStigler commented 5 years ago

This package seems interesting! As there are two alternative packages, could you add a short comparison explaining the advantages of usdarnass?

Potential comparison points:

Thanks!!

rdinter commented 5 years ago

As I recall, this package is built off of https://github.com/rdinter/nassR but with better documentation and a tad more functionality. The nassR project used a lot of the https://github.com/emraher/rnass syntax and functionality for querying the API. The biggest reasons for upgrading rnass to nassR were better documentation and cleaning up how the API key is stored on a local machine. I only came across the https://github.com/potterzot/rnassqs package after finishing the first pkgdown version of usdarnass, so none of the functionality from that package has been incorporated yet.

For the potential comparisons:

potterzot commented 5 years ago

rnassqs can handle multiple values for a given parameter. For example, if you want multiple states you can add "state_alpha = c('WA', 'OR')". The API handles multiple values by listing the parameter and value multiple times in the API call.
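The repeated-parameter encoding described above can be sketched outside of R. This is a minimal illustration in Python, not code from rnassqs or usdarnass; the helper name `build_query` is hypothetical, and only the standard-library `urlencode` call is real.

```python
from urllib.parse import urlencode


# Hypothetical helper for illustration; not part of rnassqs or usdarnass.
def build_query(params):
    """Encode a dict of parameters, repeating the name for list values."""
    # doseq=True expands {"state_alpha": ["WA", "OR"]} into
    # state_alpha=WA&state_alpha=OR rather than encoding the list as one value.
    return urlencode(params, doseq=True)


query = build_query({
    "commodity_desc": "CORN",
    "year__GE": 2012,
    "state_alpha": ["WA", "OR"],
})
print(query)
```

The resulting string would be appended to the endpoint URL together with the API key.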

The biggest deficiency I see in rnassqs right now is sorting through the selection of valid parameters and values. It'd be interesting to discuss the best way of doing that, so that users can get at the data they want without the pitfalls of entering an invalid combination of parameter values.

rdinter commented 5 years ago

Hi @potterzot, that is a great catch that the API allows multiple states in a single call! I had not considered that the API would accept multiple values for state_alpha. So instead of making two API calls, rnassqs would make one of the form:

http://quickstats.nass.usda.gov/api/get_counts/?key=api_key&commodity_desc=CORN&year__GE=2012&state_alpha=WA&state_alpha=OR

That is a clever approach and partially avoids the map or apply solution that I have been using. It also mimics the multiple selections in a drop-down menu on the Quick Stats web interface, which I like. The 50,000-record limit still applies to that call, and as of right now map or apply is my preferred method to avoid the limit. I'm still not sure whether there is a systematic way to automate splitting a call that exceeds 50,000 records, because of all the different parameters.
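One narrow version of that splitting idea can be sketched as follows. It is only an illustration in Python under strong assumptions: it splits on a single, caller-chosen parameter, and `count_fn` is a hypothetical callable standing in for a record-count request. It does not solve the general problem of choosing which parameter to split on.

```python
LIMIT = 50_000  # per-request record cap mentioned in the thread


def split_query(params, split_on, count_fn, limit=LIMIT):
    """Return a list of queries that each stay within the record limit.

    count_fn is a hypothetical callable returning the record count for a
    query; split_on names the one list-valued parameter to split by.
    """
    if count_fn(params) <= limit:
        return [params]  # small enough to fetch in a single call
    # Otherwise issue one query per value of the chosen parameter.
    pieces = [{**params, split_on: [v]} for v in params[split_on]]
    for piece in pieces:
        if count_fn(piece) > limit:
            raise ValueError(f"still over the limit after splitting: {piece}")
    return pieces


def fake_count(params):
    # Stand-in for a real count request: pretend each state has 30,000 rows.
    return 30_000 * len(params["state_alpha"])


pieces = split_query(
    {"commodity_desc": "CORN", "state_alpha": ["WA", "OR"]},
    split_on="state_alpha",
    count_fn=fake_count,
)
print(len(pieces))
```

With the made-up counts above, the two-state query is over the limit and is split into one query per state.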

When I last looked at rnassqs, the biggest difference in functionality appeared to be that rnassqs takes the parameters as one argument instead of making each parameter an argument of the function. Passing the parameters as a list makes rnassqs more flexible for API calls, but the user needs to know both the parameter name and the value they want to pass through. In usdarnass, by contrast, each parameter in a call is a potential argument. However, I only included the parameters that are available as drop-down menus in Quick Stats, so usdarnass could not handle state_alpha = "WA" because I omitted that parameter. Part of this was to mimic the functionality of the web interface. Making each parameter an argument also cuts down on errors from typos, since it is difficult to mistype a parameter name in usdarnass, especially with how RStudio lets you tab through the arguments of a function. It is still susceptible to typos that lead to incorrect parameter values, though; I cannot think of a good way to correct commodity_desc = "CRON".
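One possible way to catch a value typo such as "CRON" is to compare the input against a list of known values and suggest the nearest match. The sketch below is in Python and is not part of either package; the `suggest` helper and the short `KNOWN_COMMODITIES` list are hypothetical, and a real list of valid values would have to come from the API or a lookup table.

```python
import difflib

# Illustrative subset only; assumed for this example.
KNOWN_COMMODITIES = ["CORN", "COTTON", "WHEAT", "SOYBEANS"]


def suggest(value, valid=KNOWN_COMMODITIES):
    """Return the closest valid value, or None if nothing is similar."""
    # get_close_matches ranks candidates by similarity ratio and keeps
    # those at or above the cutoff; n=1 asks for the best match only.
    matches = difflib.get_close_matches(value.upper(), valid, n=1, cutoff=0.6)
    return matches[0] if matches else None


print(suggest("CRON"))
```

A wrapper could use such a suggestion in a warning rather than silently rewriting the user's input.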

My hunch is that a user who already knows the parameter and value for an API call to Quick Stats likely knows about the API documentation and could create the GET request themselves, so a package would be of limited use to such an advanced user. But admittedly, I might not know the full benefits of passing parameters as a list as rnassqs does, and I'd be interested to know what they might be.