ropensci / software-review

rOpenSci Software Peer Review.
292 stars 104 forks source link

Submission: rnassqs - access the USDA-NASS 'Quick Stats' data through their API #298

Closed potterzot closed 5 years ago

potterzot commented 5 years ago

Submitting Author: Nicholas A. Potter (@potterzot)
Repository: https://github.com/potterzot/rnassqs Version submitted: 0.4.0 Editor: @lmullen
Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
Version accepted: TBD


Package: rnassqs
Type: Package
Title: Access the NASS 'Quick Stats' API
Version: 0.4.0.9000
Date: 2019-04-29
Authors@R: c(
  person('Nicholas', 'Potter', email='potter.nicholas@gmail.com', role = c('aut', 'cre')),
  person('Joseph', 'Stachelek', email='', role = c('ctb')),
  person('Julia', 'Piaskowski', email='', role = c('ctb'))) 
Maintainer: Nicholas Potter <potter.nicholas@gmail.com>
Description: Interface to access data via the United States Department of 
  Agricultre's National Agricultural Statistical Service (NASS) 'Quick Stats' 
  web API <https://quickstats.nass.usda.gov/api>. Convenience functions 
  facilitate building queries based on available parameters and valid parameter 
  values.
URL: https://github.com/potterzot/rnassqs
BugReports: http://www.github.com/potterzot/rnassqs/issues
License: MIT + file LICENSE
LazyData: TRUE
Language: en-US
Imports:
  httr,
  jsonlite,
Suggests:
  testthat,
  here,
  knitr,
  rmarkdown
RoxygenNote: 6.1.1
Encoding: UTF-8
VignetteBuilder: knitr

Scope

Data retrieval because 'rnassqs' allows access the the NASS 'Quick Stats' API to fetch data.

Target audience is those who want to automate or reproducibly fetch data from 'Quick Stats', including agronomists, economists, and others working with agricultural data. Scientific applications include analysis of agricultural data by administrative region (e.g. county, state, watershed), economic analysis of policies that affect agriculture, and sociological/demographic analysis of agricultural producers over time.

None that I have been able to find.

297, responded to by @noamross

Technical checks

Confirm each of the following by checking the box. This package:

Publication options

JOSS Options - [X] The package has an **obvious research application** according to [JOSS's definition](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements). - [X] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`. - [X] The package is deposited in a long-term repository with the DOI: 10.5281/zenodo.2662520 - (*Do not submit your package separately to JOSS*)
MEE Options - [ ] The package is novel and will be of interest to the broad readership of the journal. - [ ] The manuscript describing the package is no longer than 3000 words. - [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

lmullen commented 5 years ago

@potterzot I will be the editor for this peer review.

Here are my editorial checks.

Editor checks:


Editor comments

In running R CMD check I get the following:

❯ checking top-level files ... NOTE
  Non-standard files/directories found at top level:
    ‘paper.bib’ ‘paper.md’
✖ omit "Date" in DESCRIPTION. It is not required and it gets invalid quite
    often. A build date will be added to the package when you perform `R CMD build` on
    it.

I am in the process of looking for peer reviewers.


Reviewers: Due date:

potterzot commented 5 years ago

@lmullen thank you for your comments. I've made a commit to address your comments.

Regarding this note:

Could you clarify, please, what is tested when the user has not provided an API key? For tests that you are skipping if no API key is available, is it possible to stub or mock the API? The testthat package includes some mocking functions, as well as testthat::expect_equal_to_reference() which can be used for that purpose.

Tests in tests/testthat/test-oncran.R beginning on line 55 include mock API calls. The tests make the request, specifying that the function return the GET request URL rather than actually make the request, and that request is compared to the correct URL. There are three API paths to test:

There are additional mock API tests that follow those, but those are for convenience functions for making specific requests, e.g. nassqs_area and nassqs_yield, which wrap nassqs_GET.

Tests in tests/testthat/test-local.R make actual API calls using an API key, and are not possible on CRAN.

Is there a better way of organizing tests that makes it clear where the API mock tests are done and where the actual API call tests are done?

lmullen commented 5 years ago

@potterzot That sounds fine to me. I just wanted to make sure the reviewers and I understood.

lmullen commented 5 years ago

@potterzot Apologies for the delay in getting this review going. One person has agreed to review but a string of others have been unavailable at the start of the summer. Still looking for that second reviewer and then the review will begin.

lmullen commented 5 years ago

Thanks to our reviewers for agreeing to take on this package.

Reviewer: @adamhsparks Reviewer: @nealrichardson Due date: 2019-07-11

You can find the guide for reviewers here. Please let me know if you have any questions.

adamhsparks commented 5 years ago

I know I'm behind. Sorry, I've had a rather busy time lately. I'm starting the review today and will see how I go.

adamhsparks commented 5 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 7


Review Comments

This package offers great functionality and defines very nicely why it's necessary. I'm happy to see this sort of package being written.

Following are my comments.

Comments on Vignette

When I try using the nassqs_token() I receive and error message.

> library(rnassqs)
> nassqs_token()
Error in nassqs_token() : could not find function "nassqs_token"

This example works as expected,

nassqs_fields()
[1] "agg_level_desc"        "asd_code"             
 [3] "asd_desc"              "begin_code"           
 [5] "class_desc"            "commodity_desc"       
 [7] "congr_district_code"   "country_code"         
 [9] "country_name"          "county_ansi"          
[11] "county_code"           "county_name"          
[13] "CV"                    "domaincat_desc"       
[15] "domain_desc"           "end_code"             
[17] "freq_desc"             "group_desc"           
[19] "load_time"             "location_desc"        
[21] "prodn_practice_desc"   "reference_period_desc"
[23] "region_desc"           "sector_desc"          
[25] "short_desc"            "state_alpha"          
[27] "state_ansi"            "state_name"           
[29] "state_fips_code"       "statisticcat_desc"    
[31] "source_desc"           "unit_desc"            
[33] "util_practice_desc"    "Value"                
[35] "watershed_code"        "watershed_desc"       
[37] "week_ending"           "year"                 
[39] "zip_5"                

however,

?nassqs_fields()

returns a help file that says,

Deprecated: Return list of NASS QS parameters.
Description
Deprecated. Use nassqs_params() instead.

Suggest updating vignette to match most recent functionality.


Another error occurs with this example from the vignette.

> rnassqs::nassqs_field_values(field = 'unit_desc')
Error: 'nassqs_field_values' is not an exported object from 'namespace:rnassqs'

The "All together" section script is not functional.

fields <- nassq_fields()
Error in nassq_fields() : could not find function "nassq_fields"

Comments on functions

My suggestion is remove release_questions() entirely and split out the functions into their own files with the function name being the file-name. This helps make it easier to keep the functions organised and updated. Splitting the functions out is entirely up to the authors if they wish to implement this structure, however I feel that removing release_questions() is necessary.

Comments on documentation

  # See all values available for the statisticcat_desc field. Values may not
  # be available in the context of other parameters you set, for example
  # a given state may not have any 'YIElD' in blueberries if they don't grow
  # blueberries in that state.
  # Requires an API key:
  #nassqs_param_values("statisticcat_desc", key = "my api key")

Should appear as

  # See all values available for the statisticcat_desc field. Values may not
  # be available in the context of other parameters you set, for example
  # a given state may not have any 'YIElD' in blueberries if they don't grow
  # blueberries in that state.
  # Requires an API key:

  nassqs_param_values("statisticcat_desc", key = "my api key")
> params = list(commodity_name="CORN", 
+               year=2012, 
+               agg_level_desc = "STATE",
+               state_alpha = "WA",
+               statisticcat_desc = "YIELD")
> nassqs_GET(params)
Response [https://quickstats.nass.usda.gov/api/api_GET?key=XXXXXXXXXXXXXXXXXXXXXXX&commodity_name=CORN&year=2012&agg_level_desc=STATE&state_alpha=WA&statisticcat_desc=YIELD&format=JSON]
  Date: 2019-07-13 04:48
  Status: 200
  Content-Type: application/json
  Size: 148 kB

What do I do with this response value? How is this response yields for corn in 2012 in Washington?

Does the end-user even need to interface with this function or should it be hidden and used by the other functions in the package that return data in data.frames or other R objects?

# Set parameters and make the request
params = list(commodity_name="CORN", 
              year=2012, 
              agg_level_desc = "STATE",
              state_alpha = "WA",
              statisticcat_desc = "YIELD")
req <- nassqs_GET(params)
nassqs_parse(req, as = "data.frame")

would be more clear as

# Set parameters and make the request
params <- list(
  commodity_name = "CORN",
  year = 2012,
  agg_level_desc = "STATE",
  state_alpha = "WA",
  statisticcat_desc = "YIELD"
)
req <- nassqs_GET(params)
corn <- nassqs_parse(req, as = "data.frame")
head(corn)

Comments on code style

nealrichardson commented 5 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 5


Review Comments

Looks like I get to be the infamous "Reviewer 2" this time :)

Since Adam dutifully got his review in on time and I'm delayed a couple of days, I'll say at the start that I agree generally with all of his points (with one minor exception, noted below), and in the interest of brevity, I've tried to avoid reiterating comments he made.

I greatly appreciate packages like this and the effort that goes into making them. This is the kind of package I wish existed when I was doing my dissertation--would have made at least the data retrieval and management part of research a lot cleaner. So, thank you for your contribution.

Style and code-hygiene comments aside, my main suggestion to you is to think about how you can orient the package towards the R user and their needs. The purpose of this package should be to encapsulate your hard-earned knowledge of how this API works and enable data scientists to access that data as naturally as possible. However, the package currently seems to expose an R interface centered around the needs of the API and not the R user. For example, the functions that seem to make up the public interface for the package are mostly stuck in "helpers.R", as if they're ancillary when really they should be front and center.

Here's a more concrete example: reading the vignette discussion about the comparison operators, it looks like the way to get data on corn production in Virginia since 2012 is

nassqs(list(commodity_desc="CORN", year__GE=2012, state_alpha="VA")))

But when I think about the R code that I want to type to get that data, it looks more like

nassqs(commodity == "corn", state == "va", year >= 2012)

That is, not case-sensitive, handles the different ways commodities and states (for example) could be referenced (integer code, abbreviation, long name), and more naturally handles the comparison operators (can translate year >= 2012 to year__GE=2012). You could go even farther and have a dplyr syntax where you can filter() rows and select() columns, so that the R user is essentially describing the data.frame they want to get, and your package figures out how to turn that into API requests, handle pagination, etc., and return the shape of data that an R user expects.

I'm not saying you need to do exactly this/all of this now (though some things, like case insensitvity, would be easy enough), but more to suggest ideas for the future and to show what I mean in terms of thinking of the R interface and not just what the API demands. When I wrap APIs (and in general write packages), I like to start by thinking about the R code I want (users) to type, and that usually means doing more work in the package to mask the awkwardness of the underlying HTTP API.

Another way to get more R-user-centric would be to improve the documentation, which I found to be thin, often tautological (@param api_path the API path), and not helpful for an inexperienced user. Documentation is a bit of work but it's worth doing, especially for how it can help focus you on what the intended user needs to know in order to get value from the package--and how you can make what details they need to know as minimal as possible.

A good way to see what the docs will look like to your users is to use pkgdown to build the package website. You don't need to publish this (though it's nice if you do), but I've found that even doing it locally makes it really visible to me where my documentation is not the most helpful or beautiful.

Observations from using the package

I got a token and did the example on the readme (corn yields in Virginia and Pennsylvania for since 2000).

> library(rnassqs)
> Sys.setenv(NASSQS_TOKEN="REDACTED")
> params <- list("commodity_desc"="CORN",
+                   "year__GE"=2000,
+                   "state_alpha"=c("VA", "PA"),
+                   "statisticcat_desc"="YIELD")
> df <- nassqs(params)

That all worked as expected. That said,

Specific code notes

Miscellaneous

lmullen commented 5 years ago

@adamhsparks and @nealrichardson: Thanks to both of you for these thorough and detailed reviews.

@potterzot Please take a look at these comments from the reviewers. There is lots of good advice in here. It does seem like some of it will require some pretty fundamental reassessment of the package's user-facing interface. Do you think you can make revisions within two weeks, our typical deadline? i.e., by July 29?

potterzot commented 5 years ago

@lmullen, @adamhsparks, @nealrichardson first let me say a big thanks, these are some really good suggestions and you clearly put in a lot of time to give some great feedback. While some of the changes are substantial, I hope that the underlying framework will make them relatively easy to implement. I think I can probably submit a revision by July 29th. Is it possible to ask for an extension if that becomes necessary?

lmullen commented 5 years ago

@potterzot Sure. If you can plan on a July 29 deadline, that would be best, but an extension would be fine if it becomes necessary.

adamhsparks commented 5 years ago

@potterzot, I hope you find my review useful and not being negative. This is an extremely valuable package, my interest is in helping you improve it. Feel free to ask for help or guidance along the way. I'm happy to contribute.

potterzot commented 5 years ago

Hi @adamhsparks, thank you for your time and willingness to help. I've finished making the straight-foward changes you suggested, and am thinking about the larger issues, which seem to boil down to two related issues:

  1. How much should we expose to the user
  2. How can we simplify the interface to make it more usable

The main function of the package is nassqs(), which fetches and returns parsed data. But I exposed nassqs_GET() and nassqs_parse() because I wanted to make it possible for advanced users to deal with any edge cases that might come up. Perhaps that's not really a concern here, and it would make sense to hide both of those functions. Working with output from nassqs_GET() requires some knowledge of the httr package and how requests work, so it would only be useful to someone who wants to see the raw results and knows what to do with them. nassqs_parse() on the other hand doesn't do much more than what jsonlite::fromJSON does, so I think it could be hidden without really removing any control from the user. My goal was to make it easy to use, but also to allow a user to dig into the deeper mechanism if necessary. This may be born out of personal frustration from when I've not understood what a hidden function is doing and had to root around in the source code to figure it out.

I propose hiding nassqs_parse(), expanding the documentation for nassqs_GET() so that it clearly states that the main function is nassqs() and that in general that function should not be needed. What do you think about that and about the larger issue of usability?

adamhsparks commented 5 years ago

I think that sounds reasonable. The documentation can always be structured with the meat in the main vignette and then more advanced usage in another or farther down the page of a single vignette under an "Advanced Usage" header or some such.

potterzot commented 5 years ago

Question: Since both reviewers recommend removing release_questions(), where is it recommended that it go? These are helpful (for me) pre-release questions and the function is suggested by the R Packages book (here), but it doesn't suggest where to put it.

Response to Reviewers

The reviewers raise some excellent points to consider about usability and organization. The package feels greatly improved by virtue of their comments and suggestions. Thank you again for your time and your invaluable suggestions. rnassqs is much cleaner and much improved as a result.

Below I detail some more general thoughts that were raised by the reviewers and my response, and then detail specific response to each reviewer separately.

Data size and usability

One reviewer suggestion was to improve the interface to make it more user friendly, i.e. that the package seemed to be built around the needs of the API rather than the needs of the user. To some extent this is a function of the inflexibility of the API itself. A data request to the Quick Stats API returns JSON that, when parsed to a data.frame, results in a data.frame that has 39 columns. Unfortunately there is no way to limit the number of columns returned. The idea of being able to select a la dplyr is great, but since all 39 columns must be returned, it seems best to leave it to the user to select after the call is made.

Parameter passing

In response to suggestions about parameters and user experience I've made two changes. The first allows for specifying either a list of parameters in nassqs as was the previous case, or specifying each parameter as a separate argument to nassqs, as was suggested by @nealrichardson. In addition, I've added links to parameter documentation in the API, and now nassqs_params() returns a list of parameters, while nassqs_params("agg_level_desc") returns a description of the "agg_level_desc" parameter. I've also updated the vignette to show both methods.

Pagination and handling requests larger than 50,000 records

The question of pagination came up repeatedly. It's an interesting one in the context of this API. There is not a direct way to paginate results that the API supports. Typically I end up subsetting by year or geography to make the query small enough. rnassqs could try to subset by year or geography automatically, perhaps with a series of rules that first subset by year and if only one year is requested or the request is too large, then also subsetting by a smaller geography. However, there are potential issues here. For example, a state-level request will not necessarily result in the same data as collecting all counties for that state for two reasons:

  1. Some data are collected at the state level only
  2. County-level data may be suppressed where the state level data is not

Automatically subsetting by year would be doable though. I have added an issue for a future release to do so. In the meantime I have also added information in the error message to suggest how to subset the query. I have also included information on iteration to subset queries in the vignette.

Specific changes

Response to Reviewer 1

@adamhsparks provided some excellent feedback on issues of usability and potential unnecessary complexity. In particular, this comment was helpful.

I feel a bit like the functionality for the end-user is a bit overly complicated. This sort of package to me should just return data in a data.frame or list or some other R object that I can easily work with. In some cases I get a server response, in others I can ask for raw JSON or other file formats. I don't really see a good reason for this in R. I guess the functionality could be offered, but I think it needs to be hidden under a few layers for more advanced users that might want to use it. Rather it should just be a simple query for the data that I want and returning it in a format that I can readily use in R without extra steps. Just fetch the data and format it for me in one single step. I think it's possible with this package, but the documentation is a bit convoluted and not clear to me. There seem to be two major and related issues

While it's true that the package contains nassqs, which will just simply query and return a data.frame object without the user having to specify anything, I have reorganized and rewritten the documentation and vignette to emphasize nassqs rather than the low-level functions nassqs_GET and nassqs_parse. I have added documentation and changed the vignette to focus on the ease of use aspect, rather than on building a query using the core functions.

The second issue concerned the organization of functions. I have reorganized functions into files by collective functionality, in an effort to meet the guidelines suggested in R Packages, which states

While you’re free to arrange functions into files as you wish, the two extremes are bad: don’t put all functions into one file and don’t put each function into its own separate file.

Now functions dealing with the request and parsing of the request are in request.R. Authorization functions are in auth.R. Helpers are in helpers.R. Functions dealing with parameters and parameter values are in params.R. Functions that make queries easier are in wrappers.R. I think this strikes a good balance, but am certainly open to suggestions to make this clearer if needed.

Specific Items

General package

Vignette

Code

Code Style

Documentation

Response to Reviewer 2

@nealrichardson brought up several excellent points about the API and especially about testing and ease of use and focusing on the needs of the user. I feel it is easier to define a list of parameters and submit that as a single argument to nassqs, especially for example when iterating over a collection of queries. However, I recognize both needs, and have made it possible to call nassqs in either of two ways:

# First method, a named list of parameters
params <- list(agg_level_desc = "STATE",
               state_alpha = c("VA", "WA"))
nassqs(params)

# Second method, separate arguments
nassqs(agg_level_desc = "STATE", state_alpha = c("VA", "WA"))

# Or without capitalizing
nassqs(agg_level_desc = "state", state_alpha = c("va", "wa"))

I have expanded the vignette to demonstrate both methods and to emphasize the iteration and pagination of data available by iterating over a list of parameter lists.

Many of @nealrichardson's suggestions involve simplifying the interface, and I think the new function calls are much improved in this regard. These suggests were a real gem. Authorization is simpler, functions have fewer and simpler arguments, and overall ease of use is improved. His suggestion of allowing year >= 2012 instead of (or in addition to) year__GE = 2012 is also a good one. I have not implemented it here because I suspect LIKE and NOT LIKE would be slightly more difficult. I have created an issue to implement this in a future release.

Another concern was that all data is in character format rather than numeric for columns that are numeric. The reason is that the Quick Stats data lists suppressed or unavailable information in a variety of character-based ways. As a result the Value field may contain "(D)", "(Z)", or "(S)" rather than numbers. Converting to numeric makes these values NA, which loses the specific information about why the data is missing. It is true that it is easy enough to convert to numerical format, but in my opinion keeping this information about why data is missing is important.

The httptest package is a huge help and I wish I had known about it when I was asking on twitter about API testing months ago. I've reorganized and updated the tests to use mock API calls, and also with tests in test files that correspond to the function file names in the R directory. For example, test-requests.R contains tests for functions in R/request.R.

Specific Items

General

Code

Code Style

Vignette

Tests

adamhsparks commented 5 years ago

This looks much improved as I've glanced over it. Thanks for thoughtfully responding to our reviews and comments. Thanks for explaining any reasons why my suggestions weren't followed, I have no objections to any of them. Some of my suggestions have been based on CRAN's, umm, erratic(?) enforcement of rules from time-to-time, so ignoring some of my suggestions are probably fine as I'm not sure that I use title case in all my documentation everywhere but think I was pulled up on it once before.

I think that the organisation of the functions is much more clear now and agree with Hadley on not all in one and not one only per .R file.

Regarding the question on release_questions(), devtools::release() actually asks most of those and more when you use it, I'd suggest using that rather than including questions for yourself in the package NAMESPACE.

For expand_list() I'd use a @noRd tag since it does not need to be exposed to the end-user, that I can tell? Documentation is good, I do that for my internal functions so I know what they do, but it shouldn't clutter the user's experience having it documented unless it's used somewhere that I'm missing where an end-user actually calls it?

There's no need for the CITATION file to be incomplete as you've suggested. It should have two entries after acceptance to JOSS. One should just be for the package, the current version number and year it was released that will automatically update with new releases, which you can set up now. The second is the JOSS paper citation that will never change. The example I provided shows this.

I'm curious, how is it different than usdarnass, which you have mentioned in the README now? This isn't detailed in the original submission.

potterzot commented 5 years ago

@adamhsparks thank you. I've updated the CITATION file and also added @noRd to expand_list().

Regarding usdarnass, I added the reference to the README after I found out about it, which was after I had submitted for rOpenSci review. I'm fairly sure rnassqs was developed first, since my first git commit was in June 2015, while theirs was November 2018, and rnassqs was published on CRAN on May 03 while usdarnass was published on CRAN on June 21. I think they were actually developed unaware of each other. If you have any thoughts or suggestions about a course of action I'd be all ears. It seems we could basically continue to develop in parallel or we could merge packages. I haven't reached out to the authors other than make a suggestion on an issue to let them know they could allow for multiple options as I write below.

The differences are small as far as I can tell:

nealrichardson commented 5 years ago

I'll just briefly comment that this all sounds good in principle and I look forward to re-reviewing in detail, though I won't be able to get to that until early next week.

adamhsparks commented 5 years ago

Thanks, @potterzot. As I said, it was a quick glance. Echoing what @nealrichardson said, I need to fully re-review everything. Those were just the few things I found quickly so I commented.

lmullen commented 5 years ago

Thanks for the detailed changes and response, @potterzot.

@adamhsparks and @nealrichardson Thanks for looking over the changes. Could you please complete your re-review and either request additional changes or vote to approve the package by August 8?

adamhsparks commented 5 years ago

Will do!

-- Dr Adam H. Sparks Associate Professor of Field Crops Pathology  |  Centre for Crop Health Institute for Life Sciences and the Environment (ILSE)  |  Research and Innovation Division University of Southern Queensland | Toowoomba, Queensland | 4350 | Australia

Phone: +61746311948 Mobile: +61415489422 Mobile: +61415489422 On 2 Aug 2019, 5:38 AM +1000, Lincoln Mullen notifications@github.com, wrote:

Thanks for the detailed changes and response, @potterzot. @adamhsparks and @nealrichardson Thanks for looking over the changes. Could you please complete your re-review and either request additional changes or vote to approve the package by August 8? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

nealrichardson commented 5 years ago

Nice work. This looks much improved. I found a few sylistic issues again, and I had some suggestions for how to improve the testing, but rather than write them here, I've made a pull request with them for you to review/merge. The other reason I implemented these suggestions myself was that I got a test failure locally because one of the tests required auth but did not have the appropriate skip_if_no_auth() check, so I was already in the code to debug that. There was also a R CMD check issue I encountered because the .Rbuildignore still wasn't correctly excluding previously built tarballs. All fixed by that PR.

Test coverage could be better, though my PR bumps it up to 91%. Happy to advise on covering the conditions that currently are missed that if you want, though I won't withhold approval based on not reaching 100% line coverate.

One last followup point:

In nassqs_parse RE: "L153: when is not a response object?", at times the API is not working, so in that case this returns the error message directly.

httr::GET() will either return a response object (potentially with an error status, which you will have already have handled before getting here because you pass through the nassqs_check() function) or GET() will itself error (like if your internet is down). I don't think it's possible for it to return anything different.

adamhsparks commented 5 years ago

I have a few last minor points (nitpicks?) that if changed will improve the package. Overall it's greatly improved and I like how it works. Congrats!

1.* @lmullen already noted this much earlier in the process. Please remove the "Date" field from the DESCRIPTION file. CRAN will automatically assign this and it's prone to ending up being out of date if you rely upon updating it manually.

  1. You don't need a paste() in a stop(), e.g., stop(paste0("Your query parameters include 'format' as ", format, " but it should be one of 'json', 'xml', or 'csv'.")). It should be written as: stop("Your query parameters include 'format' as ", format, " but it should be one of 'json', 'xml', or 'csv'.") There are several instances of this that I noted.

3.* "inst/examples/example_parameters.R" has an incomplete final line. Add a line return to the file at the end of the file to fix this.

  1. The documentation for nasqss_GET() is inconsistent in how it references functions in the description. Some of the other R functions discussed in that paragraph use the function() convention while referring to nasqss_GET() is only as nassqss_GET minus the (). As a user I find it more clear if the () is used in documentation to indicate a function, not a parameter is being discussed. Note I didn't check all documentation, I just noticed this here.

5.* The documentation example for nassqss_param_values has "YIElD" not "YIELD" in the comment section, is this correct?

6.* nassqss_parse() documentation Description field is missing a "'" prior to (Z).

7.* I'm not sure that here needs to be listed in the DESCRIPTION Suggests field. It's only used in data-raw as far as I can tell? If so, that folder is not included in the R package so shouldn't need to be specified here.

8.* In the data-raw/get_test_data.R file, it might be good to set the version to 2 for maximum compatibility in the near term with versions of R from 1.4.0 to current. If it's NULL it will default to version 3.

  1. The README code could be formatted a bit more nicely too using proper RMarkdown chunks, e.g.,

    ```{r eval=FALSE}
    # Via devtools
    library(devtools)
    install_github('potterzot/rnassqs')
    
    # Via CRAN
    install.packages("rnassqs")```
  2. The README may not need to be a .Rmd? I can't see that you have any executed R code so you could simplify and just use a .md file.

  3. Consider using codemetar::write_codemeta() to create and update a .json metadata file for the package?

Once these are addressed (at your discretion for many of them) I'm happy to recommend accepting. I've added a "*" after the number and before the comment for the items that I think must be fixed. Those without are at your discretion.

potterzot commented 5 years ago

@adamhsparks Thank you for the incredible detail in this! Much appreciated. I've made all of the changes you suggest except for this one, which I'm unclear on:

8.* In the data-raw/get_test_data.R file, it might be good to set the version to 2 for maximum compatibility in the near term with versions of R from 1.4.0 to current. If it's NULL it will default to version 3.

What do you mean by setting the version? Do you mean setting the R version in DESCRIPTION?

potterzot commented 5 years ago

@nealrichardson I've reviewed and merged your PR, thanks! There were two tests for error handling that were failing:

Because they were within the with_mock_api() block they were returning a GET object instead of the error. I moved these to the authorization block and they work.

Regarding .Rbuildignore excluding tarballs, I had included that at some point long ago, but removed it for a reason that I don't remember. Thank you for adding it.

potterzot commented 5 years ago

@nealrichardson PS if I have your okay I've also added you as a contributor in DESCRIPTION.

nealrichardson commented 5 years ago

Where did you see the failure? The PR merge commit passed on Travis. They don't require auth because they use the mock responses I added here: https://github.com/potterzot/rnassqs/pull/15/files#diff-7c5a672790a8227968bfd57c3a71faa0 Did you possibly make other changes that altered the querystring in the request? That would change the request URL and thus change the mock file path it was looking for. If so, you can rename those mock files to match and they'll be fine.

Sure, happy to be listed at contributor.

potterzot commented 5 years ago

@nealrichardson Hmm, I checked out your commit again and am having no trouble. I must have changed something that started giving me those errors. I returned those files to their original state in a new commit. Also removed the response check based on your note about nassqs_GET() always returning a response object.

adamhsparks commented 5 years ago

Hi @potterzot, The RDS version of the file is 1, 2 or (default) 3. This changed with R 3.6 to "3". So many users may not have the ability to read a version 3 RDS file yet if they've not upgraded to R >= 3.6.

See ?saveRDS for more on version

potterzot commented 5 years ago

Ah I see, that script was generally outdated anyway, so thank you for pointing that out! I was also unaware of the breaking change and do a lot of my data storage in RDS so thank you.

adamhsparks commented 5 years ago

@lmullen, I've updated my initial review with the suggestion to accept, ticked the rest of the boxes and updated time spent reviewing.

lmullen commented 5 years ago

@adamhsparks Great, thanks so much.

@nealrichardson Is there anything else outstanding from your perspective?

nealrichardson commented 5 years ago

All good, just checked the boxes.

lmullen commented 5 years ago

Approved! Thanks @potterzot for submitting and making all the requested changes. And thanks, @adamhsparks and @nealrichardson for especially thorough reviews. Much appreciated.

@potterzot here are some to-dos to complete the onboarding process.

Since you are also publishing to JOSS, we need you to do the following.

You can also release a new version to CRAN.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here.

Welcome aboard! We'd love to host a blog post about your package - either a short introduction to it with one example or a longer post with some narrative about its development or something you learned, and an example of its use. If you are interested, review the instructions, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.

We've started putting together a gitbook with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved, the corresponding repo is here.

potterzot commented 5 years ago

Thank you! This is very exciting. Thank you all for your fantastic help and efforts. @adamhsparks, I am realizing I didn't specifically ask your permission to include you as a reviewer. @lmullen, I've switched the repository over to ropensci and made changes to the readme.

lmullen commented 5 years ago

Ok, you should be an admin on the repository again, @potterzot.

potterzot commented 5 years ago

@stefaniebutland I would be happy to do a blog post, probably realistically not possible until October. I think I could do a longer article discussing why I started developing the package and how it's been helpful in my research and what I've learned in the process, if that seems like a good fit. Happy to do a shorter one as well.

stefaniebutland commented 5 years ago

@potterzot Sounds good! Please submit a draft when you're ready and we can select a publication date at that time.

why I started developing the package and how it's been helpful in my research

If you include a "cool" example (that's not shown elsewhere) this is especially valuable as a way for readers to see how they might use the package.

what I've learned in the process

Always good to share this. Try to choose a couple of key points.

Thanks!

stefaniebutland commented 5 years ago

Hi @potterzot. I'm checking in to let you know I have a blog post slot open for Tues Oct 29 if you still wanted to do a long form post. A shorter tech note is quite appropriate and could be published any time, after my review.

potterzot commented 5 years ago

@stefaniebutland I couldn't make the Oct 29 deadline but have a working draft now, do you have a good future date that would work? The draft is basically done but I can change the template date and file names to match the anticipated date.

Also, I'm not sure where to put images. The template links I can change, but I don't see the corresponding img/blog-images directory in the roweb2 repository.

potterzot commented 5 years ago

@stefaniebutland nevermind on the second part, I figured out the images. It helps if I read the instructions in full!

stefaniebutland commented 5 years ago

For now, please date 2019-11-26. That might change based on submission status of other posts that already have dates assigned.

I admit there are a LOT of instructions ;-)