patentsview - Githubissues

crew102 commented 7 years ago

Summary

What does this package do? (explain in 50 words or less):

This package provides a wrapper around the patentsview api, which serves USPTO patent data that has been disambiguated. There is one main function to search and download data from the API (search_pv) and several other helper functions.

Paste the full DESCRIPTION file inside a code block below:

Package: patentsview
Type: Package
Title: An R Client to the PatentsView API
Version: 0.1.0.9000
Authors@R: person("Christopher", "Baker", email = "chriscrewbaker@gmail.com",
    role = c("aut", "cre"))
Encoding: UTF-8
Description: Provides functions to simplify the PatentsView API
    (http://www.patentsview.org/api/doc.html) query language,
    send GET and POST requests to the API's seven endpoints, and parse the data
    that comes back.
URL: https://github.com/crew102/patentsview
BugReports: https://github.com/crew102/patentsview/issues
License: GPL-2
LazyData: TRUE
Depends:
    R (>= 3.1)
Imports:
    httr,
    jsonlite,
    utils
Suggests:
    knitr,
    rmarkdown,
    testthat
VignetteBuilder: knitr
RoxygenNote: 6.0.1

URL for the package (the development repository, not a stylized html page): master branch on github
Who is the target audience?
Researchers who want to interact with patent data. There are several specific use cases of patent data that have been listed in this thread of the rOpenSci discussion board.
Are there other R packages that accomplish the same thing? If so, what is different about yours? I know of two R packages that one can use to download patent data: lensr and a development version of opsr. lensr is the most similar, as it can be used to download USPTO patent data. I do not know of any other packages that wrap patentsview, which is the only service that provides USPTO data that has been disambiguated.

Requirements

Confirm each of the following by checking the box. This package:

[X] does not violate the Terms of Service of any service it interacts with.
[X] has a CRAN and OSI accepted license.
[X] contains a README with instructions for installing the development version.
[X] includes documentation with examples for all functions.
[x] contains a vignette with examples of its essential functions and uses.
[X] has a test suite.
[X] has continuous integration with Travis CI and/or another service.

Publication options

[X] Do you intend for this package to go on CRAN?
[ ] Do you wish to automatically submit to the Journal of Open Source Software? If so:
[ ] The package contains a paper.md with a high-level description in the package root or in inst/.
[ ] The package is deposited in a long-term repository with the DOI:
(Do not submit your package separately to JOSS)

Detail

[X] Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

R CMD check passes with one warning regarding this being my first package submission.

[X] Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:

I believe the package follows the guidelines, but there are a few items I would like to highlight. These may or may not be issues:

The README does not follow the exact same format as described in the packaging guidelines.
I have added a cran-comment.md file which I plan on updating before submitting to CRAN.
I have not tagged the git repo yet, as I would like to wait for feedback before considering this version complete.

If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

maelle commented 7 years ago

Editor checks:

[x ] Fit: The package meets criteria for fit and overlap
[ x] Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
[ x] License: The package has a CRAN or OSI accepted license
[x ] Repository: The repository link resolves correctly
[ ] Archive (JOSS only, may be post-review): The repository DOI resolves correctly
[ ] Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Thanks for your submission, @crew102! :grin: I'm currently looking for reviewers.

In the meantime here is the output of goodpractice::gp() that can help you and the reviewers. Note that reaching 100% code coverage isn't compulsory, the output just encourages you to write more of them.

It is good practice to

  ✖ write unit tests for all functions, and all package code in general. 55% of code
    lines are covered by test cases.

    R/check-query.R:8:NA
    R/check-query.R:9:NA
    R/check-query.R:61:NA
    R/flatten-pv-data.R:19:NA
    R/flatten-pv-data.R:20:NA
    ... and 139 more lines

  ✖ avoid long code lines, it is bad for readability. Also, many people prefer editor
    windows that are about 80 characters wide. Try make your lines shorter than 80
    characters

    R\check-query.R:12:1
    R\check-query.R:20:1
    R\check-query.R:38:1
    R\search-pv.R:57:1
    R\search-pv.R:71:1
    ... and 5 more lines

  ✖ avoid sapply(), it is not type safe. It might return a vector, or a list,
    depending on the input data. Consider using vapply() instead.

    R\flatten-pv-data.R:27:20
    R\flatten-pv-data.R:32:18
    R\print.R:14:8
    R\query-dsl.R:64:3
    R\query-dsl.R:67:3
    ... and 3 more lines

Using devtools::spell_check() I found these typos:

equivlient search_pv.Rd:24,26
paramater search_pv.Rd:13
reccomened get_fields.Rd:18
requirnments search_pv.Rd:26

Reviewers: @expectopatronum @poldham Due date: 2017-05-24

crew102 commented 7 years ago

Thanks for running those commands for me, @maelle. I have:

Fixed spelling mistakes and added line breaks to code lines > 80 chars in length.
Looked over all of my sapply calls and either changed to vapply or made sure that I had specified simplify = FALSE.
Added additional unit tests.

The newest version of master is available here: crew102/patentsview@e6fdbc84fac6188e3ff1a2619c157199248e56eb

maelle commented 7 years ago

You're welcome @crew102! And nice work! 😺

maelle commented 7 years ago

Thanks @expectopatronum @poldham for accepting to review this package!

poldham commented 7 years ago

@crew102 @expectopatronum @maelle Just a quick note that I am working on the review at the moment, so @crew102 don't worry that we have forgotten!

expectopatronum commented 7 years ago

Me too! So far I have one remark: I think that in the method validate_endpoint in utils.R the error message

"endpoint must be a length 1 vector and be one of: "

might be confusing. I'm not sure if for an R novice it is clear that one element is equivalent to a vector of length one. Maybe the error message can changed to something like "endpoint must be a single element and be one of:"

I hope comments like this are ok here :)

crew102 commented 7 years ago

Agreed that this is confusing. At several points I felt conflicted between precision and clarity (I actually dropped the phrase "vector of length one" in several places, thinking that it may be confusing...I guess I didn't do that here though). I will update this so it is less confusing.

maelle commented 7 years ago

@poldham @expectopatronum maybe useless since you've both been working on this but here is a friendly reminder that your reviews are due on Wednesday. :smile_cat:

expectopatronum commented 7 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need clearly stating problems the software is designed to solve and its target audience in README
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[x] Vignette(s) demonstrating major functionality that runs successfully locally
[x] Function Documentation: for all exported functions in R help
[x] Examples for all exported functions in R Help that run successfully locally
[ ] Community guidelines including contribution guidelines in the README or CONTRIBUTING, and URL, Maintainer and BugReports fields in DESCRIPTION

Paper (for packages co-submitting to JOSS)

The package contains a paper.md with:

[ ] A short summary describing the high-level functionality of the software

[ ] Authors: A list of authors with their affiliations

[ ] A statement of need clearly stating problems the software is designed to solve and its target audience.

[ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

[x] The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 6

Review Comments

I'm very impressed by the code in the package, I think it is very well written and I enjoyed reviewing it! I only have a few remarks/comments.

I installed and ran it successfully on Windows 10 (R-3.4.0) and Mac OS X (R-3.3.2)
From my point of view the package follows most of the rOpenSci packing recommendations and follows all required things, in more detail:
- It seems it does not generate the README.md from a README.rmd file and it does not include a code of conduct, however it is only a recommendation. I just wanted to point it out in case the author intends to change that.
- It has the mandatory NEWS file.
- Uses recommended packages (e.g. jsonlite)

Possibilities for improvement

Testing is done for the main function search_pv and for error handling, which I think is very important. At first I thought there should be unit tests covering every exported function but after testing myself I think that it would be sufficient to add (a) test(s) for flatten_pv_data().
Confusing error message in validate_endpoint already mentioned in my comment
For convenience (since there is also get_fields()) a function get_endpoints() would be nice
I tried to build my own queries and in one of them I tried to use location_longitude as a field for querying. I got an error location_longitude is not a valid field to query Error in if (f1$data_type == "date" && !is_date(value)) paste0_msg("Bad date: ", : missing value where TRUE/FALSE needed but I could not find out (using the help and the vignette) which fields are allowed for querying. Unless I missed something this should be added to the vignette/manual.

crew102 commented 7 years ago

Thanks a lot for your review, @expectopatronum. I will get working on integrating your feedback soon.

maelle commented 7 years ago

Many thanks @expectopatronum ! :smile_cat:

crew102 commented 7 years ago

Hi @expectopatronum -

In addition to the notes/changes shown below, I have made the following three updates to patentsview:

A new function called get_ok_pk() was added (https://github.com/crew102/patentsview/commit/0c4bbb91e24626d1ec82c71507066d9030dce3a4). get_ok_pk() suggests an appropriate primary key variable to use for the pk_var argument to flatten_pv_data().
The pk_var argument in flatten_pv_data() was renamed to pk (https://github.com/crew102/patentsview/commit/f363c7f83fb7ce3d67df6594d9a2aa6979655649).
flatten_pv_data() was renamed to unnest_pv_data() (https://github.com/crew102/patentsview/commit/b9efa3d00b3e27522454f93b22bd9ec07a6e761e).

It seems it does not generate the README.md from a README.rmd file

The markdown variant of README is actually generated from the rmarkdown variant, it just happens in a sort of atypical way. You can see the README.md target in the Makefile for details (https://github.com/crew102/patentsview/blob/master/Makefile#L8-L11).

Testing is done for the main function search_pv and for error handling, which I think is very important. At first I thought there should be unit tests covering every exported function but after testing myself I think that it would be sufficient to add (a) test(s) for flatten_pv_data().

I have added a unit test for unnest_pv_data() (https://github.com/crew102/patentsview/commit/b9a20a1080c68106fad64b1588aef0b0c540a01c).

Confusing error message in validate_endpoint already mentioned in my comment

I have changed the wording so it is not so confusing (https://github.com/crew102/patentsview/commit/ed08c807e9459db98fdaa23e74cdbf9b2506c3dd).

For convenience (since there is also get_fields()) a function get_endpoints() would be nice

Good idea...I have added get_endpoints() (https://github.com/crew102/patentsview/commit/ee3f65e9f32db288e913b6180473147aaa9699a8).

I tried to build my own queries and in one of them I tried to use location_longitude as a field for querying. I got an error location_longitude is not a valid field to query Error in if (f1$data_type == "date" && !is_date(value)) paste0_msg("Bad date: ", : missing value where TRUE/FALSE needed but I could not find out (using the help and the vignette) which fields are allowed for querying. Unless I missed something this should be added to the vignette/manual.

There are some fields that you are not allowed to include in your query but can include in your fields argument (i.e., you can retrieve data about a field, but you can't filter on that field in your query). get_fields() returns all the fields that you can retrieve for a given endpoint, but some of those fields may not be valid to filter on in your query. It looks like location_longitude is actually not a queryable field, as per the locations endpoint fields list. Was the distinction between a queryable field and a retrievable field not clear from the documentation? If no, I will clarify.

but I could not find out (using the help and the vignette) which fields are allowed for querying. Unless I missed something this should be added to the vignette/manual.

Looks like this is not in the documentation anywhere. I will add it.

maelle commented 7 years ago

@poldham friendly reminder that your review was due last week. Do you think you'll soon have time to post it? Thanks. 😸

@expectopatronum @crew102 nice discussion&work!

crew102 commented 7 years ago

I tried to build my own queries and in one of them I tried to use location_longitude as a field for querying. I got an error location_longitude is not a valid field to query Error in if (f1$data_type == "date" && !is_date(value)) paste0_msg("Bad date: ", : missing value where TRUE/FALSE needed but I could not find out (using the help and the vignette) which fields are allowed for querying. Unless I missed something this should be added to the vignette/manual.

@expectopatronum, I have made the distinction between queryable vs retrievable fields clearer in the README (https://github.com/crew102/patentsview/commit/070a24186bebb3d0adc3a13ef38e4f775d6df313).

maelle commented 7 years ago

@crew102 btw regarding the README an issue with it is that users installing the package from CRAN do not get to see it, so it might be good to have all important content in a vignette (as well / instead).

poldham commented 7 years ago

Dear @crew102,

Apologies for the delay... I was unexpectedly asked to do an overseas presentation and that took up a lot of time.

Before I get into the reviewing part, let me say that it is fantastic that this package has been prepared for R as it could be an incredible resource. Most of my comments will be on thinking about the user experience for those who are R users and who may or may not know much about patent data. The comments are all intended to be constructive. In a discussion last week with colleagues at WIPO we already plan to start using the package in the analytics trainings around the world this year. But, as you will see below, I have some comments on the user experience.

Code Comments

My main code comment is on the requirement for users to write using qry_funs$gt(). The storage of the functions in a list as suggested by HW in Advanced R is really great but it does seem to get in the way of useability... since what the funs are is not clearly explained in the documentation. For example gt and gte must be blindingly obvious (get?) but I confess I remain mystified... and am fully expecting a duh moment.

One possiblility would be to try to mask the qry_funs$gt as simply gt. I found a discussion of options for making lists of functions less verbose at the end of this section of Advanced R. It may be that @expectopatronum @maelle have some thoughts about this.

If it is not possible to find a way forward on that at the moment could I suggest that the list funs are clearly documented in qry_funs().

CMD Check

I receive a warning:

checking package vignettes in ‘inst/doc’ ... WARNING Package vignettes without corresponding PDF/HTML: ‘patentsview.Rmd’ ‘writing-queries.Rmd’

The tests ran OK

Vignettes

I think the package merits some work on the vignettes to make it easier for the first time user.

In both vignettes we see a description of moving from JSON to the DSL as the preferred approach. As the users will be coming in from R I think it makes sense to start with the DSL and maybe explain later on that there are other ways of doing this. I think (as an R user and sometime coder) that being confronted with

search_pv(query = '{"_gt":{"patent_year":2010}}', method = "POST", fields = "patent_number", sort = c("patent_number" = "asc"))

is a bit off putting in term of engaging with the package and is actually, using the favoured DSL, not necessary.

There are of course a ton of things one could do with the package but could I suggest that the vignettes and examples focus on some of the main types of searches that patent researchers will normally do. Those would be:

Keyword and phrase searches
Year restricted searches (publication year and priority year)
Applicant (assignee) and inventor searches
IPC or CPC searches
Citations
Combinations of the above

In addition the locations endpoint is really really exciting (and I suspect folks will jump all over it with Leaflet and so on), but I found it difficult to get going with that.

In terms of the vignettes it is of course up to you. On the patent side of things my own efforts to be straightforward are illustrated for lensr and opsr. Those were mainly written as part of the WIPO analytics training course for complete beginners. That may give you some ideas. In addition, rOpenSci packages such as rplos have good walkthroughs. @maelle monkeylearn walkthrough was really helpful for a newbiew as an other example. I think that makes all the difference in helping people to engage with the package.

Getting to Grips with the Fields

I have seen the good addition of get_endpoints() but I also spent a lot of time trying to figure out the fields and there are a lot. I discovered the fieldsdf.csv in the data-raw folder and would suggest that you might put that as a .rda in data so that the user can call it to work out where they can go.

In the documentation for get_fields() I would suggest

Under endpoint. Bullet list the 7 endpoints to make clear what they are and cross link to get_endpoints().
Under groups. Bullet the groups so they are easier to appreciate.
Note that the location group appears in data-raw/fieldsdf(see below) but was not listed in groups documentation or in the online table
For consistency, note that the online table and fieldsdf use the term group whereas the arg is groups. It is a trivial point but may be worth harmonising.
fieldsdf in data-raw does not contain the description field. Unless there is a good reason could this be added back in to that table to assist with understanding the fields. I can see that there is a case that it is hard to read a long text string in a df in R. But then users could also be given direct access to fieldsdf.

Inventors and Assignees.

Can I suggest that an explanation is needed somewhere that these are cleaned up data fields. I am not sure if there is a published account of the clean up algorithm yet but I saw the video. Not vital right now but important to know that the data has been cleaned and that the raw is also available for purists.

Those are my main comments at the moment. I intend to take another look as I know there have already been some changes but wanted to be sure you received these given the unexpected delay.

crew102 commented 7 years ago

Hi @maelle , the README is actually the "patentsview" vignette (https://github.com/crew102/patentsview/blob/master/vignettes/patentsview.Rmd). The makefile renders the vignette to README.md and also to html (placing the necessary files in inst/doc). I know it's a little wonky how it happens, but the vignette still ends up in the right place (as far as I can tell).

I just realized that I had inst/ in my .gitignore, hence @maelle and others would not have seen it. I will check it in now.

crew102 commented 7 years ago

Hi @poldham , thanks a lot for taking the time to review the package. I will review your comments and post a reply soon.

maelle commented 7 years ago

@poldham many thanks for your review! How many hours did you approximately spend reviewing the package?

Even better to see you mention my monkeylearn package 😉 it benefited from a review in its time!

@crew102 perfect regarding the README then.

poldham commented 7 years ago

A pleasure @crew102 there is so much in this package that I think one of the challenges is working out strategies for getting started.

@maelle somewhere around 6 hours as I did a long stint, got dragged away, and then another stint today.

sckott commented 7 years ago

@poldham thx for your review - can you give an estimate of number of hrs spent reviewing?

maelle commented 7 years ago

@sckott I asked @poldham the same question and he answered right above your comment :wink: but good to gather these data. :-)

sckott commented 7 years ago

woopsy, sorry @poldham , thanks @maelle

crew102 commented 7 years ago

Thanks again @poldham for reviewing the package. My responses to your comments are shown below.

qry_funs

It sounds like you have two issues with the DSL - first, the fact that the functions are stored in a list and thus can be verbose to use, and second, the lack of documentation on the functions themselves.

verbosity. I think it makes sense to keep them in a list due to the possibility of name conflicts. However, I agree that requiring qry_funs$ can get in the way of usability when your query is complex (for example, https://github.com/crew102/patentsview/blob/master/man/with_qfuns.Rd#L21-L29). I included the with_qfuns function in the hopes of reducing that problem. If you feel like this may not be sufficient for some users, I could add a note somewhere in the documentation letting users know that they can assign the functions into their global environment (or some other relevant environment) using list2env(qry_funs, envir = globalenv()).
documentation. I see how someone may not understand the meaning of the functions, as I did not explicitly provide definitions for all of them. I was hoping to just refer people to the patentsview API documentation in the "writing-queries' vignette for these details, but it looks like I didn't get the link exactly correct for this. I have corrected the link and added explicit documentation for each function in the qry_funs() documentation (crew102/patentsview@1b474665)

R CMD Check

I receive a warning: checking package vignettes in ‘inst/doc’ ... WARNING Package vignettes without corresponding PDF/HTML: ‘patentsview.Rmd’ ‘writing-queries.Rmd’

This should be fixed now.

Vignettes

As the users will be coming in from R I think it makes sense to start with the DSL and maybe explain later on that there are other ways of doing this.

I agree that the DSL may be more natural for R users. However, I think the first example of search_pv in the README/"patentsview vignette" should use the JSON version of the query instead of the DSL version (i.e., '{"_gte":{"patent_date":"2007-01-01"}}' instead of qry_funs$gte(patent_date = "2007-01-01")). I think the DSL deserves immediate explanation, whereas a JSON string does not, and hence it would be less of an "information overload" if the first example used JSON.

There are of course a ton of things one could do with the package but could I suggest that the vignettes and examples focus on some of the main types of searches that patent researchers will normally do. Those would be:

I have added a new example of searching for patents belonging to a given assignee (crew102/patentsview@8cf0c1dd). I think the REAMDE now has all of the example types that you mentioned, but let me know if you think a few more would be helpful.

Getting to Grips with the Fields

I have seen the good addition of get_endpoints() but I also spent alot of time trying to figure out the fields and there are a lot. I discovered the fieldsdf.csv in the data-raw folder and would suggest that you might put that as a .rda in data so that the user can call it to work out where they can go.

Similar to the case of the query functions, I was hoping to just refer the interested reader to the online field tables (e.g., in the README and search_pv docs). I did actually include the fieldsdf data frame as a data object for internal use (it was in R/sysdata.rda in the source code), so you could have accessed it with patentsview:::fieldsdf. The reason why I didn't export or document this was that I felt it would be redundant with the existing patentsview documentation. Anyway, I have:

Exported fieldsdf so you should be able to access it with patentsview::fieldsdf or data("fieldsdf") (crew102/patentsview@7b86056f).
Updated the README/patentsview vignette to include a section on fields. The availability of fieldsdf is noted in this section (crew102/patentsview@3bde26aa).

In the documentation for get_fields() I would suggest Under endpoint. Bullet list the 7 endpoints to make clear what they are and cross link to get_endpoints().

I have linked to get_endpoints() in the get_fields() documentation (crew102/patentsview@c7647206). I think I would prefer to keep the endpoints and groups shown inline as they are now, as a list would be fairly long.

Note that the location group appears in data-raw/fieldsdf(see below) but was not listed in groups documentation or in the online table

The location group does not exist in the patents endpoint, so that's probably why you're not seeing it online (assuming you are just looking at the patents table). It exists for locations, inventors, and assignees endpoints (e.g., http://www.patentsview.org/api/location.html#field_list)

For consistency, note that the online table and fieldsdf use the term group whereas the arg is groups. It is a trivial point but may be worth harmonising.

I typically like to use plurals for parameters which the user can specify multiple choices for. For example, endpoint is not plural b/c you can only specify one endpoint, but fields is plural b/c you can specify several fields.

fieldsdf in data-raw does not contain the description field. Unless there is a good reason could this be added back in to that table to assist with understanding the fields. I can see that there is a case that it is hard to read a long text string in a df in R. But then users could also be given direct access to fieldsdf.

I have added "common name" and "description" to fieldsdf.

Inventors and Assignees.

Can I suggest that an explanation is needed somewhere that these are cleaned up data fields. I am not sure if there is a published account of the clean up algorithm yet but I saw the video. Not vital right now but important to know that the data has been cleaned and that the raw is also available for purists.

Yes, it is very important that people realize that they are not getting raw data by default. I have added additional explanation about this in the README/patentsview vignette (crew102/patentsview@938d3167).

maelle commented 7 years ago

@expectopatronum @poldham are you happy with the changes/discussion made by @crew102 ? Thanks again for your reviews, and great work @crew102

expectopatronum commented 7 years ago

Yes absolutely, great work @crew102 and thanks for the great discussions. I really appreciate the clarification in the README and the additional test. @maelle Now I check the checkbox next to "The author has responded to my review and made changes to my satisfaction. I recommend approving this package." and that's it? Or are there any other steps?

maelle commented 7 years ago

@expectopatronum yep that's it, thanks a bunch!

expectopatronum commented 7 years ago

@maelle then it's done! Thanks for inviting me to review, it was a pleasure!

crew102 commented 7 years ago

Thanks a lot @expectopatronum for your review and suggestions

crew102 commented 7 years ago

Hi @maelle , I would like to submit patentsview to CRAN. Is the review process for ROpenSci over, or are we still waiting on @poldham to approve my changes?

maelle commented 7 years ago

Yes I'd prefer @poodham togive feedback and to have a look at it again next week 😊

crew102 commented 7 years ago

OK, I will wait for @poldham

poldham commented 7 years ago

@crew102 A somewhat mad week here in the UK and getting madder.

Many thanks for your responses on the review and here are my observations.

qry_funs

On the qry_funs. I like the list approach with the only issue being verbosity. On with_qfuns, I think my issue was wouldn't it generally make more sense to use with_qfuns in the first place? I am not going to press on this but think the note on assigning could be helpful.
The update to the qry_funs documentation is exactly what was needed.

CMD Check

The warning is fixed but a note is now appearing as:

Status: 1 NOTE checking R code for possible problems ... NOTE check_query: no visible binding for global variable ‘fieldsdf’ get_fields: no visible binding for global variable ‘fieldsdf’ validate_args: no visible binding for global variable ‘fieldsdf’ Undefined global functions or variables: fieldsdf

A bit puzzling because the documentation in data.R looks OK to me.

While on data.R could I suggest adding in the URL to the online version of fieldsdf.

Vignettes

As in my earlier comments I think starting with the JSON and moving the the DSL could be confusing and off-putting... notably for a new R user. As in, whoah... why am I being asked to input JSON strings into an R function... does not compute from the Coursera/Datacamp course I'm doing...get me out of here. I mention this also because the survey of patent data analytics practices we ran for WIPO revealed that most people are still using Excel... sooo... However, a potential solution to this would be to add an explanatory comment at the beginning of the README under Basic Usage roughly to the effect that:

"The PatentsView API provides an interface to a disambiguated version of USPTO patent data. The PatentsView API uses JSON strings as described on the PatentsView webpage. In the R package you can use JSON strings, as in the examples below, or you can use the simpler domain specific language (DSL) described below to make writing queries in R easier."

That would be my suggestion on this but I leave the precise wording to you. I do however think there needs to be a bit more explanatory lead in.

On the README examples, I think there are now a nice range of examples. The Fields discussion is good with the links to fieldsdf. The distinction between queryable and retrievable helps a lot.
I suggest another quick spell check on the README e.g. "aplied to"
On list columns in the results, that will be very helpful to users unfamiliar with list columns (and from my work on EPO data patent packages will probably always involve them). At the very end of the README on joining on the primary key you may wish to consider referencing see dplyr::join() for guidance on how to do this. Chapter 13 of HW & GGs R for Data Science here also provides a very good practical walkthrough on joins in R.
On the link established between get_fields() and get_endpoints(). I can see the length issue on groups. I would propose to bullet the endpoints in get_fields for easy reference and leave the groups inline. I propose this because a user needs to understand what the endpoints are to be able to get their head around the groups.
On the location issue. Thanks and that is now clear to me. I will leave it with you on whether to document that clarification or not.
Ok on groups term in args and good explanation.
fieldsdf really is a great help.
On raw vs. disambiguated data this is now very clear. You might consider adding the reference to the disambiguation process e.g. For those who might be interested in how the disambiguation was done by the team led by Nicholas Monath and Andrew McCallum from the University of Massachusetts Amherst the pdf is here and details of the 2015 PatentsView Inventor Disambiguation Technical Workshop on approaches to disambiguation are here.

Subject to the above minor points @maelle @expectopatronum I am very happy.

@maelle please add another 1.5 hours to my review time.

maelle commented 7 years ago

@poldham thanks a lot! 😀

crew102 commented 7 years ago

Hi @expectopatronum , thanks for the feedback. Below you will find my responses. @maelle , where do we go from here? Should I get the package ready for submission to CRAN and just submit myself?

On with_qfuns, I think my issue was wouldn't it generally make more sense to use with_qfuns in the first place? I am not going to press on this but think the note on assigning could be helpful.

You can use with_qfuns exclusively, but I think it makes sense to explain to users all of the possibilities and let them decide. I have added a note regarding how users can assign the functions into their global environment (crew102/patentsview@5e95770).

The warning is fixed but a note is now appearing as: Status: 1 NOTE checking R code for possible problems ... NOTE check_query: no visible binding for global variable ‘fieldsdf’ get_fields: no visible binding for global variable ‘fieldsdf’ validate_args: no visible binding for global variable ‘fieldsdf’ Undefined global functions or variables: fieldsdf

This should be fixed now (crew102/patentsview@e1a2f257).

While on data.R could I suggest adding in the URL to the online version of fieldsdf.

I have a URL to one of the endpoint's online field tables (crew102/patentsview@2e4ad0c)

However, a potential solution to this would be to add an explanatory comment at the beginning of the README under Basic Usage roughly to the effect that:

I have added a note to the README (crew102/patentsview@e25f222f05).

I suggest another quick spell check on the README e.g. "aplied to"

I have fixed this mistake and checked spelling again (crew102/patentsview@da220a5290).

On list columns in the results, that will be very helpful to users unfamiliar with list columns (and from my work on EPO data patent packages will probably always involve them). At the very end of the README on joining on the primary key you may wish to consider referencing see dplyr::join() for guidance on how to do this. Chapter 13 of HW & GGs R for Data Science here also provides a very good practical walkthrough on joins in R.

I have referenced the Relational data chapter in the documentation for unnest_pv_data (crew102/patentsview@fa9a406df).

On the link established between get_fields() and get_endpoints(). I can see the length issue on groups. I would propose to bullet the endpoints in get_fields for easy reference and leave the groups inline. I propose this because a user needs to understand what the endpoints are to be able to get their head around the groups.

I have added a bulleted list with the endpoints to the get_endpoints() documentation (crew102/patentsview@e4a140ac6). I think the endpoints should be listed here instead of at get_fields(). A link to get_endpoints() is given in the documentation for get_fields().

You might consider adding the reference to the disambiguation process

I think that referencing the PatentsView online documentation should suffice for this.

maelle commented 7 years ago

@poldham are you happy with this response?

@crew102 I'll do the last checks later this week, thanks for your patience! 😊 After the last checks You'll transfer your repo to ropensci therefore the URLs in DESCRIPTION will change so please wait before the CRAN submission. 😉

crew102 commented 7 years ago

OK, no problem. I have to do some final checks as well. I'll also be updating cran-comments.md. I'll take care of those items once the review process is officially over.

poldham commented 7 years ago

@crew102 @maelle. Thanks @crew102. This is all fine with me except that I still think that the opening sentence of the README as raised under point 1 in Vignettes above still needs a sentence or so to better orient the reader that a) they are dealing with a JSON API and b) that there is an easy to use DSL.

On the disambiguation point I am assuming that you mean http://www.patentsview.org/workshop/ which is fine with me. I think the reference to the disambiguation method process is important to give users confidence in using the disambiguated data as inventor disambiguation has been an intractable and non-trivial problem for many many years.

Looking forward to seeing this join the ropensci collection and to spreading the word and using it. Many congratulations!

crew102 commented 7 years ago

Hi @poldham , I added a link to the disambiguation workshop (crew102/patentsview@9b137e28). In my mind the documentation surrounding the DSL is complete...As per your request, the README references the DSL in the first section. I believe that any commentary beyond that would be calling too much attention to the DSL early on, and introduce too many new concepts at one point (which I tried to avoid by breaking it up into various sections). Users who want more info regarding the DSL are referred to the vignette, which should address your issue that the JSON will scare off novice users. I have added a small note that lets users know that the query language uses JSON (crew102/patentsview@bc0ef6a7).

maelle commented 7 years ago

@crew102 approved! :smile_cat:

I've added you to the ropensci organization, could you transfer the repository? After that you'll have to change all URLs and I'll activate continuous integration via rOpenSci account (if not today, then tomorrow).

Besides, please add the rOpenSci footer [![ropensci_footer](http://ropensci.org/public_images/github_footer.png)](http://ropensci.org) to the bottom of the README.

crew102 commented 7 years ago

Hi @maelle , great news! Thanks again to you and the two reviewers, @poldham and @expectopatronum. I made small edits to the documentation which I have yet to push. I will push those, add the footer, trasfer the repo, change the urls, and then you can activate CI with your account. After that is finished, can I submit to CRAN?

maelle commented 7 years ago

Yes!

crew102 commented 7 years ago

rmarkdown::render was having a hard time finding http://ropensci.org/public_images/github_footer.png (not sure why, though...Any idea what ), which was causing my makefile to fail when I added the footer to patentsview.rmd (which is the readme). I figured it'd just append the raw readme.md file with the footer text like so: https://github.com/crew102/patentsview/blob/master/Makefile#L12 It works but you have to run the makefile in order to refresh the readme.me. Is that going to be a problem for ropensci? Running the readme will fail if you don't have Rscript on your path.

crew102 commented 7 years ago

Also, does ropensci have an AppVeyor account? If so, can we run builds on AppVeyor as well as Travis?

maelle commented 7 years ago

Oops sorry yes this is a current pandoc bug so you can wait until the pandoc bug is fixed to add itn (will tell you)

Yes I'll activate your repo on appveyor as well😊

crew102 commented 7 years ago

Can you confirm that I have permission to create repos on ropensci? Github thinks I don't have permission.

maelle commented 7 years ago

@crew102 you mean that when you tried transferring the repo, it didn't work?

maelle commented 7 years ago

Have you received my invitation to the patentsview team in the rOpenSci organization @crew102 ?

crew102 commented 7 years ago

Hi @maelle , I have transferred the repo and it's asking me "Please select any teams you wish to have access to ropensci/patentsview." I chose the "patentsview" group. I hope this is correct?

maelle commented 7 years ago

@crew102 just check the patentsview team.

Thanks @sckott for guidance (@crew102 disclaimer: your package is the first one I've ever accepted, so thanks for your patience)

ropensci / software-review

patentsview #112

Summary

Requirements

Publication options

Detail

Editor checks:

Editor comments

Package Review

Documentation

Paper (for packages co-submitting to JOSS)

Functionality

Final approval (post-review)

Review Comments

Code Comments

CMD Check

Vignettes

Getting to Grips with the Fields

Inventors and Assignees.

qry_funs

R CMD Check

Vignettes

Getting to Grips with the Fields

Inventors and Assignees.

qry_funs

CMD Check

Vignettes