npi: Access the U.S. National Provider Identifier Registry API

frankfarach commented 2 years ago

Date accepted: 2022-11-08 Reviewers: Submitting Author Name: Frank Farach Submitting Author Github Handle: !--author1-->@frankfarach@parmsam@maelle<!--end-editor-- Reviewers: @Rekyt, @zabore

Due date for @Rekyt: 2022-04-04 Due date for @zabore: 2022-04-18

Archive: TBD Version accepted: TBD Language: en

Paste the full DESCRIPTION file inside a code block below:

Package: npi
Title: Access the U.S. National Provider Identifier Registry API
Version: 0.1.0
Authors@R: c(
    person("Frank", "Farach", , "frank.farach@gmail.com", role = c("cre", "aut"),
           comment = c(ORCID = "0000-0002-2145-0145")),
    person("Sam", "Parmar", , "parmartsam@gmail.com", role = "ctb")
  )
Description: Access the United States National Provider Identifier
    Registry API (if available) and provide informative error messages
    when it's not.
License: MIT + file LICENSE
URL: https://github.com/frankfarach/npi,
    https://frankfarach.github.io/npi/,
    https://npiregistry.cms.hhs.gov/api/
BugReports: https://github.com/frankfarach/npi/issues
Depends: 
    R (>= 3.1)
Imports: 
    checkLuhn,
    dplyr,
    glue,
    httr,
    magrittr,
    purrr,
    rlang,
    stringr,
    tibble,
    tidyr,
    utils
Suggests: 
    checkmate,
    covr,
    httptest,
    knitr,
    mockery,
    rmarkdown,
    spelling,
    testthat (>= 2.1.0)
VignetteBuilder: 
    knitr
Encoding: UTF-8
Language: en-US
LazyData: true
RoxygenNote: 7.1.2

Scope

Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
- [x] data retrieval
- [ ] data extraction
- [ ] data munging
- [ ] data deposition
- [ ] workflow automation
- [ ] version control
- [ ] citation management and bibliometrics
- [ ] scientific software wrappers
- [ ] field and lab reproducibility tools
- [ ] database software bindings
- [ ] geospatial data
- [ ] text analysis
Explain how and why the package falls under these categories (briefly, 1-2 sentences): npi is an API wrapper for accessing and retrieving data from the public U.S. National Provider Identifier Registry.
Who is the target audience and what are scientific applications of this package? Healthcare analysts, public health analysts, health economists, health policy analysts, medical informaticists, and journalists can use data obtained through this package to address a wide range of scientific questions in their respective domains. NPI numbers appear in other publicly available records, making them especially useful for linking datasets (e.g., healthcare quality, Medicare, etc.) in an otherwise fragmented healthcare data landscape.
Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category? None that I could find on Metacran when searching for npi or NPPES.
(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research? Yes. Although the API, and thus the package, returns personally identifiable information (e.g., names of individual providers), the U.S. government already makes these records freely and publicly available online, and provides data dissemination notices on the data elements to be disclosed from NPI records.
If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Explain reasons for any pkgcheck items which your package is unable to pass. The only failing check is for continuous integration. However, the package repo performs CI integration through Github Actions, including standard R-CMD-check, test coverage, pkgdown, and CITATION.cff.

Technical checks

Confirm each of the following by checking the box.

[x] I have read the guide for authors and rOpenSci packaging guide.

This package:

[x] does not violate the Terms of Service of any service it interacts with.
[x] has a CRAN and OSI accepted license.
[X] contains a README with instructions for installing the development version.
[x] includes documentation with examples for all functions, created with roxygen2.
[x] contains a vignette with examples of its essential functions and uses.
[x] has a test suite.
[x] has continuous integration, including reporting of test coverage using services such as Travis CI, Coveralls and/or CodeCov.

Publication options

[x] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options

- [ ] The package is novel and will be of interest to the broad readership of the journal. - [ ] The manuscript describing the package is no longer than 3000 words. - [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

[x] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

ropensci-review-bot commented 2 years ago

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

ropensci-review-bot commented 2 years ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 2 years ago

Checks for npi (v0.1.0)

git hash: db497325

:heavy_check_mark: Package name is available
:heavy_check_mark: has a 'CITATION' file.
:heavy_check_mark: has a 'codemeta.json' file.
:heavy_check_mark: has a 'contributing' file.
:heavy_check_mark: uses 'roxygen2'.
:heavy_check_mark: 'DESCRIPTION' has a URL field.
:heavy_check_mark: 'DESCRIPTION' has a BugReports field.
:heavy_check_mark: Package has at least one HTML vignette
:heavy_check_mark: All functions have examples.
:heavy_check_mark: Package has continuous integration checks.
:heavy_check_mark: Package coverage is 93.4%.
:heavy_check_mark: R CMD check found no errors.
:heavy_check_mark: R CMD check found no warnings.

Package License: MIT + file LICENSE

1. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has: - code in R (100% in 9 files) and - 1 authors - 1 vignette - 1 internal data file - 11 imported packages - 8 exported functions (median 10 lines of code) - 54 non-exported functions in R (median 8 lines of code) --- Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used: - `loc` = "Lines of Code" - `fn` = "function" - `exp`/`not_exp` = exported / not exported The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile. |measure | value| percentile|noteworthy | |:------------------------|-----:|----------:|:----------| |files_R | 9| 55.2| | |files_vignettes | 1| 68.4| | |files_tests | 6| 84.4| | |loc_R | 398| 40.6| | |loc_vignettes | 54| 10.1| | |loc_tests | 258| 60.8| | |num_vignettes | 1| 64.8| | |data_size_total | 3102| 64.9| | |data_size_median | 3102| 71.5| | |n_fns_r | 62| 63.3| | |n_fns_r_exported | 8| 38.3| | |n_fns_r_not_exported | 54| 69.6| | |n_fns_per_file_r | 6| 71.6| | |num_params_per_fn | 2| 11.9| | |loc_per_fn_r | 9| 24.3| | |loc_per_fn_r_exp | 10| 22.2| | |loc_per_fn_r_not_exp | 8| 26.4| | |rel_whitespace_R | 28| 55.8| | |rel_whitespace_vignettes | 48| 18.6| | |rel_whitespace_tests | 31| 67.1| | |doclines_per_fn_exp | 19| 12.2| | |doclines_per_fn_not_exp | 0| 0.0|TRUE | |fn_call_network_size | 34| 58.2| | ---

1a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package

2. `goodpractice` and other checks

Details of goodpractice and other checks (click to open)

#### 3a. Continuous Integration Badges [![R-CMD-check](https://github.com/frankfarach/npi/workflows/R-CMD-check/badge.svg)](https://github.com/frankfarach/npi/actions) **GitHub Workflow Results** |name |conclusion |sha |date | |:--------------------------|:----------|:------|:----------| |Commands |skipped |db4973 |2022-03-04 | |pages build and deployment |success |552287 |2022-03-04 | |pkgdown |success |5a1981 |2022-03-04 | |R-CMD-check |success |5a1981 |2022-03-04 | |test-coverage |success |5a1981 |2022-03-04 | |Update CITATION.cff |success |5a1981 |2022-03-04 | --- #### 3b. `goodpractice` results #### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/) rcmdcheck found no errors, warnings, or notes #### Test coverage with [covr](https://covr.r-lib.org/) Package coverage: 93.45 #### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp) No functions have cyclocomplexity >= 15 #### Static code analyses with [lintr](https://github.com/jimhester/lintr) [lintr](https://github.com/jimhester/lintr) found no issues with this package!

Package Versions

|package |version | |:--------|:---------| |pkgstats |0.0.3.88 | |pkgcheck |0.0.2.239 |

Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

jooolia commented 2 years ago

Dear @frankfarach, Thank you for your submission. The package appears to be in-scope and in good shape, so I am looking for an editor to handle this submission. Thanks, Julia

frankfarach commented 2 years ago

Hi @jooolia, Thanks for the update. I look forward to working with the peer review team. Frank

jooolia commented 2 years ago

@ropensci-review-bot assign @maelle as editor

ropensci-review-bot commented 2 years ago

Assigned! @maelle is now the editor

maelle commented 2 years ago

:wave: @frankfarach Thanks for your submission! I'll start looking for reviewers, I have just one question and one comment below.

Editor checks:

[x] Documentation: The package has sufficient documentation available online (README, pkgdown docs) to allow for an assessment of functionality and scope without installing the package. In particular,
- [x] Is the case for the package well made?
- [x] Is the reference index page clear (grouped by topic if necessary)?
- [x] Are vignettes readable, sufficiently detailed and not just perfunctory?
[x] Fit: The package meets criteria for fit and overlap.
[x] Installation instructions: Are installation instructions clear enough for human users?
[x] Tests: If the package has some interactivity / HTTP / plot production etc. are the tests using state-of-the-art tooling?
[x] Contributing information: Is the documentation for contribution clear enough e.g. tokens for tests, playgrounds?
[x License: The package has a CRAN or OSI accepted license.
[x] Project management: Are the issue and PR trackers in a good shape, e.g. are there outstanding bugs, is it clear when feature requests are meant to be tackled?

Editor comments

There aren't many functions but you could still add grouping to the reference index so that the most important functions come first.
I was curious to know why you use mockery::.mockPaths()?

maelle commented 2 years ago

@ropensci-review-bot seeking reviewers

ropensci-review-bot commented 2 years ago

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/505_status.svg)](https://github.com/ropensci/software-review/issues/505)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

maelle commented 2 years ago

@ropensci-review-bot add @Rekyt to reviewers

ropensci-review-bot commented 2 years ago

@Rekyt added to the reviewers list. Review due date is 2022-04-04. Thanks @Rekyt for accepting to review! Please refer to our reviewer guide.

ropensci-review-bot commented 2 years ago

@Rekyt: If you haven't done so, please fill this form for us to update our reviewers records.

frankfarach commented 2 years ago

Hi @maelle, thanks for the comment and question, which I address below.

Editor comments

There aren't many functions but you could still add grouping to the reference index so that the most important functions come first.

Thanks for the suggestion. I have created an issue for this.

I was curious to know why you use mockery::.mockPaths()?

.mockPaths() is from the httptest package. It lets you set a directory other than the current working directory for recording API fixtures with functions like httptest::with_mock_api(). It looks like I might not need .mockPaths().

Your question made me realize that I am loading both httptest and mockery in tests/testthat/setup.R. I think I am only using functions from httptests in my test suite, so I may not need mockery at all. I've created another issue to check on this as part of my response to peer review.

maelle commented 2 years ago

:wave: @frankfarach! Thank you! I realized afterwards .mockPaths() was from httptest, as I was working on a package that uses httptest. :woman_facepalming: (in that package we use with_mock_dir())

maelle commented 2 years ago

@ropensci-review-bot add @zabore to reviewers

ropensci-review-bot commented 2 years ago

@zabore added to the reviewers list. Review due date is 2022-04-08. Thanks @zabore for accepting to review! Please refer to our reviewer guide.

ropensci-review-bot commented 2 years ago

@zabore: If you haven't done so, please fill this form for us to update our reviewers records.

maelle commented 2 years ago

@ropensci-review-bot set due date for @zabore to 2022-04-18

ropensci-review-bot commented 2 years ago

Review due date for @zabore is now 18-April-2022

Rekyt commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[x] Vignette(s): demonstrating major functionality that runs successfully locally
[x] Function Documentation: for all exported functions
[x] Examples: (that run successfully locally) for all exported functions
[x] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 3

[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Overall the package is very well designed. It has simple and meaningful functions and hides a quite important machinery to make the results of the NPI API easily accessible and exploitable in R. I would like to congratulate the author for this work.

Regarding code efficiency I didn't found any improvement that could be easily made as I have the impression the code is pretty optimized both from a reading and performance standpoint.

Regarding the documentation, I think it's very well written and clear. It's easy to understand it reading from different entry points. The difficulties I had understanding were rather related to the NPI API that I am not familiar with.

I have however a few comments on how to improve the coverage of the package and found several edge cases not well covered by the package, that could be worth to include as additional tests.

Documentation

API URL

You could indicate API URL in DESCRIPTION file, this is needed for submission to CRAN (if you consider it). It is also possible to specify which link is what in the same file. Like it is done in taxize DESCRIPTION file https://github.com/ropensci/taxize/blob/a4db9a790e48c93235cac6145826c097ec687408/DESCRIPTION#L10-L12

npis description

The description of npis dataset doesn't seem to fit the dataset as it specifies "list of 0-n tibbles" for one column. It is not clear to me how to read it. Though I think it's nice to indicate the type of element. Maybe a way to make it more explicit would be to indicate "list-column of tibbles with X rows and Y columns".

extra vignette: use cases

I have no idea how this API could be useful in a real use case. But it may be nice to have an extra vignette showing how you can answer some questions through this API (how is it useful for research? for health administration?).

Function usage

Rate limitation

Is there a form of rate limitation? It seems reasonable given that we access an API. Nothing is shown from the API standpoint nor the package? (I've seen it mentioned in the "Getting started" vignette but it could be more explicit). If it is the case, that could be a nice thing to mention to the user so that they are aware of the time it would take to get their results back.

Can't use npi_summarize() with country code

I've found an issue when querying by country code which returns slightly differently formatted query and as such renders npi_summarize() unusable.

reprex:

library("npi")
ki = npi_search(country_code = "DE")
#> Requesting records 0-10...
npi_summarize(ki)
#> Error:
#> ! Tibble columns must have compatible sizes.
#> * Size 10: Existing data.
#> * Size 12: Column `primary_taxonomy`.
#> i Only values of size one are recycled.
#> Run `rlang::last_error()` to see where the error occurred.

Maybe npi_summarize() should take into account this edge case (even if I suppose there are not that many American health professionals outside the US).

Error when internet is off

Because this is a recurrent theme for API packages (and CRAN asks them to "fail gracefully" in these cases), I've checked what happens when I turn of the internet and I don't think it is that explicit for the user.

> npi_search(city = "San Francisco")
Requesting records 0-10...
Error in curl::curl_fetch_memory(url, handle = handle) : 
  Could not resolve host: npiregistry.cms.hhs.gov

Maybe there could be a way to display a better error message when internet is off or the website unreachable? Like check first hand if internet is off and errors explicitly if it is the case.

Special characters in npi_search() arguments

Character arguments of npi_search() allow for some special character as specified by the documentation. But when searching with other special characters, the query is still submitted.

reprex:

library("npi")
npi_search(first_name = "KOŒ*")
#> Requesting records 0-10...
#> Error in `npi_handle_response()`:
#> ! 
#> Field: first_name
#> Field contains special character(s) or wrong number of characters

^{Created on 2022-04-04 by the reprex package (v2.0.1)}

It works well but it fallback on the API when this is the case, I think it could be possible to catch these issues early when processing these arguments to save time and additional queries.

Wildcards special cases

Same as above, some special queries with wildcards used improperly (not trailing) could be caught earlier. I had strange results with some.

reprex:

library("npi")

# Trying to mess up with wildcards
ko = npi_search(last_name = "M*ll*")
#> Requesting records 0-10...
ko$basic  # The answer has nothing to do with the pattern
#> [[1]]
#> # A tibble: 1 x 11
#>   first_name last_name middle_name credential sole_proprietor gender
#>   <chr>      <chr>     <chr>       <chr>      <chr>           <chr> 
#> 1 JENNY      ENSTROM   E           PA         NO              F     
#> # ... with 5 more variables: enumeration_date <chr>, last_updated <chr>,
#> #   status <chr>, name <chr>, certification_date <chr>

# Wildcards in the middle do not work
ko = npi_search(last_name = "M*ll")
#> Requesting records 0-10...
ko$basic  # And I get the same answer instead of an error?!
#> [[1]]
#> # A tibble: 1 x 11
#>   first_name last_name middle_name credential sole_proprietor gender
#>   <chr>      <chr>     <chr>       <chr>      <chr>           <chr> 
#> 1 JENNY      ENSTROM   E           PA         NO              F     
#> # ... with 5 more variables: enumeration_date <chr>, last_updated <chr>,
#> #   status <chr>, name <chr>, certification_date <chr>

^{Created on 2022-04-04 by the reprex package (v2.0.1)}

User agent

It's great that the package provides a user-agent. It has been proven to be the best way possible to access an API to provide a proof of who is accessing the API.

In the README there is a mention of the user agent (https://github.com/frankfarach/npi/blob/d4e98a52ccc0a7f71328582333e5b4191f4796b9/README.Rmd#L114-L120). Which seems a great idea. However, this does not consider that the user may not be familiar with the concept of user-agent. Also, the displayed example doesn't show in which way it can be helpful to change its user-agent. Maybe you could go to a more concrete example. I also wonder if it is a good idea to let the user entirely remove the reference to the package.

To improve the user-agent, which only points to the URL of the repo, it would be good to indicate the version npi in use like it is done in taxize. If it is needed for the user to modify its user agent, then maybe providing a fixed part of the UA with the version of the package and a customizable second part would be even more complete.

Code

checkmate

checkmate is in Suggests but is used across several functions and no tests are done if package isn't installed. Maybe a good idea to make an Imports?

tidyr warning

At several occasion we face this message:

Warning message:
The `.sep` argument of `unnest()` is deprecated as of tidyr 1.0.0.
Use `names_sep = '_'` instead.

while there is a function to check for installed tidyr version tidyr_new_interface(). On my computer I had the error when using npi_flatten() even though I had tidyr v.1.2.0 installed

Tests

Functions delay_by() and remove_null() are defined but never used (and never tested). Was that planned for another update? Because it artificially reduces the coverage.

No tests have been done with empty results? Certain parameter combinations give back empty results and it could be easy to cover L177 of npi_search.R https://app.codecov.io/gh/frankfarach/npi/blob/master/R/npi_search.R (for ex. npi_search(last_name = "Grenie");))

npi_flatten() is not covered at all and could be covered by developing tests using the npis dataset.

frankfarach commented 2 years ago

Hi @Rekyt,

Thank you for your review of the npi package. I really appreciate your kind words about the package and -- even more than that -- your specific recommendations for improving the user experience and the quality of the code.

I look forward to responding to your review in detail.

frankfarach commented 2 years ago

Hi @maelle, @Rekyt, and @zabore,

As you will see, I have been creating a new issue in the package repo for each substantive item in your reviews. Each issue links back to the originating comment(s) in this review thread and all (and only) such issues are collected in an rOpenSci Review milestone. I plan to link to these issues and any pull requests I make to the package repository in my forthcoming responses.

Although I am doing this primarily to stay organized and preserve the intellectual lineage of future changes to the package, I thought I should explain the appearance of a bunch of issue references in this thread. I also hope it's helpful to you -- but if not, let me know and I'll see if I can remove the backlinks, or at least not link to this thread when @zabore's review arrives.

zabore commented 2 years ago

Here is my review!

Briefly describe any working relationship you have (had) with the package authors.
[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[x] Vignette(s): demonstrating major functionality that runs successfully locally
[x] Function Documentation: for all exported functions
[x] Examples: (that run successfully locally) for all exported functions
[x] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 4

[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Overall I found this package to be well designed and well documented. Nice work! I understand what it does, but I was left wondering why. It could be nice to include a vignette demonstrating how you envision this being used. How are people getting this information now, and how will this streamline things for them? Below are some more specific comments, but overall I found no major issues.

README

Overall the README is thorough and well organized and easy to read. Great work!
Under "Search the registry" section the README mentions 4 regular vector columns that are returned by npi_search() but when I execute the code I get enumeration_type in place of the mentioned provider_type

When I ran npi_summarize(nyc) I got the following warning message (Note I have tidyr 1.2.0 installed on my machine):

Warning message:
The `.sep` argument of `unnest()` is deprecated as of tidyr 1.0.0.
Use `names_sep = '_'` instead.

The example in the README is not reproducible, meaning I got different results when I ran the code than what was shown in the README output. Also, the ouput included in the vignette differ from those in the README. I assume this is due to a change in the NPI itself between when the README was created and when I ran the code? If possible, reproducible example in the README is ideal, though I understand that may be a challenge with a changing API.

vignette

In the "Search registry" section I would prefer to see the other search arguments listed by the argument name rather than a description (i.e. to see first_name rather than "provider's first name") as this
More details could be provided about what npi_summarize() does. Instead of saying it provides a more "human-readable" overview you could instead just state explicitly that it extract the first and last name, primary taxonomy, and primary address and phone number for each provider.
The README contains slightly more detailed information about what npi_flatten() does than the vignette. For users familiar with purrr::flatten() the action may be somewhat self explanatory. But for other users the term "flatten" likely has no meaning, so a more functional description could be useful.

Documentation/Examples

Great use of npi_ to start all exported functions, makes them easy to use and get through autofill.
It's not clear to me how or when npi_is_valid() would be used. Could provide some motivation or an actual use case in the README and/or vignette.
In the documentation for the npis dataset you mention in several places "[list of 0-n tibbles]" but there is no definition of n. I assume this indicates the number of tibbles in the list can vary, but I don't think it's necessary to include this. "List of tibbles" could be sufficient.
Is there documentation anywhere for what information is contained in the columns of the tibble returned by npi_search()? I couldn't find it.
The help page (or vignette?) for npi_search() could provide more details for how matching is done by default. For example, I get different results if I search for city = "new york" versus city = "new york city". I know you have not programmed the name matching yourself, but I think this is important information for the user to have so at least pointing to a link where this information can be found could be useful.
It could be useful to add a message from npi_search() that explicitly states something along the lines of "Returning 10 of xxx records for the specified search parameters" so that the user is aware they are getting only a subset of the records that match their search parameters.
Why is 1200 the maximum for the limit argument in npi_search()?

Tests

Look thorough to me.

Functionality/performance

No performance issues or areas for improvement that I can see
Your error messages are quite good, providing useful information about where errors in supplied arguments occurred and what is allowed. This is really user friendly!
It seems like when invalid arguments are passed to npi_search() it still tries to query the API first and then returns an error. Internal checks could be done in this function to avoid the query altogether in these cases and to more quickly return a useful error message.
I see that the taxonomies field in the result of npi_search() has a "primary" column, which I assume is used by npi_summarize() to identify the primary taxonomy. However, how does npi_summarize() determine the "primary" address and phone number when there are more than one listed, since there is no corresponding "primary" column in addresses? Is it always just the first record from taxonomies and address?
Does npi_flatten() always return a sensible result? It seems like it merges together a lot of different pieces of information from different list columns but does it always make sense to cross-tabulate every single one? I don't have an idea of where it might go wrong, but always wonder when merging tibbles with different numbers of rows.

maelle commented 2 years ago

Thanks a ton @Rekyt & @zabore for your thorough reviews! @zabore, would you mind adding the time spent reviewing to your comment after "Estimated hours spent reviewing: "? Thank you!

@frankfarach I think the backlinks & milestone are great. We will still expect your response to be more or less self-explanatory so not just a list of links (I'm not assuming you got this wrong, just stating this just in case!).

maelle commented 2 years ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/505#issuecomment-1087579526 time 3

ropensci-review-bot commented 2 years ago

Logged review for Rekyt (hours: 3)

maelle commented 2 years ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/505#issuecomment-1090531882= time NA

ropensci-review-bot commented 2 years ago

Logged review for zabore (hours: NA)

zabore commented 2 years ago

@maelle I added the time, sorry about the initial oversight!

maelle commented 2 years ago

@zabore thank you! actually it was a great oversight as it allowed us to add the possibility to log a review without a time entered yet. :wink:

maelle commented 2 years ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/505#issuecomment-1090531882= time 4

ropensci-review-bot commented 2 years ago

Logged review for zabore (hours: 4)

frankfarach commented 2 years ago

Thanks a ton @Rekyt & @zabore for your thorough reviews! @zabore, would you mind adding the time spent reviewing to your comment after "Estimated hours spent reviewing: "? Thank you!

Likewise, @Rekyt and @zabore, thanks so much for your reviews! I really appreciate the time you took to review the package and look forward to responding.

@frankfarach I think the backlinks & milestone are great. We will still expect your response to be more or less self-explanatory so not just a list of links (I'm not assuming you got this wrong, just stating this just in case!).

@maelle Understood. I'll submit a self-contained prose response. The links will only be there for convenience. Thanks!

frankfarach commented 2 years ago

Hi @maelle, I wanted to let you know that I'm still making changes and preparing my response. You can see my commits here. I've addressed about 2/3 of the issues raised by reviewers so far. Based on my slow-but-steady progress to date and my available time in the near future, I am aiming to submit my response by May 6. I mention this as I know there's an expectation for package authors to provide their responses within two weeks of the last-submitted review, and we're a few days beyond that period already. Please let me know if my proposed timeline works for you. Thanks for your patience!

maelle commented 2 years ago

Thank you for the update @frankfarach!

maelle commented 2 years ago

@frankfarach you should now be able to add a comment with the contents @ropensci-review-bot check package in order to get a pkgcheck report :-) (it was previously only possible for editors)

maelle commented 2 years ago

@frankfarach any update? Thank you! :smile_cat:

frankfarach commented 2 years ago

@maelle Sorry for my delayed response. Can you please put a 3-month hold on this review? I intend to finish this submission but have other things I need to take care of in the meantime. Thank you for considering this request. 😄

maelle commented 2 years ago

@ropensci-review-bot put on hold

ropensci-review-bot commented 2 years ago

Submission on hold!

maelle commented 2 years ago

Thank you for the update @frankfarach! Done :heavy_check_mark:

maelle commented 2 years ago

@ropensci-review-bot check package

ropensci-review-bot commented 2 years ago

Thanks, about to send the query.

ropensci-review-bot commented 2 years ago

:rocket:

Error: Issue template has no 'repourl'

:wave:

maelle commented 2 years ago

@frankfarach Please don't mind me, I was just checking the bot. :sweat_smile:

ropensci-review-bot commented 2 years ago

@maelle: Please review the holding status

maelle commented 2 years ago

:wave: @frankfarach, any update? :smile_cat:

frankfarach commented 2 years ago

👋 @frankfarach, any update? 😸

The API recently introduced a change in the response type that I need to fix as it prevents all search requests from being parsed. I am hoping to have this resolved in the next week, then I will return to addressing the remaining issues mentioned by the reviewers.

maelle commented 2 years ago

Thank you! I hope the API change won't be too painful to tackle!

ropensci / software-review