ropensci / software-review

rOpenSci Software Peer Review.
291 stars 104 forks source link

npi: Access the U.S. National Provider Identifier Registry API #505

Closed frankfarach closed 1 year ago

frankfarach commented 2 years ago

Date accepted: 2022-11-08 Reviewers: Submitting Author Name: Frank Farach Submitting Author Github Handle: !--author1-->@frankfarach<!--end-author1-- Other Package Authors Github handles: (comma separated, delete if none) !--author-others-->@parmsam<!--end-author-others-- Repository: https://github.com/frankfarach/npi Version submitted: 0.1.0 Submission type: Standard Editor: !--editor-->@maelle<!--end-editor-- Reviewers: @Rekyt, @zabore

Due date for @Rekyt: 2022-04-04 Due date for @zabore: 2022-04-18

Archive: TBD Version accepted: TBD Language: en

Package: npi
Title: Access the U.S. National Provider Identifier Registry API
Version: 0.1.0
Authors@R: c(
    person("Frank", "Farach", , "frank.farach@gmail.com", role = c("cre", "aut"),
           comment = c(ORCID = "0000-0002-2145-0145")),
    person("Sam", "Parmar", , "parmartsam@gmail.com", role = "ctb")
  )
Description: Access the United States National Provider Identifier
    Registry API (if available) and provide informative error messages
    when it's not.
License: MIT + file LICENSE
URL: https://github.com/frankfarach/npi,
    https://frankfarach.github.io/npi/,
    https://npiregistry.cms.hhs.gov/api/
BugReports: https://github.com/frankfarach/npi/issues
Depends: 
    R (>= 3.1)
Imports: 
    checkLuhn,
    dplyr,
    glue,
    httr,
    magrittr,
    purrr,
    rlang,
    stringr,
    tibble,
    tidyr,
    utils
Suggests: 
    checkmate,
    covr,
    httptest,
    knitr,
    mockery,
    rmarkdown,
    spelling,
    testthat (>= 2.1.0)
VignetteBuilder: 
    knitr
Encoding: UTF-8
Language: en-US
LazyData: true
RoxygenNote: 7.1.2

Scope

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

MEE Options - [ ] The package is novel and will be of interest to the broad readership of the journal. - [ ] The manuscript describing the package is no longer than 3000 words. - [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

ropensci-review-bot commented 2 years ago

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

ropensci-review-bot commented 2 years ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 2 years ago

Checks for npi (v0.1.0)

git hash: db497325

Package License: MIT + file LICENSE


1. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has: - code in R (100% in 9 files) and - 1 authors - 1 vignette - 1 internal data file - 11 imported packages - 8 exported functions (median 10 lines of code) - 54 non-exported functions in R (median 8 lines of code) --- Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used: - `loc` = "Lines of Code" - `fn` = "function" - `exp`/`not_exp` = exported / not exported The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile. |measure | value| percentile|noteworthy | |:------------------------|-----:|----------:|:----------| |files_R | 9| 55.2| | |files_vignettes | 1| 68.4| | |files_tests | 6| 84.4| | |loc_R | 398| 40.6| | |loc_vignettes | 54| 10.1| | |loc_tests | 258| 60.8| | |num_vignettes | 1| 64.8| | |data_size_total | 3102| 64.9| | |data_size_median | 3102| 71.5| | |n_fns_r | 62| 63.3| | |n_fns_r_exported | 8| 38.3| | |n_fns_r_not_exported | 54| 69.6| | |n_fns_per_file_r | 6| 71.6| | |num_params_per_fn | 2| 11.9| | |loc_per_fn_r | 9| 24.3| | |loc_per_fn_r_exp | 10| 22.2| | |loc_per_fn_r_not_exp | 8| 26.4| | |rel_whitespace_R | 28| 55.8| | |rel_whitespace_vignettes | 48| 18.6| | |rel_whitespace_tests | 31| 67.1| | |doclines_per_fn_exp | 19| 12.2| | |doclines_per_fn_not_exp | 0| 0.0|TRUE | |fn_call_network_size | 34| 58.2| | ---

1a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


2. goodpractice and other checks

Details of goodpractice and other checks (click to open)

#### 3a. Continuous Integration Badges [![R-CMD-check](https://github.com/frankfarach/npi/workflows/R-CMD-check/badge.svg)](https://github.com/frankfarach/npi/actions) **GitHub Workflow Results** |name |conclusion |sha |date | |:--------------------------|:----------|:------|:----------| |Commands |skipped |db4973 |2022-03-04 | |pages build and deployment |success |552287 |2022-03-04 | |pkgdown |success |5a1981 |2022-03-04 | |R-CMD-check |success |5a1981 |2022-03-04 | |test-coverage |success |5a1981 |2022-03-04 | |Update CITATION.cff |success |5a1981 |2022-03-04 | --- #### 3b. `goodpractice` results #### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/) rcmdcheck found no errors, warnings, or notes #### Test coverage with [covr](https://covr.r-lib.org/) Package coverage: 93.45 #### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp) No functions have cyclocomplexity >= 15 #### Static code analyses with [lintr](https://github.com/jimhester/lintr) [lintr](https://github.com/jimhester/lintr) found no issues with this package!


Package Versions

|package |version | |:--------|:---------| |pkgstats |0.0.3.88 | |pkgcheck |0.0.2.239 |


Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

jooolia commented 2 years ago

Dear @frankfarach, Thank you for your submission. The package appears to be in-scope and in good shape, so I am looking for an editor to handle this submission. Thanks, Julia

frankfarach commented 2 years ago

Hi @jooolia, Thanks for the update. I look forward to working with the peer review team. Frank

jooolia commented 2 years ago

@ropensci-review-bot assign @maelle as editor

ropensci-review-bot commented 2 years ago

Assigned! @maelle is now the editor

maelle commented 2 years ago

:wave: @frankfarach Thanks for your submission! I'll start looking for reviewers, I have just one question and one comment below.

Editor checks:

Editor comments


maelle commented 2 years ago

@ropensci-review-bot seeking reviewers

ropensci-review-bot commented 2 years ago

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/505_status.svg)](https://github.com/ropensci/software-review/issues/505)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

maelle commented 2 years ago

@ropensci-review-bot add @Rekyt to reviewers

ropensci-review-bot commented 2 years ago

@Rekyt added to the reviewers list. Review due date is 2022-04-04. Thanks @Rekyt for accepting to review! Please refer to our reviewer guide.

ropensci-review-bot commented 2 years ago

@Rekyt: If you haven't done so, please fill this form for us to update our reviewers records.

frankfarach commented 2 years ago

Hi @maelle, thanks for the comment and question, which I address below.

Editor comments

  • There aren't many functions but you could still add grouping to the reference index so that the most important functions come first.

Thanks for the suggestion. I have created an issue for this.

  • I was curious to know why you use mockery::.mockPaths()?

.mockPaths() is from the httptest package. It lets you set a directory other than the current working directory for recording API fixtures with functions like httptest::with_mock_api(). It looks like I might not need .mockPaths().

Your question made me realize that I am loading both httptest and mockery in tests/testthat/setup.R. I think I am only using functions from httptests in my test suite, so I may not need mockery at all. I've created another issue to check on this as part of my response to peer review.

maelle commented 2 years ago

:wave: @frankfarach! Thank you! I realized afterwards .mockPaths() was from httptest, as I was working on a package that uses httptest. :woman_facepalming: (in that package we use with_mock_dir())

maelle commented 2 years ago

@ropensci-review-bot add @zabore to reviewers

ropensci-review-bot commented 2 years ago

@zabore added to the reviewers list. Review due date is 2022-04-08. Thanks @zabore for accepting to review! Please refer to our reviewer guide.

ropensci-review-bot commented 2 years ago

@zabore: If you haven't done so, please fill this form for us to update our reviewers records.

maelle commented 2 years ago

@ropensci-review-bot set due date for @zabore to 2022-04-18

ropensci-review-bot commented 2 years ago

Review due date for @zabore is now 18-April-2022

Rekyt commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing: 3


Review Comments

Overall the package is very well designed. It has simple and meaningful functions and hides a quite important machinery to make the results of the NPI API easily accessible and exploitable in R. I would like to congratulate the author for this work.

Regarding code efficiency I didn't found any improvement that could be easily made as I have the impression the code is pretty optimized both from a reading and performance standpoint.

Regarding the documentation, I think it's very well written and clear. It's easy to understand it reading from different entry points. The difficulties I had understanding were rather related to the NPI API that I am not familiar with.

I have however a few comments on how to improve the coverage of the package and found several edge cases not well covered by the package, that could be worth to include as additional tests.

Documentation

API URL

You could indicate API URL in DESCRIPTION file, this is needed for submission to CRAN (if you consider it). It is also possible to specify which link is what in the same file. Like it is done in taxize DESCRIPTION file https://github.com/ropensci/taxize/blob/a4db9a790e48c93235cac6145826c097ec687408/DESCRIPTION#L10-L12

npis description

The description of npis dataset doesn't seem to fit the dataset as it specifies "list of 0-n tibbles" for one column. It is not clear to me how to read it. Though I think it's nice to indicate the type of element. Maybe a way to make it more explicit would be to indicate "list-column of tibbles with X rows and Y columns".

extra vignette: use cases

I have no idea how this API could be useful in a real use case. But it may be nice to have an extra vignette showing how you can answer some questions through this API (how is it useful for research? for health administration?).

Function usage

Rate limitation

Is there a form of rate limitation? It seems reasonable given that we access an API. Nothing is shown from the API standpoint nor the package? (I've seen it mentioned in the "Getting started" vignette but it could be more explicit). If it is the case, that could be a nice thing to mention to the user so that they are aware of the time it would take to get their results back.

Can't use npi_summarize() with country code

I've found an issue when querying by country code which returns slightly differently formatted query and as such renders npi_summarize() unusable.

reprex:

library("npi")
ki = npi_search(country_code = "DE")
#> Requesting records 0-10...
npi_summarize(ki)
#> Error:
#> ! Tibble columns must have compatible sizes.
#> * Size 10: Existing data.
#> * Size 12: Column `primary_taxonomy`.
#> i Only values of size one are recycled.
#> Run `rlang::last_error()` to see where the error occurred.

Maybe npi_summarize() should take into account this edge case (even if I suppose there are not that many American health professionals outside the US).

Error when internet is off

Because this is a recurrent theme for API packages (and CRAN asks them to "fail gracefully" in these cases), I've checked what happens when I turn of the internet and I don't think it is that explicit for the user.

> npi_search(city = "San Francisco")
Requesting records 0-10...
Error in curl::curl_fetch_memory(url, handle = handle) : 
  Could not resolve host: npiregistry.cms.hhs.gov

Maybe there could be a way to display a better error message when internet is off or the website unreachable? Like check first hand if internet is off and errors explicitly if it is the case.

Special characters in npi_search() arguments

Character arguments of npi_search() allow for some special character as specified by the documentation. But when searching with other special characters, the query is still submitted.

reprex:

library("npi")
npi_search(first_name = "KOÅ’*")
#> Requesting records 0-10...
#> Error in `npi_handle_response()`:
#> ! 
#> Field: first_name
#> Field contains special character(s) or wrong number of characters

Created on 2022-04-04 by the reprex package (v2.0.1)

It works well but it fallback on the API when this is the case, I think it could be possible to catch these issues early when processing these arguments to save time and additional queries.

Wildcards special cases

Same as above, some special queries with wildcards used improperly (not trailing) could be caught earlier. I had strange results with some.

reprex:

library("npi")

# Trying to mess up with wildcards
ko = npi_search(last_name = "M*ll*")
#> Requesting records 0-10...
ko$basic  # The answer has nothing to do with the pattern
#> [[1]]
#> # A tibble: 1 x 11
#>   first_name last_name middle_name credential sole_proprietor gender
#>   <chr>      <chr>     <chr>       <chr>      <chr>           <chr> 
#> 1 JENNY      ENSTROM   E           PA         NO              F     
#> # ... with 5 more variables: enumeration_date <chr>, last_updated <chr>,
#> #   status <chr>, name <chr>, certification_date <chr>

# Wildcards in the middle do not work
ko = npi_search(last_name = "M*ll")
#> Requesting records 0-10...
ko$basic  # And I get the same answer instead of an error?!
#> [[1]]
#> # A tibble: 1 x 11
#>   first_name last_name middle_name credential sole_proprietor gender
#>   <chr>      <chr>     <chr>       <chr>      <chr>           <chr> 
#> 1 JENNY      ENSTROM   E           PA         NO              F     
#> # ... with 5 more variables: enumeration_date <chr>, last_updated <chr>,
#> #   status <chr>, name <chr>, certification_date <chr>

Created on 2022-04-04 by the reprex package (v2.0.1)

User agent

It's great that the package provides a user-agent. It has been proven to be the best way possible to access an API to provide a proof of who is accessing the API.

In the README there is a mention of the user agent (https://github.com/frankfarach/npi/blob/d4e98a52ccc0a7f71328582333e5b4191f4796b9/README.Rmd#L114-L120). Which seems a great idea. However, this does not consider that the user may not be familiar with the concept of user-agent. Also, the displayed example doesn't show in which way it can be helpful to change its user-agent. Maybe you could go to a more concrete example. I also wonder if it is a good idea to let the user entirely remove the reference to the package.

To improve the user-agent, which only points to the URL of the repo, it would be good to indicate the version npi in use like it is done in taxize. If it is needed for the user to modify its user agent, then maybe providing a fixed part of the UA with the version of the package and a customizable second part would be even more complete.

Code

checkmate

checkmate is in Suggests but is used across several functions and no tests are done if package isn't installed. Maybe a good idea to make an Imports?

tidyr warning

At several occasion we face this message:

Warning message:
The `.sep` argument of `unnest()` is deprecated as of tidyr 1.0.0.
Use `names_sep = '_'` instead.

while there is a function to check for installed tidyr version tidyr_new_interface(). On my computer I had the error when using npi_flatten() even though I had tidyr v.1.2.0 installed

Tests

Functions delay_by() and remove_null() are defined but never used (and never tested). Was that planned for another update? Because it artificially reduces the coverage.

No tests have been done with empty results? Certain parameter combinations give back empty results and it could be easy to cover L177 of npi_search.R https://app.codecov.io/gh/frankfarach/npi/blob/master/R/npi_search.R (for ex. npi_search(last_name = "Grenie");))

npi_flatten() is not covered at all and could be covered by developing tests using the npis dataset.

frankfarach commented 2 years ago

Hi @Rekyt,

Thank you for your review of the npi package. I really appreciate your kind words about the package and -- even more than that -- your specific recommendations for improving the user experience and the quality of the code.

I look forward to responding to your review in detail.

frankfarach commented 2 years ago

Hi @maelle, @Rekyt, and @zabore,

As you will see, I have been creating a new issue in the package repo for each substantive item in your reviews. Each issue links back to the originating comment(s) in this review thread and all (and only) such issues are collected in an rOpenSci Review milestone. I plan to link to these issues and any pull requests I make to the package repository in my forthcoming responses.

Although I am doing this primarily to stay organized and preserve the intellectual lineage of future changes to the package, I thought I should explain the appearance of a bunch of issue references in this thread. I also hope it's helpful to you -- but if not, let me know and I'll see if I can remove the backlinks, or at least not link to this thread when @zabore's review arrives.

zabore commented 2 years ago

Here is my review!


Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing: 4


Review Comments

Overall I found this package to be well designed and well documented. Nice work! I understand what it does, but I was left wondering why. It could be nice to include a vignette demonstrating how you envision this being used. How are people getting this information now, and how will this streamline things for them? Below are some more specific comments, but overall I found no major issues.

README

vignette

Documentation/Examples

Tests

Functionality/performance

maelle commented 2 years ago

Thanks a ton @Rekyt & @zabore for your thorough reviews! @zabore, would you mind adding the time spent reviewing to your comment after "Estimated hours spent reviewing: "? Thank you!

@frankfarach I think the backlinks & milestone are great. We will still expect your response to be more or less self-explanatory so not just a list of links (I'm not assuming you got this wrong, just stating this just in case!).

maelle commented 2 years ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/505#issuecomment-1087579526 time 3

ropensci-review-bot commented 2 years ago

Logged review for Rekyt (hours: 3)

maelle commented 2 years ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/505#issuecomment-1090531882= time NA

ropensci-review-bot commented 2 years ago

Logged review for zabore (hours: NA)

zabore commented 2 years ago

@maelle I added the time, sorry about the initial oversight!

maelle commented 2 years ago

@zabore thank you! actually it was a great oversight as it allowed us to add the possibility to log a review without a time entered yet. :wink:

maelle commented 2 years ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/505#issuecomment-1090531882= time 4

ropensci-review-bot commented 2 years ago

Logged review for zabore (hours: 4)

frankfarach commented 2 years ago

Thanks a ton @Rekyt & @zabore for your thorough reviews! @zabore, would you mind adding the time spent reviewing to your comment after "Estimated hours spent reviewing: "? Thank you!

Likewise, @Rekyt and @zabore, thanks so much for your reviews! I really appreciate the time you took to review the package and look forward to responding.

@frankfarach I think the backlinks & milestone are great. We will still expect your response to be more or less self-explanatory so not just a list of links (I'm not assuming you got this wrong, just stating this just in case!).

@maelle Understood. I'll submit a self-contained prose response. The links will only be there for convenience. Thanks!

frankfarach commented 2 years ago

Hi @maelle, I wanted to let you know that I'm still making changes and preparing my response. You can see my commits here. I've addressed about 2/3 of the issues raised by reviewers so far. Based on my slow-but-steady progress to date and my available time in the near future, I am aiming to submit my response by May 6. I mention this as I know there's an expectation for package authors to provide their responses within two weeks of the last-submitted review, and we're a few days beyond that period already. Please let me know if my proposed timeline works for you. Thanks for your patience!

maelle commented 2 years ago

Thank you for the update @frankfarach!

maelle commented 2 years ago

@frankfarach you should now be able to add a comment with the contents @ropensci-review-bot check package in order to get a pkgcheck report :-) (it was previously only possible for editors)

maelle commented 2 years ago

@frankfarach any update? Thank you! :smile_cat:

frankfarach commented 2 years ago

@maelle Sorry for my delayed response. Can you please put a 3-month hold on this review? I intend to finish this submission but have other things I need to take care of in the meantime. Thank you for considering this request. 😄

maelle commented 2 years ago

@ropensci-review-bot put on hold

ropensci-review-bot commented 2 years ago

Submission on hold!

maelle commented 2 years ago

Thank you for the update @frankfarach! Done :heavy_check_mark:

maelle commented 2 years ago

@ropensci-review-bot check package

ropensci-review-bot commented 2 years ago

Thanks, about to send the query.

ropensci-review-bot commented 2 years ago

:rocket:

Error: Issue template has no 'repourl'

:wave:

maelle commented 2 years ago

@frankfarach Please don't mind me, I was just checking the bot. :sweat_smile:

ropensci-review-bot commented 2 years ago

@maelle: Please review the holding status

maelle commented 2 years ago

:wave: @frankfarach, any update? :smile_cat:

frankfarach commented 2 years ago

👋 @frankfarach, any update? 😸

The API recently introduced a change in the response type that I need to fix as it prevents all search requests from being parsed. I am hoping to have this resolved in the next week, then I will return to addressing the remaining issues mentioned by the reviewers.

maelle commented 2 years ago

Thank you! I hope the API change won't be too painful to tackle!