rhud: A R interface for the US Department of Housing and Urban Development APIs

etam4260 commented 2 years ago

Submitting Author Name: Name Submitting Author Github Handle: !--author1-->@etam4260@jhollist<!--end-editor-- Reviewers: @rtaph, @khueyama

Due date for @rtaph: 2022-06-07 Due date for @khueyama: 2022-06-09

Archive: TBD Version accepted: TBD Language: en

Paste the full DESCRIPTION file inside a code block below:

Package: hudr
Title: A R interface for accessing HUD (US Department of Housing and Urban Development) APIs
Version: 0.1.0.9000
Authors@R: 
    c(person("Emmet", "Tam", ,"emmet_tam@yahoo.com", role = c("aut", "cre", "cph")),
    person("Allison", "Reilly", ,"areilly2@umd.edu", role = c("ctb")),
    person("Hamed", "Ghaedi", ,"hghaedi@terpmail.umd.edu", role = c("ctb")))
Description: 
    An R interface for accessing HUD (US Department of Housing and Urban Development) API.
    The HUD has four main datasets, USPS Crosswalk, Fair Markets Rent,
    Income Limits, and the Comprehensive Housing Affordability Strategy.
License: GPL (>= 2)
Language: en-US
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.2.9000
URL: https://github.com/etam4260/hudr, https://etam4260.github.io/hudr/index.html
BugReports: https://github.com/etam4260/hudr/issues
Suggests: 
    covr,
    httptest,
    knitr,
    rmarkdown,
    testthat (>= 3.0.0)
Imports: 
    httr,
    devtools,
    zoo,
    rio
Config/testthat/edition: 3
VignetteBuilder: knitr
ByteCompile: true

Scope

Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
- [X] data retrieval
- [ ] data extraction
- [X] data munging
- [ ] data deposition
- [ ] workflow automation
- [ ] version control
- [ ] citation management and bibliometrics
- [ ] scientific software wrappers
- [ ] field and lab reproducibility tools
- [ ] database software bindings
- [X] geospatial data
- [ ] text analysis
Explain how and why the package falls under these categories (briefly, 1-2 sentences):

It is a data retrieval package because it retrieves data from an API. It 'will' be a data munging package after implementation of additional features such as cross walking an entire dataset. Furthermore, the APIs which this package retrieves data from are associated with geographic identifiers.

Who is the target audience and what are scientific applications of this package?

I am hoping to reach professors, researchers, and students with this package. This gives access to the crosswalk files which is a geospatial technique described very well in these journal articles:

Din, Alexander and Wilson, Ron, 2020. “Crosswalking ZIP Codes to Census Geographies: Geoprocessing the U.S. Department of Housing & Urban Development’s ZIP Code Crosswalk Files,” Cityscape: A Journal of Policy Development and Research, Volume 22, Number 1, https://www.huduser.gov/portal/periodicals/cityscpe/vol22num1/ch12.pdf

Wilson, Ron and Din, Alexander, 2018. “Understanding and Enhancing the U.S. Department of Housing and Urban Development’s ZIP Code Crosswalk Files,” Cityscape: A Journal of Policy Development and Research, Volume 20 Number 2, 277 – 294.

Additionally, it provides access to Income Limits and Fair Markets Rent as well as Comprehensive Housing and Affordability datasets provided by HUD which is of interest to housing and social science researchers.

Implementation of a crosswalk function is planned in future releases, which will help crosswalk a US dataset from one geographic identifier into another using the method described in the papers above.

Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

Recently, a hudr package got published on CRAN, but it looks like some of derives from the work I currently have. I am not sure how this will affect my prospects of submitting this to CRAN. Furthermore, their package provides only access to the fair markets rent and income limits API provide by HUD. Mine gives access to all the APIs that are currently supported by HUD USER (https://www.huduser.gov/portal/home.html) as well as providing more flexibility and intuitiveness.

As for documentation and testing, I believe my package could be improved. I don't test very many edge cases and have not created vignettes for every function.

(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?

For the most part, I think yes. The package requires an API key which I have users store using Sys.setenv(). I have not looked into the more sophisticated methods like the keyring package and do not instruct the user on how to set the key to be persistent.

If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

https://github.com/ropensci/software-review/issues/500 @jooolia

Explain reasons for any pkgcheck items which your package is unable to pass.

I seem to get some errors when running pkgcheck. I manually made sure I had all the necessary components. For the pkgcheck requirement that says all functions need examples, I am assuming that only includes exported ones?

Technical checks

Confirm each of the following by checking the box.

[X] I have read the guide for authors and rOpenSci packaging guide.

This package:

[X] does not violate the Terms of Service of any service it interacts with.
[X] has a CRAN and OSI accepted license.
[X] contains a README with instructions for installing the development version.
[X] includes documentation with examples for all functions, created with roxygen2.
[X] contains a vignette with examples of its essential functions and uses.
[X] has a test suite.
[X] has continuous integration, including reporting of test coverage using services such as Travis CI, Coveralls and/or CodeCov.

Publication options

[X] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options

- [ ] The package is novel and will be of interest to the broad readership of the journal. - [ ] The manuscript describing the package is no longer than 3000 words. - [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

[X] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

ropensci-review-bot commented 2 years ago

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

ropensci-review-bot commented 2 years ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 2 years ago

Checks for hudr (v0.1.0.9000)

git hash: 660ad0e2

:heavy_multiplication_x: Package name is not available (on CRAN).
:heavy_check_mark: has a 'codemeta.json' file.
:heavy_multiplication_x: does not have a 'contributing' file.
:heavy_check_mark: uses 'roxygen2'.
:heavy_check_mark: 'DESCRIPTION' has a URL field.
:heavy_check_mark: 'DESCRIPTION' has a BugReports field.
:heavy_check_mark: Package has at least one HTML vignette
:heavy_check_mark: All functions have examples.
:heavy_check_mark: Package has continuous integration checks.
:heavy_multiplication_x: Package coverage failed
:heavy_multiplication_x: R CMD check process failed with message: 'Build process failed'.

Important: All failing checks above must be addressed prior to proceeding

Package License: GPL (>= 2)

1. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has: - code in R (100% in 11 files) and - 1 authors - 7 vignettes - no internal data file - 4 imported packages - 30 exported functions (median 22 lines of code) - 63 non-exported functions in R (median 20 lines of code) --- Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used: - `loc` = "Lines of Code" - `fn` = "function" - `exp`/`not_exp` = exported / not exported All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html) The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile. |measure | value| percentile|noteworthy | |:------------------------|-----:|----------:|:----------| |files_R | 11| 62.6| | |files_vignettes | 7| 98.5| | |files_tests | 10| 90.7| | |loc_R | 786| 61.3| | |loc_vignettes | 878| 89.0| | |loc_tests | 357| 67.9| | |num_vignettes | 7| 99.2|TRUE | |n_fns_r | 93| 74.6| | |n_fns_r_exported | 30| 78.3| | |n_fns_r_not_exported | 63| 73.5| | |n_fns_per_file_r | 4| 63.5| | |num_params_per_fn | 4| 54.6| | |loc_per_fn_r | 22| 64.1| | |loc_per_fn_r_exp | 22| 52.1| | |loc_per_fn_r_not_exp | 20| 63.0| | |rel_whitespace_R | 37| 79.7| | |rel_whitespace_vignettes | 51| 96.3|TRUE | |rel_whitespace_tests | 32| 75.2| | |doclines_per_fn_exp | 40| 50.1| | |doclines_per_fn_not_exp | 0| 0.0|TRUE | |fn_call_network_size | 88| 77.1| | ---

1a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package

2. `goodpractice` and other checks

Details of goodpractice and other checks (click to open)

#### 3a. Continuous Integration Badges [![R-CMD-check](https://github.com/etam4260/hudr/workflows/R-CMD-check/badge.svg)](https://github.com/etam4260/hudr/actions) **GitHub Workflow Results** |name |conclusion |sha |date | |:--------------------------|:----------|:------|:----------| |pages build and deployment |success |660ad0 |2022-04-01 | |R-CMD-check |success |660ad0 |2022-04-01 | |test-coverage |success |06b739 |2022-03-28 | --- #### 3b. `goodpractice` results #### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/) R CMD check generated the following error: 1. Error in proc$get_built_file() : Build process failed R CMD check generated the following check_fail: 1. no_import_package_as_a_whole #### Test coverage with [covr](https://covr.r-lib.org/) ERROR: Test Coverage Failed #### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp) Error : Build failed, unknown error, standard output: * checking for file ‘hudr/DESCRIPTION’ ... OK * preparing ‘hudr’: * checking DESCRIPTION meta-information ... OK * installing the package to build vignettes * creating vignettes ... ERROR --- re-building ‘authors.Rmd’ using rmarkdown --- finished re-building ‘authors.Rmd’ --- re-building ‘Community-Development-Block-Grant.Rmd’ using rmarkdown --- finished re-building ‘Community-Development-Block-Grant.Rmd’ --- re-building ‘Comprehensive-Housing-and-Affordability-Strategy.Rmd’ using rmarkdown --- finished re-building ‘Comprehensive-Housing-and-Affordability-Strategy.Rmd’ --- re-building ‘Crosswalk.Rmd’ using rmarkdown Quitting from lines 101-105 (Crosswalk.Rmd) Error: processing vignette 'Crosswalk.Rmd' failed with diagnostics: Did you forget to set the key? Please go to https://www.huduser.gov/hudapi/public/register?comingfrom=1 to and sign up and get a token. Then save this to your environment using Sys.setenv('HUD_KEY' = YOUR_KEY) --- failed re-building ‘Crosswalk.Rmd’ --- re-building ‘Fair-Markets-Rent.Rmd’ using rmarkdown --- finished re-building ‘Fair-Markets-Rent.Rmd’ --- re-building ‘Income-Limits.Rmd’ using rmarkdown --- finished re-building ‘Income-Limits.Rmd’ --- re-building ‘Setup.Rmd’ using rmarkdown --- finished re-building ‘Setup.Rmd’ SUMMARY: processing the following file failed: ‘Crosswalk.Rmd’ Error: Vignette re-building failed. Execution halted #### Static code analyses with [lintr](https://github.com/jimhester/lintr) [lintr](https://github.com/jimhester/lintr) found the following 351 potential issues: message | number of times --- | --- Avoid 1:length(...) expressions, use seq_len. | 10 Avoid 1:nrow(...) expressions, use seq_len. | 2 Avoid using sapply, consider vapply instead, that's type safe | 1 Lines should not be more than 80 characters. | 306 Use <-, not =, for assignment. | 32

Package Versions

|package |version | |:--------|:---------| |pkgstats |0.0.3.96 | |pkgcheck |0.0.2.276 |

Editor-in-Chief Instructions:

Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.

jooolia commented 2 years ago

Dear @etam4260, Thank you for your submission. Thank you for highlighting the new package hudr that is already on CRAN. Can you describe how you will deal with the name collision (I see it is already a bit problematic as the cran link on your pkgdown site links to that package or maybe that is on purpose?). Also, can you elaborate a bit more on the differences you see between the two packages? thanks, Julia

etam4260 commented 2 years ago

Hi @jooolia

An easy solution to deal with the naming conflict I think would be to just rename the entire package as ‘rhud’ and update the documentation accordingly.

Otherwise, I have already contacted CRAN about this issue, but have not heard back about any resolutions. 😔

I’m not sure about any other options. Any other suggestions are appreciated. 🙏🏼

I took a look at the hudr package published on CRAN and it looks like they only implement APIs for two main datasets provided by HUD and they don’t have all the sub datasets. For example, Fair Markets Rent has state, county, and small area level data — they only have state level data. 🧐

hudr(CRAN)

Fair Markets Rent (Single Query)
- State Level
Income Limits (Single Query)
- State Level

They also provide functions for HUD miscellaneous APIs:

Query for all US states
Query for all counties in a state (Single Query)
Query for all metropolitan areas in a state (Single Query)

Single query means their function calls only make 1 API call at a time, whereas multi query means multiple API calls can be done in a single function.

My package on the other hand supports the Crosswalk API and Comprehensive Housing and Affordability Strategy API as well.

hudr(etam4260/hudr)

Fair Markets Rent (Multi Query)
- State Level
- County
- Small Areas
Income Limits (Multi Query)
- State Level
- County
- Small Areas
USPS Crosswalk Files (Multi Query)
Comprehensive Housing and Affordability (Multi Query)
Query for all US states
Query for all counties in a state (Multi Query)
Query for all metropolitan areas in a state (Multi Query)
Query for all minor civil divisions in a state(Multi Query)
Query for all places in a state (Multi Query)

My package I believe is more flexible and intuitive:

There are many ways of identifying US states such as using their abbreviation, full name, or fips code. My package allows the user to query based either of the options whereas the CRAN version strictly requires using the state abbreviation.

For example,

get_hud_fmr_statedata(entityid = “Virginia”, year = “2020”, hud_key = “dqwqdqwd”)

does not work because entityid must be strictly ‘VA'

Furthermore, their function arguments are less intuitive. For example:

get_hud_fmr_statedata(entityid = “VA”, year = “2020”, hud_key = “dqwqdqwd”)

'entityid' I believe should be better named as state because you are only allowed to query for states. Furthermore, the function name itself I believe is a little verbose. My function names do not need the additionaly ‘get_’ at the front. ‘hud_key’ argument I believe should be better named as ‘key’.

When specifying the year argument as well as arguments that are made of numbers, they don’t allow both character and numeric type to be used. For example, 11023 which can be represented as both a numeric or character can only be inputted as “11023”. I believe the flexibility could be both a good and a bad thing though.

Their functions also only strictly allow for only singular API calls.

For example, you cannot query:

get_hud_fmr_statedata(entityid = c(“VA”, “CA"), year = “2020”, hud_key = “dqwqdqwd”) 😕 and you cannot query:

get_hud_fmr_statedata(entityid = c(“VA”), year = c(“2020”, “2019"), hud_key = “dqwqdqwd”) 😕 Their functions only allow querying for state level data but not county and small areas.

My implementation allows all of them in one function call as well allowing for multiple state and year inputs.

hud_fmr(“VA”, year = 2020, key = “qdqdqdw”) hud_fmr(c(“VA”, “CA”), year = “2020”, key = “dqdqdq”) hud_fmr(51, 2021)

hud_fmr(c(“VA”, “CA”), year = c(“2020”, “2019"), key = “dqdqdq”)

hud_fmr(010099999, year = “2020”, key = “qdqdqdqq”) hud_fmr(‘010099999’, year = “2020”, key = “qdqdqdqq”)

hud_fmr(‘METRO47900M47900’, year = 2020, key = “uefjhqiufd”)

For the X marks on the pkgcheck:

I have updated the repository to have a contributing file. Right now I have fixed rcmdcheck() so that it only give a single warning — a problem with rendering PDF documents with latex I think. It says something is wrong with Rd files. I’m not sure how to deal with the coverage error, as on my end codecov seems to be working fine. I made sure the HUD_KEY secret needed by the package is not "". If so, the tests and or code blocks are not run. 🤥

jooolia commented 2 years ago

@ropensci-review-bot check package

ropensci-review-bot commented 2 years ago

Thanks, about to send the query.

ropensci-review-bot commented 2 years ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 2 years ago

Checks for hudr (v0.1.0.9000)

git hash: 120ebe8d

:heavy_multiplication_x: Package name is not available (on CRAN).
:heavy_check_mark: has a 'codemeta.json' file.
:heavy_check_mark: has a 'contributing' file.
:heavy_check_mark: uses 'roxygen2'.
:heavy_check_mark: 'DESCRIPTION' has a URL field.
:heavy_check_mark: 'DESCRIPTION' has a BugReports field.
:heavy_check_mark: Package has at least one HTML vignette
:heavy_check_mark: All functions have examples.
:heavy_check_mark: Package has continuous integration checks.
:heavy_multiplication_x: Package coverage failed
:heavy_multiplication_x: R CMD check found 1 error.
:heavy_check_mark: R CMD check found no warnings.

Important: All failing checks above must be addressed prior to proceeding

Package License: GPL (>= 2)

1. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has: - code in R (100% in 11 files) and - 1 authors - 7 vignettes - no internal data file - 4 imported packages - 30 exported functions (median 22 lines of code) - 63 non-exported functions in R (median 20 lines of code) --- Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used: - `loc` = "Lines of Code" - `fn` = "function" - `exp`/`not_exp` = exported / not exported All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html) The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile. |measure | value| percentile|noteworthy | |:------------------------|-----:|----------:|:----------| |files_R | 11| 62.6| | |files_vignettes | 7| 98.5| | |files_tests | 10| 90.7| | |loc_R | 786| 61.3| | |loc_vignettes | 878| 89.0| | |loc_tests | 392| 69.9| | |num_vignettes | 7| 99.2|TRUE | |n_fns_r | 93| 74.6| | |n_fns_r_exported | 30| 78.3| | |n_fns_r_not_exported | 63| 73.5| | |n_fns_per_file_r | 4| 63.5| | |num_params_per_fn | 4| 54.6| | |loc_per_fn_r | 22| 64.1| | |loc_per_fn_r_exp | 22| 52.1| | |loc_per_fn_r_not_exp | 20| 63.0| | |rel_whitespace_R | 37| 79.7| | |rel_whitespace_vignettes | 51| 96.3|TRUE | |rel_whitespace_tests | 29| 74.9| | |doclines_per_fn_exp | 40| 50.1| | |doclines_per_fn_not_exp | 0| 0.0|TRUE | |fn_call_network_size | 88| 77.1| | ---

1a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package

2. `goodpractice` and other checks

Details of goodpractice and other checks (click to open)

#### 3a. Continuous Integration Badges [![R-CMD-check](https://github.com/etam4260/hudr/workflows/R-CMD-check/badge.svg)](https://github.com/etam4260/hudr/actions) **GitHub Workflow Results** |name |conclusion |sha |date | |:--------------------------|:----------|:------|:----------| |pages build and deployment |success |120ebe |2022-04-01 | |R-CMD-check |success |120ebe |2022-04-01 | |test-coverage |success |06b739 |2022-03-28 | --- #### 3b. `goodpractice` results #### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/) R CMD check generated the following error: 1. checking tests ... Running ‘testthat.R’ ERROR Running the tests in ‘tests/testthat.R’ failed. Last 13 lines of output: ══ Failed tests ════════════════════════════════════════════════════════════════ ── Error (test-test_cdbgdr.R:2:3): Main CDBG-DR data files from DRGR Public Portal ── Error in `curl::curl_fetch_memory(file)`: Timeout was reached: [drgr.hud.gov] Resolving timed out after 10000 milliseconds Backtrace: ▆ 1. └─hudr::hud_cdbg(1) at test-test_cdbgdr.R:2:2 2. ├─base::suppressMessages(import("https://drgr.hud.gov/public/downloads/DR-CDBG/CDBG-DR%20Financial%20Report%20by%20Appropriation.xlsx")) 3. │ └─base::withCallingHandlers(...) 4. └─rio::import("https://drgr.hud.gov/public/downloads/DR-CDBG/CDBG-DR%20Financial%20Report%20by%20Appropriation.xlsx") 5. └─rio:::remote_to_local(file, format = format) 6. └─curl::curl_fetch_memory(file) [ FAIL 1 | WARN 0 | SKIP 35 | PASS 2 ] Error: Test failures Execution halted R CMD check generated the following notes: 1. checking top-level files ... NOTE File LICENSE is not mentioned in the DESCRIPTION file. 2. checking dependencies in R code ... NOTE Namespace in Imports field not imported from: ‘devtools’ All declared Imports should be used. R CMD check generated the following test_fail: 1. > library(testthat) > library(hudr) > > test_check("hudr") [ FAIL 1 | WARN 0 | SKIP 35 | PASS 2 ] ══ Skipped tests ═══════════════════════════════════════════════════════════════ • Sys.getenv("HUD_KEY") == "" is TRUE (35) ══ Failed tests ════════════════════════════════════════════════════════════════ ── Error (test-test_cdbgdr.R:2:3): Main CDBG-DR data files from DRGR Public Portal ── Error in `curl::curl_fetch_memory(file)`: Timeout was reached: [drgr.hud.gov] Resolving timed out after 10000 milliseconds Backtrace: ▆ 1. └─hudr::hud_cdbg(1) at test-test_cdbgdr.R:2:2 2. ├─base::suppressMessages(import("https://drgr.hud.gov/public/downloads/DR-CDBG/CDBG-DR%20Financial%20Report%20by%20Appropriation.xlsx")) 3. │ └─base::withCallingHandlers(...) 4. └─rio::import("https://drgr.hud.gov/public/downloads/DR-CDBG/CDBG-DR%20Financial%20Report%20by%20Appropriation.xlsx") 5. └─rio:::remote_to_local(file, format = format) 6. └─curl::curl_fetch_memory(file) [ FAIL 1 | WARN 0 | SKIP 35 | PASS 2 ] Error: Test failures Execution halted R CMD check generated the following check_fails: 1. no_import_package_as_a_whole 2. rcmdcheck_stale_license_file 3. rcmdcheck_imports_not_imported_from 4. rcmdcheck_tests_pass #### Test coverage with [covr](https://covr.r-lib.org/) ERROR: Test Coverage Failed #### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp) The following functions have cyclocomplexity >= 15: function | cyclocomplexity --- | --- hud_chas | 27 hud_cw | 23 fmr_il_input_check_cleansing | 19 cw_input_check_cleansing | 17 hud_counties | 17 hud_minor_civil_divisions | 17 hud_places | 17 #### Static code analyses with [lintr](https://github.com/jimhester/lintr) [lintr](https://github.com/jimhester/lintr) found the following 351 potential issues: message | number of times --- | --- Avoid 1:length(...) expressions, use seq_len. | 10 Avoid 1:nrow(...) expressions, use seq_len. | 2 Avoid using sapply, consider vapply instead, that's type safe | 1 Lines should not be more than 80 characters. | 306 Use <-, not =, for assignment. | 32

Package Versions

|package |version | |:--------|:---------| |pkgstats |0.0.3.96 | |pkgcheck |0.0.2.276 |

Editor-in-Chief Instructions:

Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.

jooolia commented 2 years ago

@ropensci-review-bot assign @jhollist as editor

ropensci-review-bot commented 2 years ago

Assigned! @jhollist is now the editor

jhollist commented 2 years ago

@etam4260 I will be serving as the editor on this submission. I think this is a good fit and your explanation of how your package is different than the existing hudr package has convinced that your package provides a more complete access to the HUD APIs.

So, before I move on to assigning reviewers we need to get several items resolved:

Package and repo name: Update this to avoid the conflict on CRAN. Make sure the change is also made to repository and in documenation
Pay close attention to the items goodpractice results section. In particular the lintr results and the 2 NOTEs.
You have one test failing. It looks like it might be related to access to an API key. Some suggestions on dealing with that are in https://devguide.ropensci.org/package-development-security-best-practices.html.
I am still somewhat new to the editorial side so I need to check on the statistical properties table. At first glance nothing serious jumps out, but give me a bit more time to dig into the individual parameters.

If you have questions on any of this let me know.

jhollist commented 2 years ago

Just pinging you again @etam4260. Any questions about the few items I listed above?

etam4260 commented 2 years ago

Hi @jhollist,

Apologize for not getting back sooner: just nabbed some time to work on it a bit today. I have been working on fixing a few bugs I found on my end and creating more robust testing infrastructure. I have also changed the structure of some of the output data. Therefore, the documentation is a bit outdated -- which I will address. As for the failing test, I was able to fix that. I am hoping to get to the lintr issues soon as well as updating the name of the package (and documentation associated with it). As for one of the NOTES, it seems like its having a problem with files LICENSE.md, CODE_OF_CONDUCT.md and codemeta.json on the top level folder -- is there another location where these should be located?

jhollist commented 2 years ago

Couple of thoughts.

As for the license stuff. You will need to make sure that DESCRIPTION knows that you have an additional file. If you add "+ file LICENSE" to "License: GPL (>= 2)" that should take care of that note. I believe the two other files just need to be added to .Rbuildignore. The root folder is a good spot for both (might event be required for codemeta.json).

And no need to apologize! I understand the need to juggle multiple demands on time!

etam4260 commented 2 years ago

@jhollist

Thanks for tip! I was able to deal with all the notes. For adding the "+ file LICENSE" to the description file, it gave me another error: "License components with restrictions not permitted." So I instead added the LICENSE file to the .Rbuildignore instead and it 'fixed' it. As for the lintr issues, I fixed maybe 97% of the issues -- most of the remaining ones deal with cyclocomplexity as well as function name length. I haven't applied those 'goodpractices' in the vignettes yet: it looks like lintr doesn't check those? Just need to update the package name and the associated documentation now!

jhollist commented 2 years ago

I think that sounds pretty good. Once you have the name and documentation updated, let me know. I will start looking for reviewers next week.

On Fri, Apr 29, 2022 at 5:42 PM Emmet Tam @.***> wrote:

@jhollist https://github.com/jhollist

Thanks for tip! I was able to deal with all the notes. For adding the "+ file LICENSE" to the description file, it gave me another error: "License components with restrictions not permitted." So I instead added the LICENSE file to the .Rbuildignore instead and it 'fixed' it. As for the lintr issues, I fixed maybe 97% of the issues -- most of the remaining ones deal with cyclocomplexity as well as function name length. I haven't applied those 'goodpractices' in the vignettes yet: it looks like lintr doesn't check those? Just need to update the package name and the associated documentation now!

— Reply to this email directly, view it on GitHub https://github.com/ropensci/software-review/issues/524#issuecomment-1113773184, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABJPYS3F7XN2XGEGJ76D5VLVHRJTRANCNFSM5SHUCVZA . You are receiving this because you were mentioned.Message ID: @.***>

-- Jeffrey W. Hollister email: @.*** cell: 401 556 4087 https://jwhollister.com

etam4260 commented 2 years ago

@jhollist

Package has been renamed and documentation has been updated. 🙃

jhollist commented 2 years ago

@ropensci-review-bot check package

ropensci-review-bot commented 2 years ago

Thanks, about to send the query.

ropensci-review-bot commented 2 years ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 2 years ago

Checks for rhud (v0.2.0.9000)

git hash: b5aa3d80

:heavy_check_mark: Package name is available
:heavy_check_mark: has a 'codemeta.json' file.
:heavy_check_mark: has a 'contributing' file.
:heavy_check_mark: uses 'roxygen2'.
:heavy_check_mark: 'DESCRIPTION' has a URL field.
:heavy_check_mark: 'DESCRIPTION' has a BugReports field.
:heavy_check_mark: Package has at least one HTML vignette
:heavy_multiplication_x: These functions do not have examples: [crosswalk, rhud_website].
:heavy_check_mark: Package has continuous integration checks.
:heavy_multiplication_x: Package coverage is 0.7% (should be at least 75%).
:heavy_check_mark: R CMD check found no errors.
:heavy_check_mark: R CMD check found no warnings.

Important: All failing checks above must be addressed prior to proceeding

Package License: GPL (>= 2)

1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate. |type |package | ncalls| |:----------|:---------|------:| |internal |base | 331| |internal |rhud | 90| |internal |utils | 24| |imports |httr | NA| |imports |zoo | NA| |imports |rio | NA| |suggests |covr | NA| |suggests |httptest | NA| |suggests |knitr | NA| |suggests |rmarkdown | NA| |suggests |testthat | NA| |suggests |devtools | NA| |linking_to |NA | NA| Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats()', and examining the 'external_calls' table.

base

paste (41), args (32), c (26), expand.grid (21), try (20), for (19), seq_len (19), as.data.frame (16), format (16), Sys.Date (16), call (14), length (13), do.call (10), as.character (6), grepl (6), nrow (6), unlist (6), rbind (5), as.integer (4), data.frame (4), Sys.getenv (4), cbind (3), rep (3), url (3), list (2), return (2), strsplit (2), substr (2), diff (1), emptyenv (1), grep (1), merge (1), nchar (1), ncol (1), Negate (1), new.env (1), readLines (1), regexec (1)

rhud

cw_input_check_cleansing (15), fmr_il_input_check_cleansing (7), chas_input_check_cleansing (6), chas_do_query_calls (4), misc_do_query_call (4), hud_fmr_state_counties (3), crosswalk_a_dataset_input_check_cleansing (2), hud_cw_zip_county (2), hud_fmr_state_metroareas (2), hud_state_minor_civil_divisions (2), hud_state_places (2), add_leading_zeros (1), capitalize (1), check_is_not_list (1), crosswalk (1), cw_do_query_calls (1), decimal_num (1), download_bar (1), elevennumbers (1), fivenumbers (1), fivenumsthenfournums (1), fix_geoid (1), fournumbers (1), hud_chas (1), hud_chas_county (1), hud_chas_nation (1), hud_chas_state (1), hud_chas_state_mcd (1), hud_chas_state_place (1), hud_cw (1), hud_cw_cbsa_zip (1), hud_cw_cbsadiv_zip (1), hud_cw_cd_zip (1), hud_cw_county_zip (1), hud_cw_countysub_zip (1), hud_cw_tract_zip (1), hud_cw_zip_cbsa (1), hud_cw_zip_cbsadiv (1), hud_cw_zip_cd (1), hud_cw_zip_countysub (1), hud_cw_zip_tract (1), hud_fmr (1), hud_fmr_county_zip (1), hud_fmr_metroarea_zip (1), hud_get_key (1), hud_il (1), hud_nation_states_territories (1), hud_set_key (1), hud_state_counties (1), hud_state_metropolitan (1), is.negative (1), numbers_only (1)

utils

zip (18), data (6)

2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has: - code in R (100% in 16 files) and - 1 authors - 7 vignettes - no internal data file - 3 imported packages - 34 exported functions (median 30 lines of code) - 94 non-exported functions in R (median 23 lines of code) --- Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used: - `loc` = "Lines of Code" - `fn` = "function" - `exp`/`not_exp` = exported / not exported All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html) The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile. |measure | value| percentile|noteworthy | |:------------------------|-----:|----------:|:----------| |files_R | 16| 74.9| | |files_vignettes | 7| 98.5| | |files_tests | 11| 91.7| | |loc_R | 1845| 82.6| | |loc_vignettes | 902| 89.4| | |loc_tests | 561| 77.1| | |num_vignettes | 7| 99.2|TRUE | |n_fns_r | 128| 82.1| | |n_fns_r_exported | 34| 81.0| | |n_fns_r_not_exported | 94| 82.6| | |n_fns_per_file_r | 4| 62.2| | |num_params_per_fn | 3| 33.6| | |loc_per_fn_r | 26| 71.1| | |loc_per_fn_r_exp | 30| 63.0| | |loc_per_fn_r_not_exp | 24| 70.1| | |rel_whitespace_R | 25| 87.6| | |rel_whitespace_vignettes | 50| 96.4|TRUE | |rel_whitespace_tests | 31| 83.0| | |doclines_per_fn_exp | 38| 47.0| | |doclines_per_fn_not_exp | 0| 0.0|TRUE | |fn_call_network_size | 124| 82.6| | ---

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package

3. `goodpractice` and other checks

Details of goodpractice and other checks (click to open)

#### 3a. Continuous Integration Badges [![R-CMD-check](https://github.com/etam4260/rhud/workflows/R-CMD-check/badge.svg)](https://github.com/etam4260/rhud/actions) **GitHub Workflow Results** |name |conclusion |sha |date | |:--------------------------|:----------|:------|:----------| |pages build and deployment |success |b5aa3d |2022-05-03 | |R-CMD-check |success |3e3540 |2022-05-03 | |test-coverage |failure |0b6873 |2022-05-02 | --- #### 3b. `goodpractice` results #### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/) R CMD check generated the following notes: 1. checking dependencies in R code ... NOTE Namespaces in Imports field not imported from: ‘rio’ ‘zoo’ All declared Imports should be used. 2. checking R code for possible problems ... NOTE crosswalk: no visible binding for global variable ‘w_geoid’ hud_state_minor_civil_divisions: no visible binding for global variable ‘FAlSE’ Undefined global functions or variables: FAlSE w_geoid R CMD check generated the following check_fails: 1. cyclocomp 2. no_import_package_as_a_whole 3. rcmdcheck_imports_not_imported_from 4. rcmdcheck_undefined_globals #### Test coverage with [covr](https://covr.r-lib.org/) Package coverage: 0.74 The following files are not completely covered by tests: file | coverage --- | --- R/hudchas.R | 0% R/hudcheckfornumbersonly.R | 0% R/hudcheckisnotlist.R | 0% R/hudcw.R | 0% R/huddatasetcw.R | 0% R/huddoquerycalls.R | 0% R/hudfmr.R | 0% R/hudinputcheckcleansing.R | 6.99% R/hudkey.R | 0% R/hudloadingbar.R | 0% R/hudmisc.R | 0% R/hudmischelpers.R | 0% R/huduser.R | 0.48% R/hudwebsitedocs.R | 0% #### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp) The following functions have cyclocomplexity >= 15: function | cyclocomplexity --- | --- crosswalk | 67 crosswalk_a_dataset_input_check_cleansing | 50 hud_chas | 28 hud_cw | 25 check_is_not_list | 20 fmr_il_input_check_cleansing | 19 hud_state_metropolitan | 19 cw_input_check_cleansing | 17 hud_state_counties | 17 hud_state_minor_civil_divisions | 17 hud_state_places | 17 #### Static code analyses with [lintr](https://github.com/jimhester/lintr) [lintr](https://github.com/jimhester/lintr) found the following 17 potential issues: message | number of times --- | --- Avoid using sapply, consider vapply instead, that's type safe | 6 Lines should not be more than 80 characters. | 10 Use <-, not =, for assignment. | 1

Package Versions

|package |version | |:--------|:--------| |pkgstats |0.0.4.30 | |pkgcheck |0.0.3.15 |

Editor-in-Chief Instructions:

Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.

jhollist commented 2 years ago

@etam4260 Two things came up in the most recent checks

1.) A couple of functions are missing examples. Add these examples into the docs 2.) The test coverage. I see that most of your tests skip without the key hence the low number above. Looks like your current suite is at about 72%. While we shoot for 75%, this is close enough to get the review process started. I would just ask that after and/or during the review that you revisit the tests and make sure you have adequate coverage.

I think we are close enough. I will start looking for potential reviewers!

jhollist commented 2 years ago

@etam4260 Just a quick update. I have been looking for reviewers. I have one so far (Thank you, @rtaph!) and have another invite out that I am waiting on.

Have you had a chance to add those examples to the few functions that were missing them?

jhollist commented 2 years ago

@ropensci-review-bot seeking reviewers

ropensci-review-bot commented 2 years ago

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/524_status.svg)](https://github.com/ropensci/software-review/issues/524)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

jhollist commented 2 years ago

@ropensci-review-bot assign @rtaph as reviewer

ropensci-review-bot commented 2 years ago

@rtaph added to the reviewers list. Review due date is 2022-06-07. Thanks @rtaph for accepting to review! Please refer to our reviewer guide.

ropensci-review-bot commented 2 years ago

@rtaph: If you haven't done so, please fill this form for us to update our reviewers records.

rtaph commented 2 years ago

Nice to meet you @etam4260, @jooolia, and @jhollist. Confirming the review deadline of 2022-06-07 and looking forward to getting to know your package!

etam4260 commented 2 years ago

@jhollist Yes, I was able to add the documentation to those functions. As for the testing suite, it is at 70% now. Still working to get over that 75% mark. I have also added the under review badge to the README.

@rtaph Thanks for taking the time to review the package

jhollist commented 2 years ago

Sounds good! Since the code might change some as a result of the review, it might make sense to wait until after review to worry too much about the code coverage, unless there are a few functions that would be easy to add to the tests at this point.

Still working on that second reviewer!

jhollist commented 2 years ago

@ropensci-review-bot assign @khueyama as reviewer

ropensci-review-bot commented 2 years ago

@khueyama added to the reviewers list. Review due date is 2022-06-09. Thanks @khueyama for accepting to review! Please refer to our reviewer guide.

ropensci-review-bot commented 2 years ago

@khueyama: If you haven't done so, please fill this form for us to update our reviewers records.

etam4260 commented 2 years ago

Hi @khueyama, thanks for taking the time to review the rhud package!

@jhollist When do you recommend submitting this package on CRAN? After or during the review process? I don't want to be too hesitant just in case the package name gets taken again. Do you know what the process is like for a CRAN submission? Thanks again for the help.

khueyama commented 2 years ago

@etam4260 thank you for the opportunity to review the rhud package! Some nice work here that I'm sure will be valuable to researchers in this space. My review is below, let me know if you have any questions or follow-up.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
- khueyama: None
[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
- khueyama: I would suggest putting the instructions (or at least a link to the docs website) for getting/setting the HUD API token prominently in the README, since this is a pretty important step to get up and running with the package
[x] Vignette(s): demonstrating major functionality that runs successfully locally
[x] Function Documentation: for all exported functions
[x] Examples: (that run successfully locally) for all exported functions
[ ] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
- khueyama: I did not see contribution guidelines in the README or CONTRIBUTING. There is a .github/CONTRIBUTING.md file, but I believe it is auto-generated from a devtools function.

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[ ] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
- khueyama: I think you should include some small tests of the functionality under R/hudmischelpers.R and R/hudinputcheckcleansing.R and other helper functions.
[x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.
- khueyama: One note is that you could include a brief demonstration of usage in the README, even something as simple as just rhud::hud_chas_nation()

Estimated hours spent reviewing: 4

[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Thank you for the opportunity to review this package! It will be a great resource for researchers.
I appreciated the extensive vignettes and documentation examples. The package is user-focused and user-friendly.
Some of the documentation could be clearer on the allowed inputs. For example, hud_chas_state(c("va", "wv"), year = c("2013-2017", "2014-2018")) will give you multiple years for multiple states, but I only figured that out from looking at the source code and playing around with things.
I had issues with hud_set_key(). Running hud_set_key(key, in_wkdir = FALSE, in_home = TRUE) added the key to my working directory .Rprofile file instead; cat $HOME/.Rprofile from the command line did not have a HUD key entry. Also, I already had an entry in the working directory .Rprofile, and this function didn't append properly, giving me a file that looked like options(blogdown.method = "markdown")Sys.setenv("HUD_KEY" = ...) all on a single line. This caused issues on a new R session.

Restarting R session...

Error: Unable to establish connection with R session when executing 'console_input'
Error: 1:38: unexpected symbol
1: options(blogdown.method = "markdown")Sys.setenv

It may be worth reaching out to the author of hudr to see if there is potential for collaboration? That may be preferable than having both rhud and hudr when there seems to be a lot of overlap in functionality.
I noticed a handful of instances where an empty vector is initialized and then appended to over a for-loop iteration. I don't know if the scale here gets large enough to cause noticeable performance issues, but this is something you generally want to avoid if possible. See this section of the Advanced R book for more: https://adv-r.hadley.nz/perf-improve.html#avoid-copies.
Will the auxiliary functions in R/hudsplitgeoid.R be used? Look like empty placeholders at the moment. I would recommend either building them out or removing before CRAN submission.
My overall impression is that there is a lot of duplication of logic throughout the package. For instance, the functions under R/hudcw.R seem to be nearly identical, with only a couple of inputs changing. There's an opportunity here to build a layer of abstraction with a function handling the core API. And then functions like hud_cw_zip_cbsadiv and hud_cw_zip_county would call the core function, with different inputs for the geoids etc.
There's a lot of places where you're setting the user-agent to be the package github url. You probably want a helper function for users to set their own user-agent. Or else a single user's abuse of the system may get the entire package banned. And you probably don't want to be fielding questions from HUD IT re a user you probably don't know.
devtools::check() returned three NOTES that should be addressed:

❯ checking top-level files ... NOTE
  Non-standard file/directory found at top level:
    ‘rhud.Rproj’

❯ checking dependencies in R code ... NOTE
  Namespaces in Imports field not imported from:
    ‘rio’ ‘zoo’
    All declared Imports should be used.

❯ checking for unstated dependencies in vignettes ... NOTE
  '::' or ':::' import not declared from: ‘htmltools’

devtools::test() returned three WARNings that should be addressed:


Warning (test-test_misc.R:12:3): All MCD In State Query
Could not find data for queries:

https://www.huduser.gov/hudapi/public/chas/listMCDs/6

It is possible that your key maybe invalid or there isn't any data for these parameters, If you think this is wrong please report it at https://github.com/etam4260/rhud/issues. Backtrace:

testthat::expect_error(...) at test-test_misc.R:12:2
rhud::hud_state_minor_civil_divisions("CA", "qdqdwq")
rhud::misc_do_query_call(urls, key) at rhud/R/hudmisc.R:491:2

Warning (test-test_misc.R:26:3): All Cities in State Query Could not find data for queries:

https://www.huduser.gov/hudapi/public/chas/listCities/6

It is possible that your key maybe invalid or there isn't any data for these parameters, If you think this is wrong please report it at https://github.com/etam4260/rhud/issues. Backtrace:

testthat::expect_error(hud_state_places("CA", "qdqdwq")) at test-test_misc.R:26:2
rhud::hud_state_places("CA", "qdqdwq")
rhud::misc_do_query_call(urls, key) at rhud/R/hudmisc.R:397:2

Warning (test-test_misc.R:47:3): List Counties Query Could not find data for queries:

https://www.huduser.gov/hudapi/public/fmr/listCounties/CA

It is possible that your key maybe invalid or there isn't any data for these parameters, If you think this is wrong please report it at https://github.com/etam4260/rhud/issues. Backtrace:

testthat::expect_error(hud_state_counties("CA", "qdqdwq")) at test-test_misc.R:47:2
rhud::hud_state_counties("CA", "qdqdwq")
rhud::misc_do_query_call(urls, key) at rhud/R/hudmisc.R:298:2

NIT - the filenames under R/ were very hard to follow. I would suggest names like hud_do_query_calls.R instead of huddoquerycalls.R. Consider taking a look at a style guide like the one from the tidyverse.
Here is my sessionInfo() in case it is useful:

R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Pop!_OS 22.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] rhud_0.2.0.9000

loaded via a namespace (and not attached):
[1] httr_1.4.3     compiler_4.2.0 R6_2.5.1       tools_4.2.0    curl_4.3.2     bspm_0.3.9

rtaph commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).
- I have no prior relationship with any of the package authors.

Documentation

The package includes all the following forms of documentation:

[x] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
- The statement is there but I think "the problem it solves" could be made clearer. What I found a bit hard to immediately undestand was how rhud solves the stipulated problem.
[x] Installation instructions: for the development version of package and any non-standard dependencies in README.
- I was able to install rhud without problems using the instructions on the README (however, see below about building it with vignettes).
[x] Vignette(s): demonstrating major functionality that runs successfully locally
- These could be improved on a bit by commenting on the output from examples more. For an unitiated user, the context and applications would be helpful.
[x] Function Documentation: for all exported functions
[ ] Examples: (that run successfully locally) for all exported functions
- Some examples in the docs appear broken for internal functions. devtools::run_examples(run_dontrun = TRUE) does not work for all examples. See below.
[ ] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
- Some sections of your CONTRIBUTING file appear unfinished.
- I think you still need a Zenodo DOI in your CITATION file.

Functionality

[ ] Installation: Installation succeeds as documented.
- I am able to run devtools::load_all() successfully, but devtools::check() fails for me in the vignette-building stage. See my comment further down about pre-building your vignettes.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[ ] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.
- There are some small issues detailed below.

Estimated hours spent reviewing: 8

[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Good job! I enjoyed going through your package and learning more about this API. On the whole, the package seems useful and I am glad to see that it will make it easier to interact the the HUD API. Thank you for inviting me to offer my review and critique.

From the version I reviewed (84c8d), there are areas I suggest improving prior to publication with ROpenSci. This includes opportunities to:

tidy-up code by removing unused functions, addressing straggler #TODOs, dropping functions with empty bodies, etc.
refactor blocks of code that are frequently repeated, that can be simplified, or that can be made more human-readable.
improve documentation to make it easier for a newcommer to grasp the nature of rhud regardless of their entrypoint into the documentation.

Below is a detailed list of suggestions-- some of which you may already have addressed or thought about. It is a lot to go through, so feel free to disagree and take only what you find valuable. If there are elements that you agree with but which are ambitious, some can be tasked for future development.

Docs and Communication

It took me some time to understand what what crosswalk meant. The word is used as a noun, verb, and adjective, and it was not until reading the full linked journal article that I think I think I understood it. It might be helpful to explain this concept to readers.
The ROpenSci package guidance says

The README, the top-level package docs, vignettes, websites, etc., should all have enough information at the beginning to get a high-level overview of the package and the services/data it connects to, and provide navigation to other relevant pieces of documentation.

I don't think all vignettes yet provide an easy entry point. I would encourage adding to their text to make it clearer.
In the README, it might be helpful to reference R packages that achieve something similar by alternate means. I'm thinking of GIS analogues like tongfen and areal.
There are some un-introduced abbreviations such as "PD&R". I suggest reviewing all the docs to check for other naked abbreviations.
In general I found the examples from the vignettes a bit hard to follow. There is an opportunity to improve them by adding some commentary on the output. This is probably not needed for all flavours of cross-walk, but commenting on the first example in your vignette could really help.
In vignettes, I would place lowercase abbreviations either in codeblocks (e.g., cbsa) or in uppercase (e.g., "CBSA")
I would encourage organizing your pkgdown reference page into sections in the _pkgdown.yml to make it easier to navigate. Perhaps grouping them by API?
I would use @title roxygen2 tage to give a descriptive title rather than duplicating the function name.
I would reorganize your README to put the most important information up top, and other information like citation and disclaimers lower down.
As per the ROpenSci packaging guidance, I would encourage the use of the @family roxygen tag to simplify @seealso sections.
I think cw_do_query_calls could be better documented in terms of what the function is meant to achieve. I had to look at the actual code to understand it, as the @description was not specific enough for me to understand.
"query: character double integer or numeric": This mixes data types with data classes. Can we remove double here?
Although not required, it would be useful to have an @example for internal functions. This would be helpful for reviewers or future code maintainers even if you have @noRd.
In your @param roxygen blocks, I would encourage describing whether certain parameters can accept vectors or scalar values only. Something like @param state A character vector indicating the state to ...
I would also review all your documentation to ensure that the expected class is clearly mentioned in the @param. For instance, in hud_chas it is not clear (unless you look at the example) if the input needs to be 1, "1", "National", or "1 - National".
Setup.Rmd:
- Instead of hud_{1}_{2}_{3}, I would use mnemonic names. For example, hud_{dataset}_{geo}_{resolution} or hud_{api}_{from}_{to}.
This blog encourages shipping API-wrapping packages with precomputed vignettes. That might be something to consider to allow the package to build faster and without an internet connection and/or API key. I suspect this is also the issue preventing devtools::check() to pass on my machine.
I would suggest not having Sys.setenv("HUD_KEY" = "q3r2rjimd129fj121jid") in examples because a) it is not a real key; and b) it will cause failures even if a user has a key because running this line overwrites their own environment variable (I did when I ran devtools::run_examples(run_dontrun = TRUE)). If you instead instruct users how to set their HUD_KEY persistently iff it is missing, they should be able to run the examples.

Tests

I would add a message to skip_if(). For example, to know that tests are being skipped because the HUD_KEY is missing. Checking for the HUD_KEY is so common that I think it warrants a wrapper skip_if_no_hud_key().
I would review the name of the unit tests to make them conform to an imperative tense. For example, "test awkward crosswalk" could be changed to "an improper crosswalk request must error" so that it plays nice with the phrase "test that ..."
It would be helpful to be able to quiet the printing of the status bar, especially for unit tests. Consider creating an rhud.quiet option to silence these messages. This would be particularly handy in unit tests. What I have in mind is something akin to the usethis.quiet option in the usethis package.
From the ROpenSci packaging guidance:

testthat has a function skip_on_cran() that you can use to not run tests on CRAN. We recommend using this on all functions that are API calls since they are quite likely to fail on CRAN. These tests should still run on continuous integration.
I see that you have an empty test with_mock_dir(), so I suspect you are dabbling with webmocking and may already know about this resource by Scott Chamberlain and Maëlle Salmon. The book has fantastic guidance on how run some unit tests through moking. For example, vcr might be a great solution for testing the API without an internet connection, as it will snapshot real responses as test fixtures. If you feel even more adventurous, you could create a Github Workflow that tests live requests on a schedule (e.g. semi-monthly) using a Github secret to store an API key.
Two tests fail rather than skip when the API key is missing.

Code

hud_set_key():
- In vignettes/Setup.Rmd, it could be useful to encourage users to set their environment key via usethis::edit_r_environ(). This would persist across sessions. Working with the renviron file seems cleaner than all the regex and string manipulation in hud_set_key(), though the key would be exposed to any package in R. In the medium term, it would be nice to also offer the option to use a keyring.
cw_input_check_cleansing:
- What about returning a named list instead of an unnamed list? This might allow you to just use something like args$zip and this would be less prone to breaking in future releases. I would also be more human-readable.
hud_get_key():
- I suggest raising an informative error if the variable is not set.
.onAttach():
- From the ROpenSci guidance:
  
  Only use package startup messages when necessary (function masking for instance). Avoid package startup messages like “This is foobar 2.4-0” or citation guidance because they can be annoying to the user. Rely on documentation for such guidance.
  
  I would only notify the user on startup if their HUD key is missing, and take away all the messaging about features. If the key is found on package startup, I would provide no message at all.
check_is_not_list():
- AFAIK, "numeric" is not an R type but rather a class, so I think typeof(dataframe[, i]) != "numeric" is not necessary. You could express the block in question more succinctly as
```
types <- c("character", "integer", "double")
typeof(dataframe) == "list" && all(apply(dataframe, 2, typeof) %in% types)
```
- I suggest renaming this function to is_valid_rhud_df(). It would make it more human-readable in unit tests.
numbers_only() might more aptly be named digits_only, since it will return FALSE for inputs such as 1.6 and 1.0.
You have a lot of repetition in the form of if (to_tibble == FALSE) {. I recommend writing a helper function format_df(x, to_tibble = to_tibble)) that you can call instead of needing the control flow (it would just return a dataframe if to_tibble = FALSE). This would make it more readable and less duplicative, and you might be able to logically fit the as.data.frame(do.call("rbind", list_res)) functionality there too.
hud_state_metropolitan:
- I find this block difficult to follow. Any way you can improve the readability?
- Here you can just iterate over state directly, instead of indexing with i
add_leading_zeros():
- The roxygen2 examples mistakenly still have hud_*(). Same goes for remove_leading_zeros(). Running devtools::run_examples(run_dontrun = TRUE) can help identify cases like these.
- I would be explicit in the @returns that this will cast input to a character vector. Same goes for remove_leading_zeros().
- This function can be refactored for greater readability. I would suggest something along the lines of:
```
assert_geoid_type(geoid_type) # <- let this raise the error
num_char <- switch(
geoid_type,
"countysub" =10,
"cd" = 4,
"tract" = 11,
5
)
fix_geoid(input, num_char = num_char)
```
fmr_il_input_check_cleansing():
- I believe you don't need stop(paste(..., sep = "")) and can directly do stop(...). Applies to other areas in the package too.
capitalize():
- This function is always called with the arguments tolower(state). It would be more direct to refactor this with a function extract_state() that does both steps in one go, and get rid of capitalize().
crosswalk_a_dataset_input_check_cleansing():
- I think it is safer to return a named list and not rely on positional indices.
- This function might benefit from renaming. I would suggest something along the lines of cw_data_validate().
fix_geoid():
- The function says that the first argument can be a column. IMO it would be cleaners if it only accepted vectors.
- I suggest renaming this to something like left_pad_zeros() (as it is actually geoid-agnostic) and rename add_leading_zeros() to some name with geoid in it as it is specific to geoids.
- I think this function can be refactored using functional programming principles. Alternatively, if you're willing to accept an import, you could use stringi::stri_pad_left(geoids, num_char, 0).
misc_do_query_call():
- You repeat add_headers() quite a few times, in other places, so I think a helper function to do this step would make sense.
chas_do_query_calls():
- You only use %notin% infix once, so it might not be worth the trouble of making a function for it, and instead go for the vanilla !x %in% y.
- I would suggest pulling all_measurements into a function that simply returns this vector, or use internal data to store a dictionary as a list.
- I don't think the lone message("\n") is ideal. Can you tidy this up?
rhud_website():
- This function could be DRY'd up a bit by simply using a switch() on the "start"/"open" piece.

Other code:

Not all calls to stop() have call. = FALSE set. Is this intentional?
This block gets repeated a lot:
```
if (!curl::has_internet()) stop("\nYou currently do not have internet access.", call. = FALSE)
```
How about writing a helper assert_internet() to reduce duplication?
You could write some helper functions to define the API host instead of repeating it: e.g., hud_host <- function() "https://www.huduser.gov"
I don't think library(rhud) is necessary at the top of all the examples. Of course, it does not hurt to have it there, but I think most R developers omit this.
httr::content() calls: Do we know if the APIs will always return UTF-8 content? If so, it would be good to be explicit.
error_urls <- c() might be better as error_urls <- NULL or error_urls <- character(0)

instead of using paste() to build query strings, I would use GET's other named arguments and/or modify_url(). Something like this would be easier to follow and maintain:

config <- add_headers(Authorization = paste("Bearer ", as.character(key)))
resp <- GET(
    url = "https://www.huduser.gov/hudapi/public/il",
    config =  config,
    path = ifelse(querytype == "state", "statedata", "data"),
    query = list(year = all_queries$year[i]),
    user_agent("https://github.com/etam4260/rhud"),
    timeout(30)
    )

I would encourage the use of tryCatch() instead of try().
GET():
- I would create a helper function to process the status httr reponses and handle different status codes, instead of try(content(call), silent = TRUE) and instead of if ("error" %in% names(cont)) {.
- You can also use httr::warn_for_status() and httr::stop_for_status().
Instead of running timeout(30) everywhere, I suspect it might be possible to use rely on httr options for this. You may also be interested in httr::RETRY() and this blog post.
It would be cleaner to set to_tibble = getOption('rhud_use_tibbles', FALSE) in the function's signature instead of having all your missing() checks in the function body.
download_bar():
- Have you considered the progress package instead of writing this functionality from scratch? It would add a light dependency, but since there are still some TODOs in your existing function, it might be worth it.
Some functions are defined but never used:
- decimal_num()
- is.negative()
- fivenumbers() (also, this function does the opposite of what I expected. Would rename it to not_*() if you keep it)
- fivenumsthenfournums()
- elevennumbers()
- fournumbers()
- tennumbers()
- hud_get_key(): Used in examples/vignettes but not in other functions. Probably should be used wherever you have Sys.getenv("HUD_KEY").
add_delimiters(): Is an empty function.
Some functions have commented code that probably should be addressed/removed before finalization (e.g., setup.R)
There are some remaining style issues:
- Usage of <- instead of =
- Usage of vapply instead of sapply()
- Reducing line length to have fewer than 80 chars per line where practical.
- Using the pattern if (!to_tibble) instead of if (to_tibble == FALSE) (and analogously for var == TRUE expressions).
- The tidyverse style is to not use return() explicitly at the end of a function. I see a few places that do not follow that convention.

Miscellaneous

I recommend adding the spelling package to you check routine by running spelling::spell_check_setup().
For future development/maintenance, you may be interested in fledge. By using conventional commits, you can facilitate creation of new releases and changelogs. The package is still under development but I already use it for my own package development.
You mention in your startup message that you are exploring caching options, yes? Have you considered using Neal Richardson httpcache package to cache httr's requests? I have never used it but I think it could be a valuable addition. I tried it by replacing @import httr with
```
#' @import httpcache
#' @importFrom httr content add_headers timeout user_agent
```
Once you make httr imports explicit, it seems to work well.
An idea might be to write a rhud_sitrep() situation report function to print things such as:
- if the API key is set (and recommend what do do if not),
- if the API is reachable
- (I could not find a health-check endpoint so maybe a lightweight API call would be needed?)
- check remaining API bandwith (if API has such an endpoint),
- display cache settings (whether or not enabled, and print the cache policy).
See usethis::git_sitrep(), usethis::proj_sitrep(), cli::cli_sitrep(), and devtools::dev_sitrep() for examples.
A future release could leverage usage of secrets/keyrings to store the API key. See https://httr.r-lib.org/articles/secrets.html

jhollist commented 2 years ago

Great to see all this progress on the review! And apologies for not responding sooner I was out of town for a while and brought back COVID as a souvenir. I am much better now and trying to dig myself out.

First, @rtaph and @khueyama thank you for two excellent reviews! Volunteer reviewers like you keep rOpenSci humming along!

Second, @etam4260 lot's to consider in the reviews. Take a close look and start working on your changes and response.

Regarding your question on CRAN submission (again please accept my apologies for the delay). I think a lot depends on how long you think it will take to get the revisions made. If only a couple of weeks, that you are probably good waiting until after revisions are done. If it will be longer than that, you can submit to CRAN now and then after revisions are completed and the package is accepted you can submit an updated version at that time. Biggest consideration is time between CRAN submissions. Best not to submit to frequently.

Any questions for me or the reviewers, put those questions in this issue.

jhollist commented 2 years ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/524#issuecomment-1143772151 time 4

ropensci-review-bot commented 2 years ago

Logged review for khueyama (hours: 4)

jhollist commented 2 years ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/524#issuecomment-1148226693 time 8

ropensci-review-bot commented 2 years ago

Logged review for rtaph (hours: 8)

etam4260 commented 2 years ago

Hi @khueyama @rtaph

Thanks for all the feedback! Definitely a lot to go through and digest. I am hoping to talk to rhud package contributors to get some thoughts. I will try to get back in a few days.

etam4260 commented 2 years ago

Hi @khueyama

1) "khueyama: I did not see contribution guidelines in the README or CONTRIBUTING. There is a .github/CONTRIBUTING.md file, but I believe it is auto-generated from a devtools function."

For your comment about the contributing file, this was not autogenerated from roxygen. I actually cookie cuttered it out of a more popular package, likely ggplot. I'm not exactly sure what else you would need in the CONTRIBUTING.md file. There are also issue templates for bugs and new features.

2) "khueyama: I think you should include some small tests of the functionality under R/hudmischelpers.R and R/hudinputcheckcleansing.R and other helper functions."

I have not worked on making tests for the internal functions yet -- will put that into my backlog. I must have confused

"Focus on testing the external interface to your functions - if you test the internal interface, then it’s harder to change the implementation in the future because as well as modifying the code, you’ll also need to update all the tests."

mentioned in the testing section of R packages: https://r-pkgs.org/tests.html#what-to-test to suggest avoiding testing internal functions. (Though in this case I'm not sure how I would even test the internal interface to my functions...). Are functions inside functions considered internal though...

3) "khueyama: One note is that you could include a brief demonstration of usage in the README, even something as simple as just rhud::hud_chas_nation()"

I added code blocks for helping the user set the key, how to type a simple example, and also options for tibbles vs data frames in the README.md.

4) "Some of the documentation could be clearer on the allowed inputs. For example, hud_chas_state(c("va", "wv"), year = c("2013-2017", "2014-2018")) will give you multiple years for multiple states, but I only figured that out from looking at the source code and playing around with things."

I added documentation in the '@return' roxygen for most functions to mention that it will query a combination of geoid, year ...

5) "I had issues with hud_set_key(). Running hud_set_key(key, in_wkdir = FALSE, in_home = TRUE) added the key to my working directory .Rprofile file instead; cat $HOME/.Rprofile"

I think I fixed this bug. Now it should add a new line before and after adding Sys.setenv() to the .Rprofile as well as making sure to append to the home directory .Rprofile instead of that in the working directory.

6) "It may be worth reaching out to the author of hudr to see if there is potential for collaboration? That may be preferable than having both rhud and hudr when there seems to be a lot of overlap in functionality."

I mentioned this to my advisor and since most my work completely overlaps the hudr work, including more, there likely won't be much to gain in terms of merging them together.

7) "I noticed a handful of instances where an empty vector is initialized and then appended to over a for-loop iteration. I don't know if the scale here gets large enough to cause noticeable performance issues, but this is something you generally want to avoid if possible. See this section of the Advanced R book for more: https://adv-r.hadley.nz/perf-improve.html#avoid-copies."

I have the package byte compiled when the user installs it. With my understanding of compilers, they tend to optimize away inefficiencies common in for loops. In this case, I'm thinking appending items to vectors isn't too much of a problem because of this, but I have yet to benchmark anything to confirm it. I do see the concerns with readability -- will try to think of a non-for loop implementation.

8) "Will the auxiliary functions in R/hudsplitgeoid.R be used? Look like empty placeholders at the moment. I would recommend either building them out or removing before CRAN submission."

I have removed this file as well as some other functions in other files that were not being used.

9) "My overall impression is that there is a lot of duplication of logic throughout the package. For instance, the functions under R/hudcw.R seem to be nearly identical, with only a couple of inputs changing. There's an opportunity here to build a layer of abstraction with a function handling the core API. And then functions like hud_cw_zip_cbsadiv and hud_cw_zip_county would call the core function, with different inputs for the geoids etc."

I agree with you. My initial approach was that I didn't want to abstract too many things away, just in case I didn't implement them in a way which would make it effectively reusable. I think this has to do with the lack of foresight to know what pieces of code I might reuse. I will definitely work on this.

10) "There's a lot of places where you're setting the user-agent to be the package github url. You probably want a helper function for users to set their own user-agent. Or else a single user's abuse of the system may get the entire package banned. And you probably don't want to be fielding questions from HUD IT re a user you probably don't know."

I 'partially' fixed this issue. I now have an interface for the user to set a user-agent. However, I defaulted it to the github repo. I think if a user wanted to use it for malicious means, then they could easily set the user agent to any other url to try and disguise themselves as someone or something else. I am hoping that using the github url might allow HUD IT to quickly pinpoint the problem and mitigate it. If they do ban the specific user agent, then a 'good user' could just set it to a different one.

11) devtools::check() returned three NOTES that should be addressed:

I think these should be fixed now. Although, now I am getting warnings about size capacity from check. I think I might need to move the pkgdown website to a different repository.

12) devtools::test() returned three WARNings that should be addressed:

I think there were just some missing assert warning in some of the tests. These should now be added in.

13) NIT - the filenames under R/ were very hard to follow. I would suggest names like hud_do_query_calls.R instead of huddoquerycalls.R. Consider taking a look at a style guide like the one from the tidyverse.

Agree. File names in R/ now include the underscore between words. I haven't read through the tidyverse style guide, but looks interesting: will take some time to go through it. Are there naming conventions for vignettes and test files?

etam4260 commented 2 years ago

Hi @rtaph

I've begun working on some of the feedback you made. Here is where I am so far...

1) "Some examples in the docs appear broken for internal functions. devtools::run_examples(run_dontrun = TRUE) does not work for all examples. See below."

This should be fixed. However there is an issue with cbsadiv queries for the USPS Crosswalk files. I think this might be a bug on HUD USER's end.

2) """Some sections of your CONTRIBUTING file appear unfinished. I think you still need a Zenodo DOI in your CITATION file."""

I'm not exactly sure what else needs to go in the CONTRIBUTING file. The Zenodo badge is now added to the readme and the CITATION file should now have the Zenodo bibentry.

3) """I am able to run devtools::load_all() successfully, but devtools::check() fails for me in the vignette-building stage. See my comment further down about pre-building your vignettes."""

I am currently working on this.

4) """It took me some time to understand what what crosswalk meant. The word is used as a noun, verb, and adjective, and it was not until reading the full linked journal article that I think I think I understood it. It might be helpful to explain this concept to readers.

The ROpenSci package guidance says

The README, the top-level package docs, vignettes, websites, etc., should all have enough information at the beginning to get a high-level overview of the package and the services/data it connects to, and provide navigation to other relevant pieces of documentation.

I don't think all vignettes yet provide an easy entry point. I would encourage adding to their text to make it clearer.

In general I found the examples from the vignettes a bit hard to follow. There is an opportunity to improve them by adding some commentary on the output. This is probably not needed for all flavours of cross-walk, but commenting on the first example in your vignette could really help. """

I have someone else working on this: a geography student who will definitely explain it better than me. As for the vignettes, I'm not sure what else would make the vignettes clearer as I mention the data source and the reason why the data source exists. I then provide examples for how to get the data. As for adding commentary, is explaining the columns in each dataset in the context of an output make sense for each vignette?

5) "In vignettes, I would place lowercase abbreviations either in codeblocks (e.g., cbsa) or in uppercase (e.g., "CBSA")"

Agree -- though I have this in the backlog as I feel like this is very minor usability issue.

6) "I would encourage organizing your pkgdown reference page into sections in the _pkgdown.yml to make it easier to navigate. Perhaps grouping them by API?"

I attempted to do this. However, I did run into some trouble getting pkgdown to parse the pkgdown.yml file correctly. So I might attempt this again some other time.

7) "I would use @title roxygen2 tage to give a descriptive title rather than duplicating the function name."

Working a bit on this. I will opt for simple titles such as:

Crosswalk Zip to Tract

8) "I would reorganize your README to put the most important information up top, and other information like citation and disclaimers lower down."

I agree. I did some major reorganizing in the README: still need to move the citation information down to the bottom.

9) "As per the ROpenSci packaging guidance, I would encourage the use of the @family roxygen tag to simplify @seealso sections."

I actually attempted to use the @family, but the way they organized the family functions in the documentation was very "mushed together". I think the '@seealso' allowed me to customize how each member was organized in relation to another.

10) """I think cw_do_query_calls could be better documented in terms of what the function is meant to achieve. I had to look at the actual code to understand it, as the @description was not specific enough for me to understand.

Although not required, it would be useful to have an @example for internal functions. This would be helpful for reviewers or future code maintainers even if you have @noRd. """

Agree. I think this could be generalized to all internal functions I have in the package. I spent most of my time on ensuring the quality of the exported functions documentation and left the internals docs for later.

11) "query: character double integer or numeric": This mixes data types with data classes. Can we remove double here?"

This should no longer be relevant. I don't mention data types or classes in the vignette parameters section anymore.

12) "In your @param roxygen blocks, I would encourage describing whether certain parameters can accept vectors or scalar values only. Something like @param state A character vector indicating the state to ..."

I wished R had some type hinting like that available in python: definitely would make it easier to document that. As for adding type hints to the '@param' I agree. In (most if not all) I have defaulted to accepting vectors only.

13) "I would also review all your documentation to ensure that the expected class is clearly mentioned in the @param. For instance, in hud_chas it is not clear (unless you look at the example) if the input needs to be 1, "1", "National", or "1 - National"."

Agree. Haven't made the changes yet, but will put that in my backlog.

14) "Instead of hud{1}{2}{3}, I would use mnemonic names. For example, hud{dataset}{geo}{resolution} or hud{api}{from}_{to}."

Agree. I have someone else working on this.

15) "I would suggest not having Sys.setenv("HUD_KEY" = "q3r2rjimd129fj121jid") in examples because a) it is not a real key; and b) it will cause failures even if a user has a key because running this line overwrites their own environment variable (I did when I ran devtools::run_examples(run_dontrun = TRUE)). If you instead instruct users how to set their HUD_KEY persistently iff it is missing, they should be able to run the examples."

I have removed all instances of a fake key in the function examples.

16) """I would add a message to skip_if(). For example, to know that tests are being skipped because the HUD_KEY is missing. Checking for the HUD_KEY is so common that I think it warrants a wrapper skip_if_no_hud_key().

I would review the name of the unit tests to make them conform to an imperative tense. For example, "test awkward crosswalk" could be changed to "an improper crosswalk request must error" so that it plays nice with the phrase "test that ...""" "

Agree : these are in the backlog.

17) "It would be helpful to be able to quiet the printing of the status bar, especially for unit tests. Consider creating an rhud.quiet option to silence these messages. This would be particularly handy in unit tests. What I have in mind is something akin to the usethis.quiet option in the usethis package."

Agree: this is in the backlog.

18) """ I see that you have an empty test with_mock_dir(), so I suspect you are dabbling with webmocking and may already know about this resource by Scott Chamberlain and Maëlle Salmon. The book has fantastic guidance on how run some unit tests through moking. For example, vcr might be a great solution for testing the API without an internet connection, as it will snapshot real responses as test fixtures. If you feel even more adventurous, you could create a Github Workflow that tests live requests on a schedule (e.g. semi-monthly) using a Github secret to store an API key.

Two tests fail rather than skip when the API key is missing. """

The test failing when API key is fixed locally. However, as of right now, I have not pushed the changes to the repo. As for web mocking and scheduled workflows, this is something I have far back in the backlog. Didn't want to spend too much time on web-mocking if I didn't know if my implementation was working.

These are my thoughts so far. As for the: (Code, Other Code, and Miscellaneous) sections mentioned in your review, I will put that in another comment as this one seems to be getting very long.

And thanks again for such a comprehensive review. Definitely a lot to learn about!

rtaph commented 2 years ago

Hi @etam4260

Great to see so much activity!

I'm not exactly sure what else needs to go in the CONTRIBUTING file.

IMO the elements you have in the CONTRIBUTING file are sufficient. I think I just saw "This outlines how to propose changes to rhud..." and thought the ellipses indicated this might be an unfinished sentence.

I'm not sure what else would make the vignettes clearer

I have two suggestions:

Adding a few sentences up top to explain that rhud solves the task of converting data associated to ZIP codes to other well-known geographic regions such as counties.
Printing the output from the examples. At the moment, some do, and some don't.

jhollist commented 2 years ago

All, sorry for the long delay in checking in on this.

@etam4260 have you had a chance to look at the last few comments from @rtaph?

When you have, let me know. At that point, @rtaph and @khueyama, I will ask for for you to take a close look at the revisions and get your agreement that the package has been revised to your satisfaction.

Hope you all are having a good summer (and are staying cool!)

etam4260 commented 2 years ago

Hi @jhollist, I am still currently working on the changes -- mostly DRYing it up. @rtaph as for the feedback that you mentioned below Other Code/Miscellaneous. I agree with most of it, at ~ maybe 75% done with those changes.

etam4260 commented 2 years ago

Hi, @rtaph @jhollist @khueyama as a follow up, I think most of internal changes/DRYing is done (there is definitely more DRYING that can be done, such as the CORE api mentioned by @khueyama, but have not gotten around to implementing it), here are the few items left:

This blog encourages shipping API-wrapping packages with precomputed vignettes. That might be something to consider to allow the package to build faster and without an internet connection and/or API key.
I see that you have an empty test with_mock_dir(), so I suspect you are dabbling with webmocking and may already know about this resource by Scott Chamberlain and Maëlle Salmon. The book has fantastic guidance on how run some unit tests through moking. For example, vcr might be a great solution for testing the API without an internet connection, as it will snapshot real responses as test fixtures. If you feel even more adventurous, you could create a Github Workflow that tests live requests on a schedule (e.g. semi-monthly) using a Github secret to store an API key
In vignettes/Setup.Rmd, it could be useful to encourage users to set their environment key via usethis::edit_r_environ(). This would persist across sessions. Working with the renviron file seems cleaner than all the regex and string manipulation in hud_set_key(), though the key would be exposed to any package in R. In the medium term, it would be nice to also offer the option to use a keyring.
Instead of using paste() to build query strings, I would use GET's other named arguments and/or modify_url(). Something like this would be easier to follow and maintain:

Furthermore, I know the use of RETRY was mentioned in one of the reviews. This is something getting worked on, but I am assuming this should include a getOption for the number of retries that a user would want.

@rtaph as for your suggestion on printing the output from the example, do you mean using print() to show in console?

Also do you recommend separating the package from the website documentation as it looks like it is throwing NOTEs related to the /doc folder size? Otherwise are there any other issues?

ropensci / software-review