ropensci / software-review

rOpenSci Software Peer Review.

rhud: A R interface for the US Department of Housing and Urban Development APIs #524

Open etam4260 opened 2 years ago

etam4260 commented 2 years ago

Submitting Author Name: Name
Submitting Author Github Handle: @etam4260
Other Package Authors Github handles: (comma separated, delete if none)
Repository: https://github.com/etam4260/rhud
Version submitted:
Submission type: Standard
Editor: @jhollist
Reviewers: @rtaph, @khueyama

Due date for @rtaph: 2022-06-07
Due date for @khueyama: 2022-06-09

Archive: TBD
Version accepted: TBD
Language: en

Package: hudr
Title: A R interface for accessing HUD (US Department of Housing and Urban Development) APIs
Version: 0.1.0.9000
Authors@R: 
    c(person("Emmet", "Tam", ,"emmet_tam@yahoo.com", role = c("aut", "cre", "cph")),
    person("Allison", "Reilly", ,"areilly2@umd.edu", role = c("ctb")),
    person("Hamed", "Ghaedi", ,"hghaedi@terpmail.umd.edu", role = c("ctb")))
Description: 
    An R interface for accessing HUD (US Department of Housing and Urban Development) API.
    The HUD has four main datasets, USPS Crosswalk, Fair Markets Rent,
    Income Limits, and the Comprehensive Housing Affordability Strategy.
License: GPL (>= 2)
Language: en-US
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.2.9000
URL: https://github.com/etam4260/hudr, https://etam4260.github.io/hudr/index.html
BugReports: https://github.com/etam4260/hudr/issues
Suggests: 
    covr,
    httptest,
    knitr,
    rmarkdown,
    testthat (>= 3.0.0)
Imports: 
    httr,
    devtools,
    zoo,
    rio
Config/testthat/edition: 3
VignetteBuilder: knitr
ByteCompile: true

Scope

It is a data retrieval package because it retrieves data from an API. It 'will' be a data munging package after implementation of additional features such as cross walking an entire dataset. Furthermore, the APIs which this package retrieves data from are associated with geographic identifiers.

I am hoping to reach professors, researchers, and students with this package. It gives access to the crosswalk files; crosswalking is a geospatial technique described very well in these journal articles:

Din, Alexander and Wilson, Ron, 2020. “Crosswalking ZIP Codes to Census Geographies: Geoprocessing the U.S. Department of Housing & Urban Development’s ZIP Code Crosswalk Files,” Cityscape: A Journal of Policy Development and Research, Volume 22, Number 1, https://www.huduser.gov/portal/periodicals/cityscpe/vol22num1/ch12.pdf

Wilson, Ron and Din, Alexander, 2018. “Understanding and Enhancing the U.S. Department of Housing and Urban Development’s ZIP Code Crosswalk Files,” Cityscape: A Journal of Policy Development and Research, Volume 20 Number 2, 277 – 294.

Additionally, it provides access to the Income Limits, Fair Market Rents, and Comprehensive Housing Affordability Strategy datasets provided by HUD, which are of interest to housing and social science researchers.

Implementation of a crosswalk function is planned in future releases, which will help crosswalk a US dataset from one geographic identifier into another using the method described in the papers above.
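As a rough illustration of what such a crosswalk step could look like, here is a minimal base R sketch that reallocates a value from one geography to another using allocation ratios. Everything in it is hypothetical: the identifiers, the values, the column names (`zip`, `county`, `ratio`, `households`) and the helper `crosswalk_sum()` are made up for illustration and are not part of the package. The right ratio to use depends on what is being allocated.

```r
# Hypothetical source data keyed by ZIP (made-up identifiers and values).
zip_data <- data.frame(
  zip = c("Z1", "Z2"),
  households = c(1000, 200),
  stringsAsFactors = FALSE
)

# Hypothetical crosswalk: the share of each ZIP that falls in each county.
# The ratios for a given ZIP sum to 1.
xwalk <- data.frame(
  zip = c("Z1", "Z1", "Z2"),
  county = c("C1", "C2", "C1"),
  ratio = c(0.75, 0.25, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: weight the value by the ratio, then sum by county.
crosswalk_sum <- function(data, xwalk, value) {
  merged <- merge(data, xwalk, by = "zip")
  merged$allocated <- merged[[value]] * merged$ratio
  out <- aggregate(allocated ~ county, data = merged, FUN = sum)
  names(out)[names(out) == "allocated"] <- value
  out
}

county_data <- crosswalk_sum(zip_data, xwalk, "households")
print(county_data)
```

Because the ratios for each ZIP sum to 1, the total across counties equals the total across ZIPs, which is a useful sanity check after any crosswalk.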

Recently, a hudr package was published on CRAN, but it looks like some of it derives from the work I currently have. I am not sure how this will affect my prospects of submitting this to CRAN. Furthermore, their package provides access only to the Fair Market Rents and Income Limits APIs provided by HUD. Mine gives access to all the APIs that are currently supported by HUD USER (https://www.huduser.gov/portal/home.html), and it is more flexible and intuitive.

As for documentation and testing, I believe my package could be improved. I don't test very many edge cases and have not created vignettes for every function.

For the most part, I think yes. The package requires an API key which I have users store using Sys.setenv(). I have not looked into the more sophisticated methods like the keyring package and do not instruct the user on how to set the key to be persistent.
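A minimal sketch of the two options mentioned here: setting the key for the current session with Sys.setenv(), and making it persistent through .Renviron. The variable name `HUD_KEY`, the placeholder key value, and the helper `get_hud_key()` are assumptions for illustration; the package may use different names.

```r
# Session only: the key is gone once R restarts.
Sys.setenv(HUD_KEY = "my-api-key")

# Hypothetical helper: read the key and fail early if it is missing.
get_hud_key <- function(var = "HUD_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("No API key found. Set ", var,
         " with Sys.setenv() or in .Renviron.", call. = FALSE)
  }
  key
}

key <- get_hud_key()

# Persistent: add a line such as
#   HUD_KEY=my-api-key
# to ~/.Renviron (R reads this file at startup), then restart R
# or call readRenviron("~/.Renviron").
```

Failing early with a clear message tends to be friendlier than letting the API return an authorization error later on.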

https://github.com/ropensci/software-review/issues/500 @jooolia

I seem to get some errors when running pkgcheck. I manually made sure I had all the necessary components. For the pkgcheck requirement that says all functions need examples, I am assuming that only includes exported ones?

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

MEE Options
- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

rtaph commented 2 years ago

Thanks for the update, @etam4260 ! It is totally reasonable to leave some suggestions for future consideration.

I will try to carve out time this weekend to look at your latest changes.

In the meantime:

  1. Re: "do you mean using print() to show in console?" There are some vignettes where you call a function as an example but don't show the output. For instance, in the state-level fair market rents, we don't know what the result of hud_fmr(query = 'VA', year = '2021') is unless we run it ourselves in an R session. I am guessing maybe this was because the output is quite large to print? Even so, I think it would be very helpful to show some of the structure of the output. Something like

      tab <- rhud::hud_fmr(query = 'VA', year = '2021')
      head(tab[[1]])
      #> 
      #>   town_name      county_name
      #> 1      NULL  Accomack County
      #> 2      NULL Albemarle County
      #> 3      NULL  Alexandria city
      #> 4      NULL Alleghany County
      #> 5      NULL    Amelia County
      #> 6      NULL   Amherst County
      #>                                                                     metro_name
      #> 1                                                          Accomack County, VA
      #> 2                                       Charlottesville, VA HUD Metro FMR Area
      #> 3                 Washington-Arlington-Alexandria, DC-VA-MD HUD Metro FMR Area
      #> 4 Alleghany County-Clifton Forge city-Covington city, VA HUD Nonmetro FMR Area
      #> 5                                                             Richmond, VA MSA
      #> 6                                                            Lynchburg, VA MSA
      #>    fips_code Efficiency One-Bedroom Two-Bedroom Three-Bedroom Four-Bedroom
      #> 1 5100199999        481         602         713           947          967
      #> 2 5100399999        949        1077        1266          1575         1965
      #> 3 5151099999       1513        1548        1765          2263         2742
      #> 4 5100599999        495         558         735           958         1142
      #> 5 5100799999        993        1020        1163          1538         1840
      #> 6 5100999999        633         660         784          1053         1241
      #>   FMR Percentile statename statecode smallarea_status query year
      #> 1             40  Virginia        VA                0    VA 2021
      #> 2             40  Virginia        VA                0    VA 2021
      #> 3             40  Virginia        VA                1    VA 2021
      #> 4             40  Virginia        VA                0    VA 2021
      #> 5             40  Virginia        VA                0    VA 2021
      #> 6             40  Virginia        VA                0    VA 2021

    Created on 2022-08-24 with reprex v2.0.2

    or

      str(rhud::hud_fmr(query = 'VA', year = '2021'))
      #> 
      #> List of 2
      #>  $ counties  :'data.frame':  133 obs. of  15 variables:
      #>   ..$ town_name       : chr [1:133] "NULL" "NULL" "NULL" "NULL" ...
      #>   ..$ county_name     : chr [1:133] "Accomack County" "Albemarle County" "Alexandria city" "Alleghany County" ...
      #>   ..$ metro_name      : chr [1:133] "Accomack County, VA" "Charlottesville, VA HUD Metro FMR Area" "Washington-Arlington-Alexandria, DC-VA-MD HUD Metro FMR Area" "Alleghany County-Clifton Forge city-Covington city, VA HUD Nonmetro FMR Area" ...
      #>   ..$ fips_code       : chr [1:133] "5100199999" "5100399999" "5151099999" "5100599999" ...
      #>   ..$ Efficiency      : chr [1:133] "481" "949" "1513" "495" ...
      #>   ..$ One-Bedroom     : chr [1:133] "602" "1077" "1548" "558" ...
      #>   ..$ Two-Bedroom     : chr [1:133] "713" "1266" "1765" "735" ...
      #>   ..$ Three-Bedroom   : chr [1:133] "947" "1575" "2263" "958" ...
      #>   ..$ Four-Bedroom    : chr [1:133] "967" "1965" "2742" "1142" ...
      #>   ..$ FMR Percentile  : chr [1:133] "40" "40" "40" "40" ...
      #>   ..$ statename       : chr [1:133] "Virginia" "Virginia" "Virginia" "Virginia" ...
      #>   ..$ statecode       : chr [1:133] "VA" "VA" "VA" "VA" ...
      #>   ..$ smallarea_status: chr [1:133] "0" "0" "1" "0" ...
      #>   ..$ query           : chr [1:133] "VA" "VA" "VA" "VA" ...
      #>   ..$ year            : chr [1:133] "2021" "2021" "2021" "2021" ...
      #>  $ metroareas:'data.frame':  19 obs. of  13 variables:
      #>   ..$ metro_name      : chr [1:19] "Blacksburg-Christiansburg-Radford, VA HUD Metro FMR Area" "Buckingham County, VA HUD Metro FMR Area" "Charlottesville, VA HUD Metro FMR Area" "Culpeper County, VA HUD Metro FMR Area" ...
      #>   ..$ code            : chr [1:19] "METRO13980M13980" "METRO16820N51029" "METRO16820M16820" "METRO47900N51047" ...
      #>   ..$ Efficiency      : chr [1:19] "795" "564" "949" "788" ...
      #>   ..$ One-Bedroom     : chr [1:19] "858" "652" "1077" "794" ...
      #>   ..$ Two-Bedroom     : chr [1:19] "978" "743" "1266" "1046" ...
      #>   ..$ Three-Bedroom   : chr [1:19] "1400" "1007" "1575" "1439" ...
      #>   ..$ Four-Bedroom    : chr [1:19] "1693" "1140" "1965" "1811" ...
      #>   ..$ FMR Percentile  : chr [1:19] "40" "40" "40" "40" ...
      #>   ..$ statename       : chr [1:19] "Virginia" "Virginia" "Virginia" "Virginia" ...
      #>   ..$ statecode       : chr [1:19] "VA" "VA" "VA" "VA" ...
      #>   ..$ smallarea_status: chr [1:19] "0" "0" "0" "0" ...
      #>   ..$ query           : chr [1:19] "VA" "VA" "VA" "VA" ...
      #>   ..$ year            : chr [1:19] "2021" "2021" "2021" "2021" ...

    Created on 2022-08-24 with reprex v2.0.2

  2. Re: "do you recommend separating the package from the website documentation?" That is a great question. I don't know what the best practice is when the /docs throw a NOTE due to size. @jhollist , do you have an opinion?

mpadge commented 2 years ago

> Re: "do you recommend separating the package from the website documentation?" That is a great question. I don't know what the best practice is when the /docs throw a NOTE due to size.

@rtaph You don't really need to worry, because as soon as the package is transferred the docs will be built by the internal rOpenSci doc server, and you can just remove the whole /docs folder anyway. See the Packaging chapter of the Dev Guide for details.

jhollist commented 2 years ago

Sorry for the delay in responding. I am on vacation this week and next and mostly away from my computer, so I might be a little slow in responding to this. I will get to it as soon as I am able.

And thanks @mpadge for the assist!

etam4260 commented 2 years ago

Thanks for the info @mpadge, good to know. Do you know if the same is true for CRAN?

@rtaph Additional info: I think my test suite is still somewhat incomplete. I need to go back and make sure I test for exact outputs rather than just checking that I get some output. In my case, the USPS Crosswalk API changed, but my tests didn't catch that. I'm also not at the 75% test coverage mark just yet 😅
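To make that distinction concrete, here is a small base R sketch of the difference between "got some output" and "got the expected output". The `fake_crosswalk` data frame stands in for a recorded API response; its column names and values are hypothetical and not taken from the actual API.

```r
# Stand-in for a recorded API response (hypothetical columns and values).
fake_crosswalk <- data.frame(
  zip = c("Z1", "Z1"),
  geoid = c("C1", "C2"),
  ratio = c(0.75, 0.25),
  stringsAsFactors = FALSE
)

# Weak check: passes for any non-empty result, even if the API changed.
stopifnot(nrow(fake_crosswalk) > 0)

# Stronger checks: pin down names, types, and a known property of the values.
expected_cols <- c("zip", "geoid", "ratio")
stopifnot(
  identical(names(fake_crosswalk), expected_cols),
  is.character(fake_crosswalk$zip),
  is.numeric(fake_crosswalk$ratio),
  isTRUE(all.equal(sum(fake_crosswalk$ratio), 1))
)

# Simulate the API renaming a field: only the stronger check notices.
changed <- fake_crosswalk
names(changed)[names(changed) == "geoid"] <- "geo_id"
weak_passes <- nrow(changed) > 0
strong_passes <- identical(names(changed), expected_cols)
```

In a testthat suite, the same checks would typically be written with `expect_named()` and `expect_equal()` instead of `stopifnot()`.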

rtaph commented 2 years ago

Okay, no problem!

I'll hold off on reviewing the code then, until you let me know you think rhud is ready for another look.

jhollist commented 2 years ago

@etam4260 Just now getting back to this!

CRAN won't see your docs folder, as you currently have it listed in your .Rbuildignore file. That is the way you want it. The documentation website will be independent of anything that happens on CRAN. I think I would follow @mpadge's advice and remove the docs folder. rOpenSci will take care of building that for you.

At least that is what I can tell from digging around a bit. I haven't used pkgdown sites much myself, so I don't have a ton of personal experience here.
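For reference, a simplified sketch of how `.Rbuildignore` entries exclude paths from the built package. Each line of the file is a Perl-style regular expression matched case-insensitively against paths relative to the package root. The patterns and paths below are assumptions for illustration, not copied from the repository, and the code only mimics the matching for top-level paths; it is not what `R CMD build` runs internally.

```r
# Lines as they might appear in .Rbuildignore (one regular expression each).
ignore_patterns <- c("^docs$", "^_pkgdown\\.yml$")

# Hypothetical top-level paths in a package source tree.
paths <- c("docs", "R", "man", "_pkgdown.yml", "DESCRIPTION")

# A path is left out of the build if any pattern matches it.
is_ignored <- vapply(
  paths,
  function(p) {
    any(vapply(ignore_patterns, grepl, logical(1), x = p,
               perl = TRUE, ignore.case = TRUE))
  },
  logical(1)
)

ignored_paths <- paths[is_ignored]
print(ignored_paths)
```

Since the tarball submitted to CRAN is produced by the build step, anything matched here never reaches CRAN, which is why the size of /docs should not matter there.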

jhollist commented 2 years ago

@etam4260 just checking in to see how you are making out on the edits.

jhollist commented 1 year ago

@etam4260 Hoping to get this submission wrapped up. Please let me know where things stand. Thanks!

jhollist commented 1 year ago

@ropensci-review-bot put on hold

ropensci-review-bot commented 1 year ago

Submission on hold!

jhollist commented 1 year ago

@etam4260 Just giving a heads up that we have put this submission on hold. Will check in again in 3 months unless I hear from you before that.

ropensci-review-bot commented 1 year ago

@jhollist: Please review the holding status

jhollist commented 1 year ago

@etam4260 Any updates on rhud? Have you made progress on edits? Do you expect to in the next couple of months?