onmaRg package for the Ontario Marginalization Index

WiIIson commented 1 month ago

Submitting Author Name: D. William Conley Submitting Author Github Handle: !--author1-->@Willson<!--end-author1--\ Repository: https://github.com/WiIIson/onmaRg Submission type: Pre-submission Language: en

Paste the full DESCRIPTION file inside a code block below:

Package: onmaRg
Type: Package
Title: Import Public Health Ontario's Ontario Marginalization Index
Version: 1.0.3
Authors@R:
    person("William", "Conley", , "william@cconley.ca", role = c("aut", "cre"))
Description: The Ontario Marginalization Index is a socioeconomic model that is built on Statistics Canada census data.
    The model consists of four dimensions: In 2021, these dimensions were updated to "Material Resources" (previously called "Material Deprivation"), "Households and Dwellings" (previously called "Residential Instability"), "Age and Labour Force" (previously called "Dependency"), and "Racialized and Newcomer Populations" (previously called "Ethnic Concentration").
    This update reflects a movement away from deficit-based language. 2021 data will load with these new dimension names, wheras 2011 and 2016 data will load with the historical dimension names.
    Each of these dimensions are imported for a variety of geographic levels (DA, CD, etc.) for the 2021, 2011 and 2016 administrations of the census.
    These data sets contribute to community analysis of equity with respect to Ontario's Anti-Racism Act.
    The Ontario Marginalization Index data is retrieved from the Public Health Ontario website: <https://www.publichealthontario.ca/en/data-and-analysis/health-equity/ontario-marginalization-index>.
    The shapefile data is retrieved from the Statistics Canada website: <https://www12.statcan.gc.ca/census-recensement/2011/geo/bound-limit/bound-limit-eng.cfm>.
License: GPL-3
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.2.1
Depends: dplyr, httr, readxl, sf, stringr, utils
Suggests: 
    knitr,
    rmarkdown
VignetteBuilder: knitr

Scope

Please indicate which category or categories from our package fit policies or statistical package categories this package falls under. (Please check one or more appropriate boxes below):

Data Lifecycle Packages
- [x] data retrieval
- [ ] data extraction
- [x] data munging
- [ ] data deposition
- [ ] data validation and testing
- [ ] workflow automation
- [ ] version control
- [ ] citation management and bibliometrics
- [ ] scientific software wrappers
- [ ] field and lab reproducibility tools
- [ ] database software bindings
- [x] geospatial data
- [ ] text analysis
  
  Statistical Packages
- [ ] Bayesian and Monte Carlo Routines
- [ ] Dimensionality Reduction, Clustering, and Unsupervised Learning
- [ ] Machine Learning
- [ ] Regression and Supervised Learning
- [ ] Exploratory Data Analysis (EDA) and Summary Statistics
- [ ] Spatial Analyses
- [ ] Time Series Analyses
- [ ] Probability Distributions
Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:

The onmaRg package retrieves data from Public Health Ontario's Ontario Marginalization Index, and joins it with Stats Canada's geospatial data to create a geographic map of OnMarg data.

If submitting a statistical package, have you already incorporated documentation of standards into your code via the srr package?

N/A.

Who is the target audience and what are scientific applications of this package?

The target audience of this package is social scientists and epidemiologists who are engaged in understanding community socioeconomic dynamics. This makes disparate datasets more accessible. There is currently a user group within education research that has been using this package.

Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

No.

(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?

OnMarg is a data model developed by Public Health Ontario, using data from Statistics Canada.

Any other questions or issues we should be aware of?:

No.

adamhsparks commented 1 month ago

Hi @WiIIson, I've run some checks on {onmaRg} and note that you don't currently have any tests. We require at least 75% test coverage before packages can be reviewed.

{testthat} is the most commonly used package for this purpose, https://testthat.r-lib.org/reference/expect_no_error.html. I'm happy to help you set up testing if you would like me to assist.

adamhsparks commented 1 month ago

@ropensci-review-bot check package

ropensci-review-bot commented 1 month ago

Thanks, about to send the query.

ropensci-review-bot commented 1 month ago

:rocket:

The following problems were found in your submission template:

submission type must be one of [Standard, Estandar, Stats]
HTML variable [editor] is missing
HTML variable [reviewers-list] is missing
HTML variable [due-dates-list] is missing Editors: Please ensure these problems with the submission template are rectified. Package checks have been started regardless.

:wave:

ropensci-review-bot commented 1 month ago

Checks for [onmaRg (v1.0.3)]()

git hash: 745ccc18

:heavy_check_mark: Package is already on CRAN.
:heavy_multiplication_x: does not have a 'codemeta.json' file.
:heavy_multiplication_x: does not have a 'contributing' file.
:heavy_check_mark: uses 'roxygen2'.
:heavy_multiplication_x: 'DESCRIPTION' does not have a URL field.
:heavy_multiplication_x: 'DESCRIPTION' does not have a BugReports field.
:heavy_check_mark: Package has at least one HTML vignette
:heavy_check_mark: All functions have examples.
:heavy_multiplication_x: Continuous integration checks unavailable (no URL in 'DESCRIPTION').
:heavy_multiplication_x: Package coverage is 0% (should be at least 75%).
:heavy_check_mark: R CMD check found no errors.
:heavy_check_mark: R CMD check found no warnings.

Important: All failing checks above must be addressed prior to proceeding

Package License: GPL-3

1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate. |type |package | ncalls| |:----------|:---------|------:| |internal |base | 37| |internal |onmaRg | 16| |internal |utils | 5| |internal |stats | 1| |depends |dplyr | NA| |imports |NA | NA| |suggests |knitr | NA| |suggests |rmarkdown | NA| |linking_to |NA | NA| Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats()', and examining the 'external_calls' table.

base

c (7), by (5), paste0 (5), stop (4), tempfile (4), for (3), merge (2), tempdir (2), url (2), format (1), length (1), seq (1)

onmaRg

process_2011_2016 (4), extractFromZip (3), om_data (3), getFileName (2), process_2021 (2), om_geo (1), om_quint (1)

utils

page (4), read.csv (1)

stats

quantile (1)

**NOTE:** No imported packages appear to have associated function calls; please ensure with author that these 'Imports' are listed appropriately.

2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has: - code in R (100% in 3 files) and - 1 authors - 1 vignette - no internal data file - 1 imported package - 3 exported functions (median 57 lines of code) - 9 non-exported functions in R (median 29 lines of code) --- Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used: - `loc` = "Lines of Code" - `fn` = "function" - `exp`/`not_exp` = exported / not exported All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html) The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile. |measure | value| percentile|noteworthy | |:------------------------|-----:|----------:|:----------| |files_R | 3| 18.8| | |files_vignettes | 2| 81.7| | |files_tests | 0| 0.0|TRUE | |loc_R | 261| 28.0| | |loc_vignettes | 41| 6.2| | |num_vignettes | 1| 58.9| | |n_fns_r | 12| 17.0| | |n_fns_r_exported | 3| 14.5| | |n_fns_r_not_exported | 9| 20.7| | |n_fns_per_file_r | 2| 42.2| | |num_params_per_fn | 2| 8.2| | |loc_per_fn_r | 36| 82.9| | |loc_per_fn_r_exp | 57| 82.6| | |loc_per_fn_r_not_exp | 29| 77.4| | |rel_whitespace_R | 25| 38.7| | |rel_whitespace_vignettes | 24| 6.0| | |doclines_per_fn_exp | 24| 21.6| | |doclines_per_fn_not_exp | 0| 0.0|TRUE | |fn_call_network_size | 11| 33.6| | ---

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package

3. `goodpractice` and other checks

Details of goodpractice checks (click to open)

--- #### 3b. `goodpractice` results #### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/) R CMD check generated the following check_fails: 1. no_description_depends 2. description_url 3. description_bugreports 4. no_import_package_as_a_whole #### Test coverage with [covr](https://covr.r-lib.org/) Package coverage: 0 The following files are not completely covered by tests: file | coverage --- | --- R/om_data.R | 0% R/om_geo.R | 0% R/om_quint.R | 0% #### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp) No functions have cyclocomplexity >= 15 #### Static code analyses with [lintr](https://github.com/jimhester/lintr) [lintr](https://github.com/jimhester/lintr) found no issues with this package!

Package Versions

|package |version | |:--------|:--------| |pkgstats |0.1.6.19 | |pkgcheck |0.1.2.61 |

Editor-in-Chief Instructions:

Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.

WiIIson commented 1 month ago

Hi, thanks for taking a look at my package. I had created tests for the package, but as all of the functions involve fetching the dataset from the website, all of the tests would time out and fail due to how large the dataset is. I would appreciate any suggestions you might have on how to approach the timing out.

adamhsparks commented 1 month ago

@WiIIson, I don't understand how the package works if you can't fetch the data due to timeouts? Why is this only experienced with tests and not during normal use? Perhaps if you're experiencing issues with this, I'd set the TIMEOUT to be longer, e.g., https://github.com/ropensci/weatherOz/blob/0c544acd4651fa16a3a3dc2e3e24addf5d41187d/R/get_radar_imagery.R#L49. I've got some experience with this issue unfortunately. 😆

I see from the bot's checks that you have a few other issues that should be addressed as well, such as the URLs in the DESCRIPTION a COC and codemeta.json file.

adamhsparks commented 1 month ago

I didn't catch until now that this was a pre-submission. Do you have any specific questions that I can address aside from the issues that we're identifying here already that would get it prepped for full submission?

WiIIson commented 1 month ago

I'd be up for getting the testing ready, I'm just wondering if you think this package would be a good fit for OpenSci.

The tests timing out is related to CRAN accepting the package, I believe they have a fixed amount of time the tests can run for even though the functions all work properly. If the tests are running locally then it won't be a problem.

adamhsparks commented 1 month ago

Ah. Ok!

Yes, this would fit with data retrieval, munging and geospatial as you indicated.

And have we got a book for you! https://books.ropensci.org/http-testing/

To be brief, there are packages that allow you to record the API’s response, e.g., {httptest2} or {vcr}, which make the tests much faster and you can also just use test that::skip_if_offline() and the tests are skipped on CRAN. See, https://github.com/ropensci/weatherOz/blob/0c544acd4651fa16a3a3dc2e3e24addf5d41187d/tests/testthat/test-get_dpird_summaries.R#L104 for using a {vcr} cassette and skipping offline.

We would suggest updating to {httr2} as well. You’ll have a more recent and better supported package to work with.

adamhsparks commented 3 weeks ago

@WiIIson, are you OK for me to close this issue now? Have we addressed your questions sufficiently?

WiIIson commented 3 weeks ago

Yeah sounds good, thanks!

ropensci / software-review