Open realbp opened 2 months ago
Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help
for help.
:rocket:
Editor check started
:wave:
git hash: 36706151
Package License: MIT + file LICENSE
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
|type |package | ncalls|
|:----------|:----------|------:|
|internal |base | 232|
|internal |cancerprof | 66|
|internal |stats | 28|
|internal |graphics | 8|
|imports |magrittr | 122|
|imports |dplyr | 64|
|imports |cli | 55|
|imports |rlang | 25|
|imports |httr2 | 8|
|imports |utils | 2|
|imports |cdlTools | 1|
|imports |stringr | 1|
|suggests |knitr | NA|
|suggests |rmarkdown | NA|
|suggests |testthat | NA|
|linking_to |NA | NA|
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(
list (76), c (56), structure (28), body (20), class (12), url (9), options (8), paste0 (8), new.env (5), as.raw (4), date (4), which (2)
%>% (122)
create_request (20), demo_crowding (1), demo_education (1), demo_food (1), demo_income (1), demo_insurance (1), demo_language (1), demo_mobility (1), demo_population (1), demo_poverty (1), demo_svi (1), demo_workforce (1), dput_resp_demo (1), dput_resp_incd (1), dput_resp_mortality (1), dput_resp_risk (1), fips_scp (1), get_area (1), handle_age (1), handle_alcohol (1), handle_cancer (1), handle_crowding (1), handle_datatype (1), handle_diet_exercise (1), handle_education (1), handle_food (1), handle_income (1), handle_insurance (1), handle_mobility (1), handle_non_english (1), handle_population (1), handle_poverty (1), handle_race (1), handle_screening (1), handle_sex (1), handle_smoking (1), handle_stage (1), handle_svi (1), handle_vaccine (1), handle_women_health (1), handle_workforce (1), handle_year (1), incidence_cancer (1), mortality_cancer (1), process_resp (1), risk_alcohol (1), risk_colorectal_screening (1)
across (26), mutate (26), mutate_all (4), all_of (3), na_if (3), filter (2)
cli_abort (55)
setNames (28)
is_na (23), sym (2)
frame (8)
request (8)
data (1), read.csv (1)
fips (1)
str_pad (1)
base
magrittr
cancerprof
dplyr
cli
stats
rlang
graphics
httr2
utils
cdlTools
stringr
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
The package has: - code in R (100% in 54 files) and - 1 authors - 4 vignettes - no internal data file - 8 imported packages - 19 exported functions (median 26 lines of code) - 83 non-exported functions in R (median 22 lines of code) --- Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used: - `loc` = "Lines of Code" - `fn` = "function" - `exp`/`not_exp` = exported / not exported All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html) The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile. |measure | value| percentile|noteworthy | |:------------------------|-----:|----------:|:----------| |files_R | 54| 96.4| | |files_vignettes | 4| 95.3| | |files_tests | 45| 99.0| | |loc_R | 2611| 88.6| | |loc_vignettes | 492| 77.7| | |loc_tests | 1533| 91.7| | |num_vignettes | 4| 96.6|TRUE | |n_fns_r | 102| 77.0| | |n_fns_r_exported | 19| 65.9| | |n_fns_r_not_exported | 83| 80.0| | |n_fns_per_file_r | 1| 0.2|TRUE | |num_params_per_fn | 4| 54.6| | |loc_per_fn_r | 22| 65.5| | |loc_per_fn_r_exp | 26| 57.4| | |loc_per_fn_r_not_exp | 22| 66.9| | |rel_whitespace_R | 9| 75.4| | |rel_whitespace_vignettes | 39| 83.5| | |rel_whitespace_tests | 12| 84.2| | |doclines_per_fn_exp | 56| 69.3| | |doclines_per_fn_not_exp | 0| 0.0|TRUE | |fn_call_network_size | 124| 82.6| | ---
Click to see the interactive network visualisation of calls between objects in package
goodpractice
and other checks#### 3a. Continuous Integration Badges [![R-CMD-check.yaml](https://github.com/getwilds/cancerprof/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/getwilds/cancerprof/actions) **GitHub Workflow Results** | id|name |conclusion |sha | run_number|date | |----------:|:--------------------------|:----------|:------|----------:|:----------| | 8546148980|pages build and deployment |success |13c6cf | 6|2024-04-03 | | 8546127063|pkgdown |success |367061 | 17|2024-04-03 | | 8546127064|R-CMD-check |success |367061 | 181|2024-04-03 | | 8546127060|test-coverage |success |367061 | 19|2024-04-03 | --- #### 3b. `goodpractice` results #### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/) R CMD check generated the following check_fail: 1. cyclocomp #### Test coverage with [covr](https://covr.r-lib.org/) Package coverage: 97.93 #### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp) The following functions have cyclocomplexity >= 15: function | cyclocomplexity --- | --- risk_smoking | 106 demo_population | 70 incidence_cancer | 34 mortality_cancer | 29 demo_insurance | 27 demo_poverty | 27 risk_colorectal_screening | 21 risk_women_health | 19 demo_education | 18 #### Static code analyses with [lintr](https://github.com/jimhester/lintr) [lintr](https://github.com/jimhester/lintr) found the following 102 potential issues: message | number of times --- | --- Avoid using sapply, consider vapply instead, that's type safe | 24 Lines should not be more than 80 characters. | 78
|package |version | |:--------|:--------| |pkgstats |0.1.3.11 | |pkgcheck |0.1.2.21 |
This package is in top shape and may be passed on to a handling editor
@ropensci-review-bot assign @ldecicco-USGS as editor
Assigned! @ldecicco-USGS is now the editor
There could be more information added to the README, although the bare minimum to meet our criteria is there.
In the examples I tried, my first thought was it might be nice to convert some of the text. For instance:
x <- demo_income(
area = "usa",
areatype = "state",
income = "median family income",
race = "all races (includes hispanic)"
)
head(x$Rank)
[1] "52 of 52" "51 of 52" "50 of 52" "49 of 52" "48 of 52" "47 of 52"
Seems like c(52, 51, 50, etc) would be a more useful output to an R user. You'd probably want/need another column or something to give the user the " of 52". Not mandatory, could be handy though (maybe a simple function to offer users outside of the function? or a simple example within the examples for how to extract the rank number).
@ropensci-review-bot seeking reviewers
Please add this badge to the README of your package repository:
[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/637_status.svg)](https://github.com/ropensci/software-review/issues/637)
Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news
Thank you for the feedback! I will make those changes in the upcoming version of cancerprof. I have added the ropensci badge and created a NEWS.md file.
What are the next steps in the review process?
I'm asking around to find 2 reviewers. Hopefully that shouldn't take too long!
Great, thank you for a speedy response!
@ropensci-review-bot assign @jromanowska as reviewer
@jromanowska added to the reviewers list. Review due date is 2024-05-20. Thanks @jromanowska for accepting to review! Please refer to our reviewer guide.
rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@jromanowska: If you haven't done so, please fill this form for us to update our reviewers records.
Hi! Just for your information: I'll start with the review soon. There are many free days in May here, in Norway, but I hope I will not need any extension of the review deadline. :crossed_fingers:
@ldecicco-USGS , I just wanted to notify that {goodpractice} package that is dependency for {pkgcheck} has been archived by CRAN (https://cran.r-project.org/web//packages/goodpractice/index.html) so I couldn't install {pkgcheck} and had to install the GitHub version of {goodpractice} by hand.
Hi, I'm having problems installing the package:
pak::pak("getwilds/cancerprof")
#> Error: ! error in pak subprocess
#> Caused by error:
#> ! Could not solve package dependencies:
#> * getwilds/cancerprof: ! pkgdepends resolution error for getwilds/cancerprof.
#> Caused by error:
#> ! Bad GitHub credentials, make sure that your GitHub token is valid.
#> Caused by error in `stop(http_error(resp))`:
#> ! Unauthorized (HTTP 401).
devtools::install_github("getwilds/cancerprof")
#> Downloading GitHub repo getwilds/cancerprof@HEAD
#> Installing 3 packages: terra, raster, cdlTools
#> Installing packages into ‘/home/jro049/R/x86_64-pc-linux-gnu-library/4.3’
#> (as ‘lib’ is unspecified)
#> trying URL 'https://cloud.r-project.org/src/contrib/terra_1.7-71.tar.gz'
#> Content type 'application/x-gzip' length 836573 bytes (816 KB)
#> ==================================================
#> downloaded 816 KB
#>
#> trying URL 'https://cloud.r-project.org/src/contrib/raster_3.6-26.tar.gz'
#> Content type 'application/x-gzip' length 576421 bytes (562 KB)
#> ==================================================
#> downloaded 562 KB
#>
#> trying URL 'https://cloud.r-project.org/src/contrib/cdlTools_1.13.tar.gz'
#> Content type 'application/x-gzip' length 43089 bytes (42 KB)
#> ==================================================
#> downloaded 42 KB
#>
#> * installing *source* package ‘terra’ ...
#> ** package ‘terra’ successfully unpacked and MD5 sums checked
#> ** using staged installation
#> configure: CC: gcc
#> configure: CXX: g++ -std=gnu++17
#> checking for gdal-config... no
#> no
#> configure: error: gdal-config not found or not executable.
#> ERROR: configuration failed for package ‘terra’
#> * removing ‘/home/jro049/R/x86_64-pc-linux-gnu-library/4.3/terra’
#> ERROR: dependency ‘terra’ is not available for package ‘raster’
#> * removing ‘/home/jro049/R/x86_64-pc-linux-gnu-library/4.3/raster’
#> ERROR: dependencies ‘raster’, ‘terra’ are not available for package ‘cdlTools’
#> * removing ‘/home/jro049/R/x86_64-pc-linux-gnu-library/4.3/cdlTools’
#>
#> The downloaded source packages are in
#> ‘/tmp/RtmpyrNDzF/downloaded_packages’
#> ── R CMD build ─────────────────────────────────────────────────────────#> ──────────────────────────────────────────────────
#> ✔ checking for file ‘/tmp/RtmpyrNDzF/remotes24d8468ec6bf/getwilds-cancerprof-23dbd98/DESCRIPTION’ ...
#> ─ preparing ‘cancerprof’:
#> ✔ checking DESCRIPTION meta-information
#> ─ checking for LF line-endings in source and make files and shell scripts
#> ─ checking for empty or unneeded directories
#> ─ looking to see if a ‘data/datalist’ file should be added
#> ─ building ‘cancerprof_0.1.0.tar.gz’
#> Warning: invalid uid value replaced by that for user 'nobody'
#>
#> Installing package into ‘/home/jro049/R/x86_64-pc-linux-gnu-library/4.3’
#> (as ‘lib’ is unspecified)
#> ERROR: dependency ‘cdlTools’ is not available for package ‘cancerprof’
#> * removing ‘/home/jro049/R/x86_64-pc-linux-gnu-library/4.3/cancerprof’
#> Warning messages:
#> 1: In i.p(...) : installation of package ‘terra’ had non-zero exit status
#> 2: In i.p(...) : installation of package ‘raster’ had non-zero exit status
#> 3: In i.p(...) :
#> installation of package ‘cdlTools’ had non-zero exit status
#> 4: In i.p(...) :
#> installation of package ‘/tmp/RtmpyrNDzF/file24d821d2b458/cancerprof_0.1.0.tar.gz’ had non-zero exit status
#>
#>
Created on 2024-04-30 with reprex v2.1.0
Do you get the same error if you install terra and raster independently?
install.packages(c("terra", "raster"))
Today I've tried on another computer (also Linux) and the pak
command worked - it actually showed me which system libraries were missing :thinking:
After installing those, I could re-run the pak
without problems. I will write some comments about this installation process and issues in my review, so that other users may be aware.
:calendar: @jromanowska you have 2 days left before the due date for your review (2024-05-20).
Hi @realbp! I'm sorry I haven't found a 2nd reviewer yet. I sent a few inquires out and didn't get responses back. I should have sent a 2nd batch out but that task got buried in my to-do list. Apologies for the delay! I've sent a few more requests out and hopefully we can get this review process (re)kicked off!
Hi, I'm sorry - I need another week to complete the review. May is horrible with lots of free days here in Norway, and a fantastic summer weather this year, leaving too little time for work! :sweat_smile:
@ropensci-review-bot assign @ginberg as reviewer
@ginberg added to the reviewers list. Review due date is 2024-06-12. Thanks @ginberg for accepting to review! Please refer to our reviewer guide.
rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@ginberg: If you haven't done so, please fill this form for us to update our reviewers records.
Thanks for the patience! Great work with {cancerprof}! :raised_hands:
I'm done with reviewing, please check my comments below.
The package includes all the following forms of documentation:
[ ] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
[ ] Installation instructions: for the development version of package and any non-standard dependencies in README
pak
is a great package, some people might not use it and try installing
cancerprof
via devtools
which might fail not giving useful information about
source of error - the authors could add why they recommend installing via pak
[ ] Vignette(s): demonstrating major functionality that runs successfully locally
no vignettes found
when
installed[x] Function Documentation: for all exported functions
[ ] Examples: (that run successfully locally) for all exported functions
not run
- it would be more useful if each example
showed actually that the function works; this would also make the documentation
better[x] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).
devtools::check()
I get a warning:
Warning: program compiled against libxml 210 using older 209
- I'm not sure
where it comes fromtestthat
, but maybe the test-dput*R
files that are in the R
directory should be in the tests
?Estimated hours spent reviewing: 16
Overall, the package is really useful for batch retrieval of data. The functions are well documented and designed, tests are well written. I have some comments that I hope will improve the usability of both, the package and documentation.
Avoid long lines, e.g., lines 59-63 in demo-education.R
could be split
after each logical operator. This will also improve readability of the if
clauses.
goodpractice::gp()
to get more tipsRunning the styler
package showed that only 4 files could be improved -
great work! You don't need to, but you can try that on these files:
risk-smoking.R
and handle-smoking.R
. This is difficult to
maintain. One way to deal with that is to create an environment that is loaded
when the package is loaded. Then, you can just grab the entire objects from
this environment, without the need for the user to be bothered by these objects.
Check here for some
more explanation.Use the default values of a function to ease function calling
demo_language
requires
the user to write language = "language isolation"
even if it's the only
possible valuedemo_education
, instead of aborting when the user does not provide some
arguments, one can give a warning and assume default valuesRisk smoking
there is so many various combinations that it might be difficult to use that
function when programmatically fetching data - perhaps you could split this
into smaller, more focused functions?incidence_cancer
with cancer = ovary
is possible only with sex = "females"
), thus in such cases, if the user
provides a wrong argument the function might just return the correct table and
issue a warningThe area
and areatype
arguments are not checked anywhere, so when user
puts a wrong value, they get an uninformative error:
Error in `process_resp()`:
! Invalid input, please check documentation for valid arguments.
Run `rlang::last_trace()` to see where the error occurred.
No issues here. The performace depends here on the speed of internet connection of the user.
gdal-config not found
- I can't find any mention that any software like
gdal
is reqiured by cancerprof
pak
, as the instructions say, I get the error:pak::pak("getwilds/cancerprof")
Error:
! error in pak subprocess
Caused by error:
! Could not solve package dependencies:
* getwilds/cancerprof: ! pkgdepends resolution error for getwilds/cancerprof.
Caused by error:
! Bad GitHub credentials, make sure that your GitHub token is valid.
Caused by error in `stop(http_error(resp))`:
! Unauthorized (HTTP 401).
Type .Last.error to see the more details.
(I got this resolved afterwards)
The vignettes are basically a list of examples for each of the functions. These examples should be (and in most cases are) in the function documentation. Instead, a vignette should guide a user through a specific use case scenario. Could the authors create a set of scientific questions that the package helps to answer? Especially batch retrieval of data is important.
I could not find explanation of "FIPS"
In "Demo mobility" it says: "The function defaults to "all races"
,
"both sexes"
, "ages 1+"
" - what does it mean? That the data from the source
is available only for these categories?
Usually, when I see "defaults", I would assume that these are the values
assumed if there is no specific value for an argument. Here, it seems like
there can't be any other value.
There is a mistake in risk_women_health
documentation - wrong names of the
columns of returned data frame.
The authors could explain the different categories of arguments more, e.g.,
in risk_women_health
, the women_health
argument has three possible values
("pap smear in past 3 years, no hysterectomy, ages 21-65", "mammogram in past 2 years, ages 50-74", "mammogram in past 2 years, ages 40+") and the columns of the returned data frame are not informative
enough to understand what type of data one actually gets
Is there a typo in risk_alcohol
documentation, under alcohol
argument?
(and does this argument value need to be so long?)
help(package = "cancerprof")
gives
URL '/help/library/cancerprof/html/00Index.html' not found
)Yes - good choice of naming. Although the demo_
functions sounded at first
like "demonstration" to me, but it's logical to abbreviate "demography" as "demo". :)
@jromanowska Thank you so much for your hard work at reviewing cancerprof! I will get to work on implementing improvements based on your feedback.
:calendar: @ginberg you have 2 days left before the due date for your review (2024-06-12).
@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/637#issuecomment-2129520979 time 16
Logged review for jromanowska (hours: 16)
Thank you @jromanowska for the thorough review!
Just a heads up that I'll be on leave from June 17-July 8 and may or may not be able to get to this issue during that time.
I will be on holidays entire July. I can probably read and comment here, but I won't be able to test anything.
Well done with creating this package! See below for my comments.
The package includes all the following forms of documentation:
URL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).Estimated hours spent reviewing: 6
ℹ Testing cancerprof
✔ | F W S OK | Context
✖ | 3 1 | demo-crowding
────────────────────────────────────────────────────────────────────
Error (test-demo-crowding.R:8:3): Output data type is correct
<httr2_failed/rlang_error/error/condition>
Error: Could not resolve host: statecancerprofiles.cancer.gov
Backtrace:
▆
1. ├─cancerprof::demo_crowding(...) at test-demo-crowding.R:8:2
2. │ └─... %>% req_perform() at cancerprof/R/demo-crowding.R:57:2
3. └─httr2::req_perform(.)
4. └─base::tryCatch(...)
5. └─base (local) tryCatchList(expr, classes, parentenv, handlers)
6. └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
7. └─value[[3L]](cond)
It would be good to fail gracefully with an informative message if a resource is not available (and not give a error). See here for some ideas. You probably need to fix this anyway when you want the package to be on CRAN.
@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/637#issuecomment-2163610098 time 6
Logged review for ginberg (hours: 6)
Awesome, thanks @ginberg !
@realbp go ahead and start responding to the reviews. I'll try to jump in when I can, but will be a little slow for the next 3 weeks.
@realbp: please post your response with @ropensci-review-bot submit response <url to issue comment>
if you haven't done so already (this is an automatic reminder).
Here's the author guide for response. https://devguide.ropensci.org/authors-guide.html
Submitting Author Name: Brian Park Submitting Author Github Handle: !--author1-->@realbp<!--end-author1-- Repository: https://github.com/getwilds/cancerprof Version submitted: 0.1.0 Submission type: Standard Editor: !--editor-->@ldecicco-USGS<!--end-editor-- Reviewers: @jromanowska, @ginberg
Archive: TBD Version accepted: TBD Language: en
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
cancerprof allows users to retrieve data from State Cancer Profiles for programmable analysis. cancerprof makes accessing the undocumented API from State Cancer Profiles intuitive and easy.
The target audience for cancerprof is anyone who wants to access data from state cancer profiles to conduct programmable analysis without having to navigate the complex nature of its GUI. Specifically, cancer researchers could use cancerprof to conduct reproducable analysis of cancer crossed references with a variety of topics found within the data from state cancer profiles.
Currently there are no other softwares or packages that extracts the publicly available data from State Cancer Profiles.
Cancerprof does not breach any data privacy laws and complies with the ethics policies of ropensci.
https://github.com/ropensci/software-review/issues/635
pkgcheck
items which your package is unable to pass.Cancerprof passes all pkgcheck items
Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
[x] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
- [x] The package is novel and will be of interest to the broad readership of the journal. - [ ] The manuscript describing the package is no longer than 3000 words. - [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)Code of conduct