waywiser: Ergonomic Methods for Assessing Spatial Models

mikemahoney218 commented 1 year ago

Date accepted: 2023-02-27 Submitting Author Name: Mike Mahoney Submitting Author Github Handle: !--author1-->@mikemahoney218@Paula-Moraga<!--end-editor-- Reviewers: @becarioprecario, @jakub_nowosad, @nowosad

Due date for @becarioprecario: 2023-02-04 Due date for @jakub_nowosad: 2023-02-06 Due date for @nowosad: 2023-02-06

Archive: TBD Version accepted: TBD Language: en

Paste the full DESCRIPTION file inside a code block below:

Type: Package
Package: waywiser
Title: Ergonomic Methods for Assessing Spatial Models
Version: 0.2.0.9000
Authors@R: c(
    person("Michael", "Mahoney", , "mike.mahoney.218@gmail.com", role = c("aut", "cre"),
           comment = c(ORCID = "0000-0003-2402-304X")),
    person("Lucas", "Johnson", , "lucas.k.johnson03@gmail.com", role = c("ctb"),
           comment = c(ORCID = "0000-0002-7953-0260")),
    person("RStudio", role = c("cph", "fnd"))
  )
Description: Assessing predictive models of spatial data can be challenging, 
    both because these models are typically built for extrapolating outside the
    original region represented by training data and due to potential spatially
    structured errors, with "hot spots" of higher than expected error
    clustered geographically due to spatial structure in the underlying
    data. Methods are provided for assessing models fit to spatial data, 
    including approaches for measuring the spatial structure of model errors,
    assessing model predictions at multiple spatial scales, and evaluating where 
    predictions can be made safely. Methods are particularly useful for models 
    fit using the 'tidymodels' framework. Methods include Moran's I
    ('Moran' (1950) <doi:10.2307/2332142>), Geary's C 
    ('Geary' (1954) <doi:10.2307/2986645>), Getis-Ord's G
    ('Ord' and 'Getis' (1995) <doi:10.1111/j.1538-4632.1995.tb00912.x>),
    agreement coefficients from 'Ji' and Gallo (2006) 
    (<doi: 10.14358/PERS.72.7.823>), agreement metrics from 'Willmott' (1981)
    (<doi: 10.1080/02723646.1981.10642213>) and 'Willmott' 'et' 'al'. (2012)
    (<doi: 10.1002/joc.2419>), an implementation of the area of applicability 
    methodology from 'Meyer' and 'Pebesma' (2021) 
    (<doi:10.1111/2041-210X.13650>), and an implementation of
    multi-scale assessment as described in 'Riemann' 'et' 'al'. (2010)
    (<doi:10.1016/j.rse.2010.05.010>).
License: MIT + file LICENSE
URL: https://github.com/mikemahoney218/waywiser,
    https://mikemahoney218.github.io/waywiser/
BugReports: https://github.com/mikemahoney218/waywiser/issues
Depends: 
    R (>= 3.6)
Imports: 
    dplyr,
    fields,
    FNN,
    glue,
    hardhat,
    Matrix,
    purrr,
    rlang,
    rsample,
    sf (>= 1.0-0),
    spdep (>= 1.1-9),
    stats,
    tibble,
    tidyselect,
    yardstick
Suggests: 
    applicable,
    caret,
    CAST,
    covr,
    ggplot2,
    knitr,
    modeldata,
    recipes,
    rmarkdown,
    spatialsample,
    spelling,
    testthat (>= 3.0.0),
    tidymodels,
    tidyr,
    tigris,
    vip,
    whisker,
    withr
Config/testthat/edition: 3
Config/testthat/parallel: true
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE, roclets = c("namespace", "rd", "srr::srr_stats_roclet"))
RoxygenNote: 7.2.3
Language: en-US
VignetteBuilder: knitr

Scope

Please indicate which of our statistical package categories this package falls under. (Please check one appropriate box below):

Statistical Packages
- [ ] Bayesian and Monte Carlo Routines
- [ ] Dimensionality Reduction, Clustering, and Unsupervised Learning
- [ ] Machine Learning
- [ ] Regression and Supervised Learning
- [ ] Exploratory Data Analysis (EDA) and Summary Statistics
- [x] Spatial Analyses
- [ ] Time Series Analyses

Pre-submission Inquiry

[x] A pre-submission inquiry has been approved in issue#565

General Information

Who is the target audience and what are scientific applications of this package?

Anyone fitting models to spatial data, particularly (but not exclusively) people working within the tidymodels ecosystem. This includes a number of domains, and we've already been using it in our modeling practice.

Paste your responses to our General Standard G1.1 here, describing whether your software is:
- The first implementation of a novel algorithm; or
- The first implementation within R of an algorithm which has previously been implemented in other languages or contexts; or
- An improvement on other implementations of similar algorithms in R.
Please include hyperlinked references to all other relevant software.

The waywiser R package makes it easier to measure the performance of models fit to 2D spatial data by implementing a number of well-established assessment methods in a consistent, ergonomic toolbox; features include new yardstick metrics for measuring agreement and spatial autocorrelation, functions to assess model predictions across multiple scales, and methods to calculate the area of applicability of a model.

Relevant software implementing similar algorithms include CAST for ww_area_of_applicability(). Several yardstick metrics implemented directly wrap spdep in a more consistent interface. Willmott's D is also implemented in hydroGOF. Other functions have (as far as I am aware) not been implemented elsewhere, such as ww_multi_scale() which implements the procedure from Riemann et al 2010, or ww_agreement_coefficient() which implements metrics from Ji and Gallo 2006.

(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?

N/A

Badging

What grade of badge are you aiming for? (bronze, silver, gold)

Silver

If aiming for silver or gold, describe which of the four aspects listed in the Guide for Authors chapter the package fulfils (at least one aspect for silver; three for gold)

Have a demonstrated generality of usage beyond one single envisioned use case. Software is frequently developed for one particular use case envisioned by the authors themselves. Generalising the utility of software so that it is readily applicable to other use cases, and satisfactorily documenting such generality of usage, represents another aspect which may be considered sufficient for software to attain a silver grade.

This is the primary aspect which I believe merits the silver status. The waywiser package implements routines which are useful for a wide variety of spatial models and integrates well with the tidymodels ecosystem, making it (hopefully!) of interdisciplinary interest.

Depending on what the editors think, I'd also potentially submit this for gold, based upon the following two aspects:

Compliance with a good number of standards beyond those identified as minimally necessary. This will require reviewers and authors to agree on identification of both a minimal subset of necessary standards, and a full set of potentially applicable standards. This aspect may be considered fulfilled if at least one quarter of the additional potentially applicable standards have been met, and should definitely be considered fulfilled if more than one half have been met.

Internal aspects of package structure and design. Many aspects of the internal structure and design of software are too variable to be effectively addressed by standards. Packages which are judged by reviewers to reflect notably excellent design choices, especially in the implementation of core statistical algorithms, may also be considered worthy of a silver grade.

But I'm not familiar enough with the system to know if waywiser is likely to be in compliance with these two aspects, and am comfortable submitting for "silver" status if waywiser does not obviously meet both.

Technical checks

Confirm each of the following by checking the box.

[x] I have read the rOpenSci packaging guide.
[x] I have read the author guide and I expect to maintain this package for at least 2 years or have another maintainer identified.
[x] I/we have read the Statistical Software Peer Review Guide for Authors.
[ ] I/we have run autotest checks on the package, and ensured no tests fail. (Sorry, both the release and CRAN versions of autotest fail immediately on my machine with internal errors -- that is, from autotest itself and not from my package -- and therefore I have not been able to use it).
[x] The srr_stats_pre_submit() function confirms this package may be submitted.
[x] The pkgcheck() function confirms this package may be submitted - alternatively, please explain reasons for any checks which your package is unable to pass.

This package:

[x] does not violate the Terms of Service of any service it interacts with.
[x] has a CRAN and OSI accepted license.
[x] contains a README with instructions for installing the development version.

Publication options

[x] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?

Code of conduct

[x] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

ropensci-review-bot commented 1 year ago

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

ropensci-review-bot commented 1 year ago

:rocket:

The following problem was found in your submission template:

'statsgrade' variable must be one of [bronze, silver, gold] Editors: Please ensure these problems with the submission template are rectified. Package checks have been started regardless.

:wave:

mikemahoney218 commented 1 year ago

Sorry! The instructions at the top of the issue told me to not change anything other than the repo URL and GitHub handles -- this might be something to update in the issue template:

https://github.com/ropensci/software-review/blob/6c9722f839c810e2e063b0aca4d609b5142e85d9/.github/ISSUE_TEMPLATE/F-submit-statistical-software-for-review.md?plain=1#L8

ropensci-review-bot commented 1 year ago

Note: The following R packages were unable to be installed/upgraded on our system: [tigris, spatialsample, spdep]; some checks may be unreliable.

ropensci-review-bot commented 1 year ago

Oops, something went wrong with our automatic package checks. Our developers have been notified and package checks will appear here as soon as we've resolved the issue. Sorry for any inconvenience.

ropensci-review-bot commented 1 year ago

Checks for waywiser (v0.2.0.9000)

git hash: b8816249

:heavy_check_mark: Package is already on CRAN.
:heavy_check_mark: has a 'codemeta.json' file.
:heavy_check_mark: has a 'contributing' file.
:heavy_check_mark: uses 'roxygen2'.
:heavy_check_mark: 'DESCRIPTION' has a URL field.
:heavy_check_mark: 'DESCRIPTION' has a BugReports field.
:heavy_check_mark: Package has at least one HTML vignette
:heavy_check_mark: All functions have examples.
:heavy_check_mark: Package has continuous integration checks.
:heavy_multiplication_x: Package coverage failed
:heavy_multiplication_x: R CMD check process failed with message: 'Build process failed'.

Important: All failing checks above must be addressed prior to proceeding

Package License: MIT + file LICENSE

1. rOpenSci Statistical Standards (`srr` package)

This package is in the following category:

Spatial

:heavy_multiplication_x: Package can not be submitted because the following standards [v0.2.0] are missing from your code:

SP2.1 SP2.2 SP2.2a SP2.2b

Click to see the report of author-generated standards compliance of the package with links to associated lines of code, which can be generated locally by running the srr_report() function from within a local clone of the repository.

2. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate. |type |package | ncalls| |:----------|:-------------|------:| |internal |waywiser | 81| |internal |base | 80| |internal |utils | 33| |internal |graphics | 2| |imports |stats | 42| |imports |yardstick | 23| |imports |rlang | 11| |imports |purrr | 7| |imports |spdep | 4| |imports |hardhat | 3| |imports |sf | 3| |imports |glue | 2| |imports |rsample | 2| |imports |tidyselect | 2| |imports |dplyr | 1| |imports |fields | 1| |imports |FNN | 1| |imports |Matrix | 1| |imports |tibble | 1| |suggests |applicable | NA| |suggests |caret | NA| |suggests |CAST | NA| |suggests |covr | NA| |suggests |ggplot2 | NA| |suggests |knitr | NA| |suggests |modeldata | NA| |suggests |recipes | NA| |suggests |rmarkdown | NA| |suggests |spatialsample | NA| |suggests |spelling | NA| |suggests |testthat | NA| |suggests |tidymodels | NA| |suggests |tidyr | NA| |suggests |tigris | NA| |suggests |vip | NA| |suggests |whisker | NA| |suggests |withr | NA| |linking_to |NA | NA| Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats()', and examining the 'external_calls' table.

waywiser

calc_ssd (4), check_for_missing (3), gmfr (3), calc_spdu (2), calc_spod (2), ww_area_of_applicability (2), calc_aoa (1), calc_d_bar (1), calc_di (1), calc_spds (1), check_di_columns_numeric (1), check_di_importance (1), check_di_testing (1), create_aoa (1), expand_grid (1), is_longlat (1), predict.ww_area_of_applicability (1), print.ww_area_of_applicability (1), spatial_yardstick_df (1), spatial_yardstick_vec (1), standardize_and_weight (1), tidy_importance (1), tidy_importance.data.frame (1), tidy_importance.default (1), tidy_importance.vi (1), ww_agreement_coefficient_impl (1), ww_agreement_coefficient_vec (1), ww_agreement_coefficient.data.frame (1), ww_area_of_applicability.data.frame (1), ww_area_of_applicability.default (1), ww_area_of_applicability.formula (1), ww_area_of_applicability.rset (1), ww_build_neighbors (1), ww_build_weights (1), ww_global_geary_c_impl (1), ww_global_geary_c_vec (1), ww_global_geary_c.data.frame (1), ww_global_geary_pvalue_impl (1), ww_global_geary_pvalue_vec (1), ww_global_geary_pvalue.data.frame (1), ww_global_moran_i_impl (1), ww_global_moran_i_vec (1), ww_global_moran_i.data.frame (1), ww_global_moran_pvalue_impl (1), ww_global_moran_pvalue_vec (1), ww_global_moran_pvalue.data.frame (1), ww_local_geary_c_impl (1), ww_local_geary_c_vec (1), ww_local_geary_c.data.frame (1), ww_local_geary_pvalue_impl (1), ww_local_geary_pvalue_vec (1), ww_local_geary_pvalue.data.frame (1), ww_local_getis_ord_g_impl (1), ww_local_getis_ord_g_pvalue_vec (1), ww_local_getis_ord_g_pvalue.data.frame (1), ww_local_getis_ord_g_vec (1), ww_local_getis_ord_g.data.frame (1), ww_local_getis_ord_pvalue_impl (1), ww_local_moran_i_impl (1), ww_local_moran_i_vec (1), ww_local_moran_i.data.frame (1), ww_local_moran_pvalue_impl (1), ww_local_moran_pvalue_vec (1), ww_local_moran_pvalue.data.frame (1), ww_make_point_neighbors (1), ww_make_polygon_neighbors (1), ww_multi_scale (1), ww_systematic_agreement_coefficient_impl (1), ww_systematic_agreement_coefficient_vec (1), ww_systematic_agreement_coefficient.data.frame (1), ww_systematic_mpd.data.frame (1)

base

c (10), call (7), data.frame (7), mean (7), list (6), class (4), sum (4), if (3), nrow (3), abs (2), all (2), identical (2), inherits (2), is.na (2), names (2), unlist (2), any (1), character (1), drop (1), get (1), integer (1), length (1), missing (1), ncol (1), paste0 (1), round (1), seq_len (1), setdiff (1), sign (1), sqrt (1), unique (1)

stats

resid (20), dt (10), na.fail (6), lm (2), predict (2), complete.cases (1), cor (1)

utils

data (33)

yardstick

new_numeric_metric (23)

rlang

caller_env (6), exec (2), expr (2), list2 (1)

purrr

map (4), chuck (1), map_dbl (1), map_lgl (1)

spdep

knearneigh (1), localC_perm (1), localG_perm (1), Szero (1)

hardhat

mold (2), default_formula_blueprint (1)

sf

st_bbox (1), st_geometry_type (1), st_intersects (1)

glue

glue (2)

graphics

grid (2)

rsample

analysis (1), assessment (1)

tidyselect

eval_select (2)

dplyr

summarise (1)

fields

rdist (1)

FNN

knn.dist (1)

Matrix

mean (1)

tibble

tibble (1)

3. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has: - code in R (100% in 15 files) and - 1 authors - 3 vignettes - no internal data file - 15 imported packages - 95 exported functions (median 3 lines of code) - 173 non-exported functions in R (median 11 lines of code) --- Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used: - `loc` = "Lines of Code" - `fn` = "function" - `exp`/`not_exp` = exported / not exported All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html) The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile. |measure | value| percentile|noteworthy | |:------------------------|-----:|----------:|:----------| |files_R | 15| 73.0| | |files_vignettes | 3| 92.4| | |files_tests | 39| 98.8| | |loc_R | 1602| 79.6| | |loc_vignettes | 345| 68.1| | |loc_tests | 6814| 98.8|TRUE | |num_vignettes | 3| 94.2| | |n_fns_r | 268| 93.1| | |n_fns_r_exported | 95| 95.0|TRUE | |n_fns_r_not_exported | 173| 91.9| | |n_fns_per_file_r | 9| 84.5| | |num_params_per_fn | 4| 54.6| | |loc_per_fn_r | 9| 24.3| | |loc_per_fn_r_exp | 3| 1.5|TRUE | |loc_per_fn_r_not_exp | 11| 35.4| | |rel_whitespace_R | 18| 79.7| | |rel_whitespace_vignettes | 22| 54.6| | |rel_whitespace_tests | 19| 98.9|TRUE | |doclines_per_fn_exp | 109| 92.6| | |doclines_per_fn_not_exp | 0| 0.0|TRUE | |fn_call_network_size | 99| 79.1| | ---

3a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package

4. `goodpractice` and other checks

Details of goodpractice checks (click to open)

#### 3a. Continuous Integration Badges [![R-CMD-check.yaml](https://github.com/mikemahoney218/waywiser/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/mikemahoney218/waywiser/actions) **GitHub Workflow Results** | id|name |conclusion |sha | run_number|date | |----------:|:--------------------------|:----------|:------|----------:|:----------| | 3898338974|Lock Threads |success |b88162 | 199|2023-01-12 | | 3744601924|pages build and deployment |success |a23b40 | 50|2022-12-20 | | 3744561926|pkgdown |success |b88162 | 118|2022-12-20 | | 3744561927|R-CMD-check |success |b88162 | 116|2022-12-20 | | 3744561920|R-CMD-check-hard |success |b88162 | 112|2022-12-20 | | 3744561918|test-coverage |success |b88162 | 116|2022-12-20 | --- #### 3b. `goodpractice` results #### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/) R CMD check generated the following error: 1. Error in proc$get_built_file() : Build process failed #### Test coverage with [covr](https://covr.r-lib.org/) ERROR: Test Coverage Failed #### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp) Error : Build failed, unknown error, standard output: [33m* checking for file ‘waywiser/DESCRIPTION’ ... OK * preparing ‘waywiser’: * checking DESCRIPTION meta-information ... OK * installing the package to build vignettes * creating vignettes ... ERROR --- re-building ‘multi-scale-assessment.Rmd’ using rmarkdown Quitting from lines 52-58 (multi-scale-assessment.Rmd) Error: processing vignette 'multi-scale-assessment.Rmd' failed with diagnostics: OGRCreateCoordinateTransformation(): transformation not available --- failed re-building ‘multi-scale-assessment.Rmd’ --- re-building ‘residual-autocorrelation.Rmd’ using rmarkdown Quitting from lines 78-94 (residual-autocorrelation.Rmd) Error: processing vignette 'residual-autocorrelation.Rmd' failed with diagnostics: OGRCreateCoordinateTransformation(): transformation not available --- failed re-building ‘residual-autocorrelation.Rmd’ --- re-building ‘waywiser.Rmd’ using rmarkdown --- finished re-building ‘waywiser.Rmd’ SUMMARY: processing the following files failed: ‘multi-scale-assessment.Rmd’ ‘residual-autocorrelation.Rmd’ Error: Vignette re-building failed. Execution halted double free or corruption (out) Aborted (core dumped) [39m #### Static code analyses with [lintr](https://github.com/jimhester/lintr) [lintr](https://github.com/jimhester/lintr) found the following 601 potential issues: message | number of times --- | --- Avoid library() and require() calls in packages | 14 Lines should not be more than 80 characters. | 585 unexpected input | 2

Package Versions

|package |version | |:--------|:---------| |pkgstats |0.1.3 | |pkgcheck |0.1.0.32 | |srr |0.0.1.186 |

Editor-in-Chief Instructions:

Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.

annakrystalli commented 1 year ago

Hello @mikemahoney218 ,

The failures shown on the checks appear to be genuine and are pointing to some issues in the package. It appears the package fails rcmdcheck because of the stuff shown at the bottom of "goodpractice and other checks", with vignettes producing a core dump.

We encourage you to try and reproduce in a clean docker container by cloning the repo and running rcmdcheck::rcmdcheck() and see what you get.

With respect to messages from statistical checks by srr, we are investigating as we thing those are issues on our end.

annakrystalli commented 1 year ago

BTW @mikemahoney218, regarding testing locally in a docker container, this section of our devguide has some useful links to info (just in case).

mpadge commented 1 year ago

@mikemahoney218 @annakrystalli The srr section of the checks has now also been updated. Sorry for any inconvenience. @mikemahoney218 Please call @ropensci-review-bot check package to re-generate the checks once you've addressed both the srr and the failing rcmdcheck issues.

mikemahoney218 commented 1 year ago

Hi @annakrystalli & @mpadge -- can I ask if there's more information about your CI server available anywhere? I'm wondering what results you get from sf::sf_extSoftVersion() (and sessionInfo()), as this issue seems to be local to your CI setup.

I can't reproduce the issue on CRAN: (Currently at https://win-builder.r-project.org/CY6Z7In5rrks/, expect that link will break after 1/15 though)

On CI: https://github.com/mikemahoney218/waywiser/pull/15

Or locally on Docker:

I notice that your link suggests the R-Hub docker images; it has been my experience that R-Hub has not been able to install most spatial software for a few years now. I checked using the rocker images, via the command:

docker run --rm -ti -v "$(pwd)":/home/rstudio rocker/geospatial R

(The volume attaches my code folder as the home directory in order to check the package.)

So it seems like I'm not able to reproduce this issue across a variety of environments.

mpadge commented 1 year ago

@mikemahoney218 It's our own docker image used specifically for package checks. Current version gives this:

sf::sf_extSoftVersion()
#>           GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H 
#>       "3.8.0"          "3.0.4"        "6.3.1"         "true"         "true" 
#>           PROJ 
#>        "6.3.1"

^{Created on 2023-01-12 with reprex v2.0.2}

... but i can confirm that the issue is directly caused by sf, and not your package. Wrong linkage with compiled version of GEOS. I'll ping here once we've fixed that up and can run the check again. That might take a while, so in the meantime please ignore those fails and accept our aplogies. Thanks.

mikemahoney218 commented 1 year ago

Thanks! I think there's still likely going to be an issue from using PROJ 6 -- the vignettes assume you've got access to the PROJ CDN, which I believe was a PROJ 7/2020-release feature, so the resulting vignettes may be odd -- but it shouldn't segfault; glad to hear you've caught it.

mikemahoney218 commented 1 year ago

@ropensci-review-bot check package

ropensci-review-bot commented 1 year ago

Thanks, about to send the query.

ropensci-review-bot commented 1 year ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 1 year ago

Note: The following R packages were unable to be installed/upgraded on our system: [tigris, spatialsample, spdep]; some checks may be unreliable.

mikemahoney218 commented 1 year ago

Hi @annakrystalli ! I'm not sure how long the checks should take, but we're a bit past two hours now. I believe I've fixed the srr issue, and it sounds like fixing the CI system may take a while, but the package works on non-rOpenSci systems.

mpadge commented 1 year ago

@mikemahoney218 The comment above was intended to imply that checks for your package would not work until the problem was rectified. As said,

I'll ping here once we've fixed that up

But given that you've already called the checks, i'll just get them to dump updated versions here when they're done. Please bear with us, as this could take a few days to get around to.

mikemahoney218 commented 1 year ago

Ah sorry, I had assumed the package checks would just fail again and I'd be able to get the bot to verify I'd finished the srr. Apologies!

ropensci-review-bot commented 1 year ago

Checks for waywiser (v0.2.0.9000)

git hash: 6c57cc85

:heavy_check_mark: Package is already on CRAN.
:heavy_check_mark: has a 'codemeta.json' file.
:heavy_check_mark: has a 'contributing' file.
:heavy_check_mark: uses 'roxygen2'.
:heavy_check_mark: 'DESCRIPTION' has a URL field.
:heavy_check_mark: 'DESCRIPTION' has a BugReports field.
:heavy_check_mark: Package has at least one HTML vignette
:heavy_check_mark: All functions have examples.
:heavy_check_mark: Package has continuous integration checks.
:heavy_check_mark: Package coverage is 100%.
:heavy_check_mark: R CMD check found no errors.
:heavy_check_mark: R CMD check found no warnings.

Package License: MIT + file LICENSE

1. rOpenSci Statistical Standards (`srr` package)

This package is in the following category:

Spatial

:heavy_check_mark: All applicable standards [v0.2.0] have been documented in this package (74 complied with; 39 N/A standards)

Click to see the report of author-reported standards compliance of the package with links to associated lines of code, which can be re-generated locally by running the srr_report() function from within a local clone of the repository.

2. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate. |type |package | ncalls| |:----------|:-------------|------:| |internal |waywiser | 81| |internal |base | 80| |internal |utils | 33| |internal |graphics | 2| |imports |stats | 42| |imports |yardstick | 23| |imports |rlang | 11| |imports |purrr | 7| |imports |spdep | 4| |imports |hardhat | 3| |imports |sf | 3| |imports |glue | 2| |imports |rsample | 2| |imports |tidyselect | 2| |imports |dplyr | 1| |imports |fields | 1| |imports |FNN | 1| |imports |Matrix | 1| |imports |tibble | 1| |suggests |applicable | NA| |suggests |caret | NA| |suggests |CAST | NA| |suggests |covr | NA| |suggests |ggplot2 | NA| |suggests |knitr | NA| |suggests |modeldata | NA| |suggests |recipes | NA| |suggests |rmarkdown | NA| |suggests |spatialsample | NA| |suggests |spelling | NA| |suggests |testthat | NA| |suggests |tidymodels | NA| |suggests |tidyr | NA| |suggests |tigris | NA| |suggests |vip | NA| |suggests |whisker | NA| |suggests |withr | NA| |linking_to |NA | NA| Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats()', and examining the 'external_calls' table.

waywiser

calc_ssd (4), check_for_missing (3), gmfr (3), calc_spdu (2), calc_spod (2), ww_area_of_applicability (2), calc_aoa (1), calc_d_bar (1), calc_di (1), calc_spds (1), check_di_columns_numeric (1), check_di_importance (1), check_di_testing (1), create_aoa (1), expand_grid (1), is_longlat (1), predict.ww_area_of_applicability (1), print.ww_area_of_applicability (1), spatial_yardstick_df (1), spatial_yardstick_vec (1), standardize_and_weight (1), tidy_importance (1), tidy_importance.data.frame (1), tidy_importance.default (1), tidy_importance.vi (1), ww_agreement_coefficient_impl (1), ww_agreement_coefficient_vec (1), ww_agreement_coefficient.data.frame (1), ww_area_of_applicability.data.frame (1), ww_area_of_applicability.default (1), ww_area_of_applicability.formula (1), ww_area_of_applicability.rset (1), ww_build_neighbors (1), ww_build_weights (1), ww_global_geary_c_impl (1), ww_global_geary_c_vec (1), ww_global_geary_c.data.frame (1), ww_global_geary_pvalue_impl (1), ww_global_geary_pvalue_vec (1), ww_global_geary_pvalue.data.frame (1), ww_global_moran_i_impl (1), ww_global_moran_i_vec (1), ww_global_moran_i.data.frame (1), ww_global_moran_pvalue_impl (1), ww_global_moran_pvalue_vec (1), ww_global_moran_pvalue.data.frame (1), ww_local_geary_c_impl (1), ww_local_geary_c_vec (1), ww_local_geary_c.data.frame (1), ww_local_geary_pvalue_impl (1), ww_local_geary_pvalue_vec (1), ww_local_geary_pvalue.data.frame (1), ww_local_getis_ord_g_impl (1), ww_local_getis_ord_g_pvalue_vec (1), ww_local_getis_ord_g_pvalue.data.frame (1), ww_local_getis_ord_g_vec (1), ww_local_getis_ord_g.data.frame (1), ww_local_getis_ord_pvalue_impl (1), ww_local_moran_i_impl (1), ww_local_moran_i_vec (1), ww_local_moran_i.data.frame (1), ww_local_moran_pvalue_impl (1), ww_local_moran_pvalue_vec (1), ww_local_moran_pvalue.data.frame (1), ww_make_point_neighbors (1), ww_make_polygon_neighbors (1), ww_multi_scale (1), ww_systematic_agreement_coefficient_impl (1), ww_systematic_agreement_coefficient_vec (1), ww_systematic_agreement_coefficient.data.frame (1), ww_systematic_mpd.data.frame (1)

base

c (10), call (7), data.frame (7), mean (7), list (6), class (4), sum (4), if (3), nrow (3), abs (2), all (2), identical (2), inherits (2), is.na (2), names (2), unlist (2), any (1), character (1), drop (1), get (1), integer (1), length (1), missing (1), ncol (1), paste0 (1), round (1), seq_len (1), setdiff (1), sign (1), sqrt (1), unique (1)

stats

resid (20), dt (10), na.fail (6), lm (2), predict (2), complete.cases (1), cor (1)

utils

data (33)

yardstick

new_numeric_metric (23)

rlang

caller_env (6), exec (2), expr (2), list2 (1)

purrr

map (4), chuck (1), map_dbl (1), map_lgl (1)

spdep

knearneigh (1), localC_perm (1), localG_perm (1), Szero (1)

hardhat

mold (2), default_formula_blueprint (1)

sf

st_bbox (1), st_geometry_type (1), st_intersects (1)

glue

glue (2)

graphics

grid (2)

rsample

analysis (1), assessment (1)

tidyselect

eval_select (2)

dplyr

summarise (1)

fields

rdist (1)

FNN

knn.dist (1)

Matrix

mean (1)

tibble

tibble (1)

3. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has: - code in R (100% in 15 files) and - 1 authors - 3 vignettes - no internal data file - 15 imported packages - 95 exported functions (median 3 lines of code) - 173 non-exported functions in R (median 11 lines of code) --- Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used: - `loc` = "Lines of Code" - `fn` = "function" - `exp`/`not_exp` = exported / not exported All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html) The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile. |measure | value| percentile|noteworthy | |:------------------------|-----:|----------:|:----------| |files_R | 15| 73.0| | |files_vignettes | 3| 92.4| | |files_tests | 39| 98.8| | |loc_R | 1602| 79.6| | |loc_vignettes | 345| 68.1| | |loc_tests | 6814| 98.8|TRUE | |num_vignettes | 3| 94.2| | |n_fns_r | 268| 93.1| | |n_fns_r_exported | 95| 95.0|TRUE | |n_fns_r_not_exported | 173| 91.9| | |n_fns_per_file_r | 9| 84.5| | |num_params_per_fn | 4| 54.6| | |loc_per_fn_r | 9| 24.3| | |loc_per_fn_r_exp | 3| 1.5|TRUE | |loc_per_fn_r_not_exp | 11| 35.4| | |rel_whitespace_R | 18| 79.7| | |rel_whitespace_vignettes | 22| 54.6| | |rel_whitespace_tests | 19| 98.9|TRUE | |doclines_per_fn_exp | 109| 92.6| | |doclines_per_fn_not_exp | 0| 0.0|TRUE | |fn_call_network_size | 99| 79.1| | ---

3a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package

4. `goodpractice` and other checks

Details of goodpractice checks (click to open)

#### 3a. Continuous Integration Badges [![R-CMD-check.yaml](https://github.com/mikemahoney218/waywiser/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/mikemahoney218/waywiser/actions) **GitHub Workflow Results** | id|name |conclusion |sha | run_number|date | |----------:|:--------------------------|:----------|:------|----------:|:----------| | 3898338974|Lock Threads |success |b88162 | 199|2023-01-12 | | 3903744451|pages build and deployment |success |d4d305 | 51|2023-01-12 | | 3903687426|pkgdown |success |6c57cc | 121|2023-01-12 | | 3903687434|R-CMD-check |success |6c57cc | 119|2023-01-12 | | 3903687436|R-CMD-check-hard |success |6c57cc | 115|2023-01-12 | | 3903687429|test-coverage |success |6c57cc | 119|2023-01-12 | --- #### 3b. `goodpractice` results #### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/) R CMD check generated the following notes: 1. checking Rd cross-references ... NOTE Packages unavailable to check Rd xrefs: ‘raster’, ‘terra’ 2. checking data for non-ASCII characters ... NOTE Note: found 1 marked UTF-8 string R CMD check generated the following check_fail: 1. rcmdcheck_non_ascii_characters_in_data #### Test coverage with [covr](https://covr.r-lib.org/) Package coverage: 100 #### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp) Error : [4m[33m Build failed, standard output: [39m[24m[33m* checking for file ‘waywiser/DESCRIPTION’ ... OK * preparing ‘waywiser’: * checking DESCRIPTION meta-information ... OK * installing the package to build vignettes * creating vignettes ... OK * checking for LF line-endings in source and make files and shell scripts * checking for empty or unneeded directories * re-saving .R files as .rda [39m [4m[31mStandard error: [39m[24m[31mError in loadNamespace(x) : there is no package called ‘waywiser’ Execution halted [39m #### Static code analyses with [lintr](https://github.com/jimhester/lintr) [lintr](https://github.com/jimhester/lintr) found the following 603 potential issues: message | number of times --- | --- Avoid library() and require() calls in packages | 14 Lines should not be more than 80 characters. | 587 unexpected input | 2

Package Versions

|package |version | |:--------|:---------| |pkgstats |0.1.3 | |pkgcheck |0.1.1.3 | |srr |0.0.1.188 |

Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

annakrystalli commented 1 year ago

@ropensci-review-bot assign @Paula-Moraga as editor

ropensci-review-bot commented 1 year ago

Assigned! @Paula-Moraga is now the editor

Paula-Moraga commented 1 year ago

Hi @mikemahoney218, I am pleased to be the editor of this package. I will start looking for reviewers.

Paula-Moraga commented 1 year ago

@ropensci-review-bot seeking reviewers

ropensci-review-bot commented 1 year ago

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/571_status.svg)](https://github.com/ropensci/software-review/issues/571)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

Paula-Moraga commented 1 year ago

@ropensci-review-bot assign @becarioprecario as reviewer

ropensci-review-bot commented 1 year ago

@becarioprecario added to the reviewers list. Review due date is 2023-02-04. Thanks @becarioprecario for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

Paula-Moraga commented 1 year ago

@ropensci-review-bot assign @jakub_nowosad as reviewer

ropensci-review-bot commented 1 year ago

@jakub_nowosad added to the reviewers list. Review due date is 2023-02-06. Thanks @jakub_nowosad for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

ropensci-review-bot commented 1 year ago

I could not find user @jakub_nowosad

Paula-Moraga commented 1 year ago

@ropensci-review-bot assign @nowosad as reviewer

ropensci-review-bot commented 1 year ago

@nowosad added to the reviewers list. Review due date is 2023-02-06. Thanks @nowosad for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

becarioprecario commented 1 year ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
[X] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[X] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
[X] Installation instructions: for the development version of package and any non-standard dependencies in README
[ ] Vignette(s): demonstrating major functionality that runs successfully locally
[X] Function Documentation: for all exported functions
[ ] Examples: (that run successfully locally) for all exported functions. NOTE FROM REVIEWER: Some of the exported functions seem to be internal functions (see my comment below).
[X] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

[X] Installation: Installation succeeds as documented.
[X] Functionality: Any functional claims of the software been confirmed.
[X] Performance: Any performance claims of the software been confirmed.
[X] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[ ] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 3

[X] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Thanks for contributing the package to the community! I hope that the following comments can help you "fine tune" some parts of the package:

Vignettes: I have not been able to find any vignette in the package. There is an example in the README.md file that could be moved to a vignette.
Manual pages: I think that the examples included in the package include cases studies for all main functions in the package. For example, file global_geary_c.Rd includes an example in which function ww_global_geary() is exemplified. All the other functions described in the manual page are internal functions called from ww_global_geary(). At least, this is my feeling after reading the Rd files.
Related to my previous comment, it may be good to keep some functions as internal and not exported in the NAMESPACE file. For example, functions to compute p-values and possibly others.
In one of the examples I get some code shown that, I guess, should not be shown:

ww_gl_> ## Don't show:
ww_gl_> if (rlang::is_installed("sfdep")) (if (getRversion() >= "3.4") withAutoprint else force)({ # examplesIf
ww_gl_+ ## End(Don't show)

Also, when running example(ww_global_geary, run.dontrun=TRUE) the example does not seem to run at all. Please, could the author check this and, possibly, all the other examples to make sure that \dontshow and \dontrun marks are properly used.
I do not think that it is a good practice to write examples conditional on some package being installed. In particular, the sfdep package. PLease, simply make it a dependence to make sure that the package is installed before your package and that examples can be run. Otherwise the examples cannot be run... this is why I was having the behaviour in my previous comment (which I left because it was something that I was not expecting at all).
I do not think that the package fullfils the Packaging guidelines as there are no vignettes in the package.
When I run, for example, ww_global_geary(guerry_modeled, crime_pers, predictions) it is not clear to me what to do with the output. It provides a tibble with the estimate and a p-value, as well as a geometry in both lines. What is the geometry for?
Also, is there a way of getting a summary of the test? I mean, in a similar way as standard functions in the spdep package for Geary's c test, etc.

mikemahoney218 commented 1 year ago

@becarioprecario , thank you very much for your thoughtful review!

If I may, can I ask which version of the package you were reviewing? My intention was to submit the current development version of the package, which has several changes from the version currently on CRAN which fix elements of your review; notably, the ww_global_geary() function you mentioned no longer exists, sfdep is no longer in any package files at all, and there are three vignettes.

I apologize for the confusion here; I know it's not best practice for a submission package to already be on CRAN, but this package has expanded significantly in scope from the first CRAN release to the point that I thought it was worth an rOpenSci submission.

cc: @Paula-Moraga @Nowosad -- apologies if I wasn't clear enough which version was submitted for review; it's meant to be the GitHub version.

becarioprecario commented 1 year ago

Yes, I did review the package on CRAN. I will go through the GitHub version later and see if some of my comments still apply.

El 25 ene 2023, a las 23:45, Michael Mahoney @.***> escribió:

@becarioprecariohttps://github.com/becarioprecario , thank you very much for your thoughtful review!

If I may, can I ask which version of the package you were reviewing? My intention was to submit the current development version of the packagehttps://github.com/mikemahoney218/waywiser, which has several changes from the version currently on CRAN which fix elements of your review; notably, the ww_global_geary() function you mentioned no longer exists, sfdep is no longer in any package files at all, and there are three vignetteshttps://mikemahoney218.github.io/waywiser/articles/.

I apologize for the confusion here; I know it's not best practice for a submission package to already be on CRAN, but this package has expanded significantly in scope from the first CRAN release to the point that I thought it was worth an rOpenSci submission.

cc: @annakrystallihttps://github.com/annakrystalli @Nowosadhttps://github.com/Nowosad -- apologies if I wasn't clear enough which version was submitted for review; it's meant to be the GitHub versionhttps://github.com/mikemahoney218/waywiser.

— Reply to this email directly, view it on GitHubhttps://github.com/ropensci/software-review/issues/571#issuecomment-1404323351, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABYD6WUBPY7DMIIO222A5L3WUGUH3ANCNFSM6AAAAAATYIYV7Q. You are receiving this because you were mentioned.Message ID: @.***>

Paula-Moraga commented 1 year ago

Many thanks @becarioprecario for your quick review. We really appreciate your time and effort. We apologize for the confusion on the version to be reviewed. We are working to better clarify the guidelines to make it clear that the review should be about the GitHub version and not the CRAN version of the package. Also apologies to @mikemahoney218 for the confusion and many thanks for your clarification about your submission.

becarioprecario commented 1 year ago

I have now reviewed package version 0.2.0.9000 from GitHub.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
[X] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[X] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
[X] Installation instructions: for the development version of package and any non-standard dependencies in README
[X] Vignette(s): demonstrating major functionality that runs successfully locally
[X] Function Documentation: for all exported functions
[X] Examples: (that run successfully locally) for all exported functions. NOTE FROM REVIEWER: Some of the exported functions seem to be internal functions (see my comment below).
[X] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

[X] Installation: Installation succeeds as documented.
[X] Functionality: Any functional claims of the software been confirmed.
[X] Performance: Any performance claims of the software been confirmed.
[X] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[X] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 3

[X] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Thanks for contributing the package to the community! I hope that the following comments can help you "fine tune" some parts of the package:

Vignettes: The vignettes seem to describe major functionalities in the package. I wonder whether in the main vignette waywiser the author could include a summary of all the capabilities of the R package (e.g., a table with a list of the available statistics implemented: name, descrption and, maybe, reference).
Vignette residual-autocorrelation I would use title Local Moran rather than Morans I in the label of the legend as the local Moran is what appears in the plot.
Manual pages: I think that the examples included in the package include cases studies for all main functions in the package. For example, file global_geary_c.Rd includes an example in which function ww_global_geary() is exemplified. All the other functions described in the manual page are internal functions called from ww_global_geary(). At least, this is my feeling after reading the Rd files.
Related to my previous comment, it may be good to keep some functions as internal and not exported in the NAMESPACE file. For example, functions to compute p-values and possibly others.
When I run, for example, ww_global_geary(guerry_modeled, crime_pers, predictions) it is not clear to me what to do with the output. It provides a tibble with the estimate and a p-value, as well as a geometry in both lines. What is the geometry for?
Also, is there a way of getting a summary of the test? I mean, in a similar way as standard functions in the spdep package for Geary's c test, etc.

mikemahoney218 commented 1 year ago

Hi all!

I've started responding to review comments in a new branch. I'm intending to wait to merge this until all reviews are in, to make sure I'm not asking Jakub to review a moving target :smile:

That said, I wanted to highlight that a lot of the tests (235 in total) were broken by yesterday's dplyr 1.1.0 release; these are fixed on the reviewer_comments branch, but may be a bit distracting if you review the main branch with the newest dplyr version installed.

In my view, the only test fix that's worth noting is that the ww_local_* functions rely on using dplyr::summarise() to return multiple rows, which returns a warning in dplyr 1.1.0. I've opened an issue in yardstick to see if this can be fixed upstream or if I need to work around the issue in waywiser itself, and for the time being simply ignored the warning in tests so that CI is a bit more useful, but wanted to clarify that I'm not planning on ignoring the warning long-term.

Nowosad commented 1 year ago

@mikemahoney218 feel free to merge the changes -- I plan to start reviewing the package later this week.

mikemahoney218 commented 1 year ago

Sounds good! I've merged the reviewer_comments branch, so tests now are updated to (and expect) dplyr 1.1.0.

mikemahoney218 commented 1 year ago

@becarioprecario , thank you for a thoughtful review! I've addressed most of your comments in the GitHub version of the package. I'm not sure about the best way to address the set of comments talking about the autocorrelation metrics; more detail on that below.

The following comments have been addressed in the current GitHub version:

Vignettes: The vignettes seem to describe major functionalities in the package. I wonder whether in the main vignette waywiser the author could include a summary of all the capabilities of the R package (e.g., a table with a list of the available statistics implemented: name, descrption and, maybe, reference).

Added! Commit here, rendered version here. I focused on the statistical functions, and didn't include helper functions or data, as I think those are more usefully described on the package reference page.

Vignette residual-autocorrelation I would use title Local Moran rather than Morans I in the label of the legend as the local Moran is what appears in the plot

Fixed, thank you! Commit here, rendered version here.

Manual pages: I think that the examples included in the package include cases studies for all main functions in the package. For example, file global_geary_c.Rd includes an example in which function ww_global_geary() is exemplified.

I've added examples for all functions to their respective Rd pages. I might eventually need to wrap some of these in if (interactive()) or similar, as past versions of the package have had issues with CRAN due to examples taking too long, but I agree it makes sense for all the functions to be demonstrated.

When I run, for example, ww_global_geary(guerry_modeled, crime_pers, predictions) it is not clear to me what to do with the output. It provides a tibble with the estimate and a p-value, as well as a geometry in both lines. What is the geometry for?

The geometry is no longer returned; this was a mistake in earlier versions of the package.

Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Thank you! I'll add you as a reviewer in the DESCRIPTION once the package is accepted :smile:

I'm not sure of the best way to address this set of comments, and would appreciate further input from folks:

NOTE FROM REVIEWER: Some of the exported functions seem to be internal functions (see my comment below)

All the other functions described in the manual page are internal functions called from ww_global_geary(). At least, this is my feeling after reading the Rd files.

Related to my previous comment, it may be good to keep some functions as internal and not exported in the NAMESPACE file. For example, functions to compute p-values and possibly others.

Also, is there a way of getting a summary of the test? I mean, in a similar way as standard functions in the spdep package for Geary's c test, etc.

I think these comments reflect that a few functions (namely, the p-value functions) are maybe a bit out of scope for the package, but I'm interested in what you (and others) think.

For context, waywiser is meant to be a straightforward extension of yardstick for spatial data. The yardstick README says "yardstick is a package to estimate how well models are working using tidy data principles"; waywiser is also supposed to be a package for estimating model performance.

The p-value functions are included as "model assessment" tools because I've seen modeling projects use p-values to ID areas of concern, with regards to autocorrelation: locations with more extreme p-values for local autocorrelation metrics were selected for further investigation, to see if model specifications could be improved. In that sense, p-values are included as an assessment metric for predictive modeling, and not so much for statistical testing purposes. As such, waywiser lets you return p-values without also returning test statistics themselves, as this approach doesn't really require looking at the underlying test statistic values; extreme p-values are areas of interest, no matter what their actual statistic is.

It would make sense, if you're using these statistics for inference, that you'd want more information than waywiser currently provides. But I think providing this is a little challenging given how yardstick views its metrics. In the yardstick approach -- which waywiser matches as closely as possible -- each function is meant to be atomic, returning a single metric:

library(yardstick)

rmse(Orange, age, circumference)
#> # A tibble: 1 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 rmse    standard        915.
mae(Orange, age, circumference)
#> # A tibble: 1 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 mae     standard        806.
rmse_vec(Orange$age, Orange$circumference)
#> [1] 915.4996

^{Created on 2023-01-31 with reprex v2.0.2}

This is why ww_global_geary() and similar functions were removed after the last CRAN release -- these "combination functions" which returned two types of metrics (the test statistic and p-value) weren't really natural fits for how yardstick works, and caused a lot of problems.

If users want to calculate multiple metrics in one call, they can use yardstick::metric_set():

library(yardstick)
#> For binary classification, the first factor level is assumed to be the event.
#> Use the argument `event_level = "second"` to alter this as needed.

mets <- metric_set(
  rmse,
  mae
)

mets(Orange, age, circumference)
#> # A tibble: 2 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 rmse    standard        915.
#> 2 mae     standard        806.

^{Created on 2023-01-31 with reprex v2.0.2}

But an issue here is that the functions provided to metric_set() can't be aware of other functions in use; they need to calculate their values independently. That means that calculating additional information would require additional simulation runs, which winds up taking a lot of time:

library(yardstick)
#> For binary classification, the first factor level is assumed to be the event.
#> Use the argument `event_level = "second"` to alter this as needed.
library(waywiser)
guerry_model <- guerry
guerry_lm <- lm(Crm_prs ~ Litercy, guerry_model)
guerry_model$predictions <- predict(guerry_lm, guerry_model)

mets <- metric_set(
  ww_local_geary_c,
  ww_local_geary_pvalue
)

# Triggering the new warning in dplyr 1.1.0 before running timing code, 
# as the first time the warning fires adds quite a bit of time to execution
invisible(
  ww_local_geary_c(guerry_model, Crm_prs, predictions)
)

# Uses localC in order to save a bit of time
system.time(
  ww_local_geary_c(guerry_model, Crm_prs, predictions)
)
#>    user  system elapsed 
#>   0.171   0.004   0.176

# Uses localC_perm, which returns local C values as well as p-values
system.time(
  ww_local_geary_pvalue(guerry_model, Crm_prs, predictions)
)
#>    user  system elapsed 
#>   0.198   0.004   0.203

# The metric_set() function isn't aware that both metrics could 
# be calculated in one call to localC_perm, and as such this takes
# a roughly additive amount of time (as each function is called separately)
system.time(
  mets(guerry_model, Crm_prs, predictions)
)
#>    user  system elapsed 
#>   0.374   0.012   0.387

^{Created on 2023-01-31 with reprex v2.0.2}

This means that calculating more information via waywiser (and the atomic yardstick approach) would be much less efficient than just using spdep directly. Given that the focus of the package is assessing prediction accuracy and agreement, I don't know how many people would use functions to help with inference either.

Given all this, I see two desirable ways to address this set of comments:

Remove p-value functions from the package. This removes a use-case for the package (looking for extreme p-values to identify areas which might help improve model specifications), but also makes it more clear that the package is designed for assessing predictive accuracy, and is not oriented towards inference.
Retain p-value functions, but add documentation clarifying that for inference users are recommended to use spdep equivalents directly.

A third method would be to add additional functions for calculating additional information useful for inference. I dislike this option; in these situations, users should use spdep directly.

I'd highly appreciate hearing what others think is the best approach here. Left to my own devices, I'd most likely keep the p-value functions with additional documentation, but am open to other directions.

Nowosad commented 1 year ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
☒ As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

☒ A statement of need: clearly stating problems the software is designed to solve and its target audience in README
☒ Installation instructions: for the development version of package and any non-standard dependencies in README
☒ Vignette(s): demonstrating major functionality that runs successfully locally
☒ Function Documentation: for all exported functions
☒ Examples: (that run successfully locally) for all exported functions
☒ Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

☒ Installation: Installation succeeds as documented.
☒ Functionality: Any functional claims of the software been confirmed.
☒ Performance: Any performance claims of the software been confirmed.
☒ Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
☒ Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing:

3

☒ Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer (“rev” role) in the package DESCRIPTION file.

Review Comments

In general, I find the package very well written and documented. I only have a few comments/suggestions:

waywiser.Rmd: Have you considered explaining the example data first in this vignette?
waywiser.Rmd: Could you add a sentence or two better explaining where n came from in the Multi-scale model assessment section?; why is this a list (and also why cellsize is not a list in multi-scale-assessment.Rmd)?
waywiser.Rmd: Area of Applicability: could you expand the description of the importance argument in this vignette?
waywiser.Rmd: Area of Applicability: I would suggest also showing the result here (it is hard to think about an area of applicability without seeing it first)
waywiser.Rmd: I like the Feature Matrix table, however, I am not sure if it should be at the end of this vignette. Have you considered moving it to a standalone vignette (for better visibility)? Also, could you replace DOI codes with DOI urls?
residual-autocorrelation.Rmd and multi-scale-assessment.Rmd: What is the reason for using the %>% pipe here? Why not use the native pipe (|>)?
residual-autocorrelation.Rmd: “This makes it easy to see what areas are poorly represented by our model” – could you elaborate on this sentence and explain which areas you are talking about?
GitHub Actions are mostly broken at the moment (I assume it is due to the recent dplyr changes)
Regarding your “p-value” question: I think option 2 is fine.
Question: How do you see the future of this package? Do you plan to add any new features (e.g., spatial explainers)?

ropensci-review-bot commented 1 year ago

:calendar: @becarioprecario you have 2 days left before the due date for your review (2023-02-04).

Paula-Moraga commented 1 year ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/571#issuecomment-1406660259 time 3

ropensci-review-bot commented 1 year ago

Couldn't find entry for becarioprecario in the reviews log

Paula-Moraga commented 1 year ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/571#issuecomment-1413602448 time 3

ropensci-review-bot commented 1 year ago

Couldn't find entry for Nowosad in the reviews log

mikemahoney218 commented 1 year ago

Thank you for your comments, @Nowosad ! I believe I've addressed all your comments in the current development version of the package. I've responded to your points with a bit more detail below.

waywiser.Rmd: Have you considered explaining the example data first in this vignette?

Added! (Commit, rendered)

waywiser.Rmd: Could you add a sentence or two better explaining where n came from in the Multi-scale model assessment section?; why is this a list (and also why cellsize is not a list in multi-scale-assessment.Rmd)?

Added! (Commit, rendered)

waywiser.Rmd: Area of Applicability: could you expand the description of the importance argument in this vignette?

Added! (Commit, rendered)

waywiser.Rmd: Area of Applicability: I would suggest also showing the result here (it is hard to think about an area of applicability without seeing it first)

Added! (Commit, rendered)

waywiser.Rmd: I like the Feature Matrix table, however, I am not sure if it should be at the end of this vignette. Have you considered moving it to a standalone vignette (for better visibility)? Also, could you replace DOI codes with DOI urls?

I moved it to be a standalone article (so on the pkgdown site, but not built on CRAN), which also let me move kableExtra out of Suggests. (Commit, rendered)

residual-autocorrelation.Rmd and multi-scale-assessment.Rmd: What is the reason for using the %>% pipe here? Why not use the native pipe (|>)?

The main reason is because this package supports R >= 4, while the pipe was added in 4.1. Using the native pipe would make CI runs for R 4.0 fail, or only be run conditionally; in order for the vignettes to build on every supported version of R, I've kept the %>% pipe for now.

residual-autocorrelation.Rmd: “This makes it easy to see what areas are poorly represented by our model” – could you elaborate on this sentence and explain which areas you are talking about?

I added a long discussion about what this means at the top of the vignette, and also elaborated a tiny bit at the end. (Commit, rendered)

GitHub Actions are mostly broken at the moment (I assume it is due to the recent dplyr changes)

These should be "fixed" for now, by tripping the new lifecycle warning in dplyr::summarise() at the top of each test file. The long-term fix will depend on if yardstick changes to use reframe().

Regarding your “p-value” question: I think option 2 is fine.

Thank you! I've gone ahead and added this documentation throughout. (Example of commit, example of rendered)

Question: How do you see the future of this package? Do you plan to add any new features (e.g., spatial explainers)?

I'm not sure I know what spatial explainers are! That said, this is my broad vision for the package:

In the near term, I think I might extend ww_multi_scale() to accept raster inputs, for situations where you've predicted a large raster that won't fit entirely in memory as points. This is pretty much the only feature I have planned.
I'm open to adding yardstick metrics (such as ww_agreement_coefficient, ww_local_geary_c) if any are requested, or if I run into any in the literature. That said, I don't have any plans here, and don't know of any that would be useful to add. If the metric isn't coming from the spatial literature, it should probably live in yardstick instead.
My general goal is for waywiser to be a useful toolbox for assessing spatial models, and I view anything that falls under that headline as being "in scope". If an assessment method is coming from the spatial modeling world, then it's a good candidate for waywiser, even if it's not inherently spatial (so AOA, agreement coefficient, Willmott's D etc all fall under this). If it's not coming from the spatial modeling world, then I'd probably rather contribute techniques to vip, DALEX, applicable, or yardstick.

With all that said, I'm not actively looking for things to add -- I'm currently only adding things that are useful for my own work. But if there are requests or PRs for other features, that's my basic outline for whether something belongs in waywiser or not.

Hope that answers the question!

☒ Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer (“rev” role) in the package DESCRIPTION file.

Thanks! I'll add you as a reviewer in the DESCRIPTION once the package is accepted :smile:

maelle commented 1 year ago

@Paula-Moraga I recorded the reviews information, sorry about the glitch.

Nowosad commented 1 year ago

@mikemahoney218 thanks for all of the improvements made.

I'm not sure I know what spatial explainers are!

I've been thinking of something like https://geods.netlify.app/post/spatial-ml-model-diagnostics/.

ropensci / software-review