Closed: @mikemahoney218 closed this 1 year ago
@Nowosad I think that would be in scope for waywiser. I'm not planning on implementing it in the near future (I need to focus on my dissertation :sweat_smile:, so all the techniques I'm actively adding are things I'm going to use myself), but I could definitely see the package growing in that direction over time.
:calendar: @jakub_nowosad you have 2 days left before the due date for your review (2023-02-06).
:calendar: @nowosad you have 2 days left before the due date for your review (2023-02-06).
@mikemahoney218 thanks for taking into account my comments. I hope that they have been useful.
I think these comments reflect that a few functions (namely, the p-value functions) are maybe a bit out of scope for the package, but I'm interested in what you (and others) think.
My opinion is that if a function is not going to be used by the user, it is best to keep it hidden. You can always use `:::` if needed (for some reason…). So, the p-value functions are called from other functions in the package, but I do not think that they are meant to be used directly by the user.
The p-value functions are included as "model assessment" tools because I've seen modeling projects use p-values to identify areas of concern with regard to autocorrelation: locations with more extreme p-values for local autocorrelation metrics were selected for further investigation, to see if model specifications could be improved. In that sense, p-values are included as an assessment metric for predictive modeling, and not so much for statistical testing purposes. As such, waywiser lets you return p-values without also returning the test statistics themselves, as this approach doesn't really require looking at the underlying test statistic values; extreme p-values are areas of interest, no matter what their actual statistic is.
I see. But p-values are tied to the test statistics, so I am not sure it is a good idea to only report p-values.
Given all this, I see two desirable ways to address this set of comments:
I would go for (2) and compute statistics and associated p-values from a main function. Then the user can decide on what to plot in a map.
@becarioprecario Thank you -- your comments have been extremely useful :smile:
My thinking (and experience) is that the p-value functions are called directly by users as a model diagnostic tool during the iteration process -- these p-values aren't being reported in a publication, but rather used to guide model development by highlighting hot-spots for model residuals (and hopefully helping to make a model misspecification clear, so it can be fixed before any publication).
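To make that diagnostic workflow concrete, here is a minimal sketch. It assumes an `sf` data frame `model_data` with `observed` and `predicted` columns, and the local p-value metric name shown is illustrative (check the package reference for the exact function names):

```r
library(waywiser)

# Local autocorrelation p-values for the model residuals, one value per
# location (the metric name here is illustrative)
local_p <- ww_local_moran_pvalue(
  model_data,
  truth = observed,
  estimate = predicted
)

# Locations with extreme p-values are hot-spots worth investigating
# before settling on a model specification -- map them, inspect them,
# and iterate on the model
```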
There's not really a great way for functions using the yardstick infrastructure to return two different statistics (here, the test statistic and the p-value). The idiomatic way to do so is to use `yardstick::metric_set()` to combine two functions (here, the test statistic and p-value functions), but that's something that's best left to the user, as metric sets can't be "expanded" to include additional metrics. For example, if you want to calculate global Moran's I with a p-value, plus an agreement coefficient, you can run `metrics <- yardstick::metric_set(ww_global_moran_i, ww_global_moran_p_value, ww_agreement_coefficient)` and then use the `metrics()` function with your data to get all three outputs. If waywiser provided a metric set (which is what functions like `ww_global_moran()` did, but note those functions were never in the submitted version of the package), then you couldn't call `yardstick::metric_set(ww_global_moran, ww_agreement_coefficient)`; you'd get an error.
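The workflow described above can be sketched as follows. This is a hedged example: `test_data` and its `observed`/`predicted` columns are illustrative stand-ins for the spatial data frame used in the package's vignettes:

```r
library(yardstick)
library(waywiser)

# Combine the test statistic, its p-value, and an unrelated agreement
# metric into a single yardstick metric set
metrics <- yardstick::metric_set(
  ww_global_moran_i,
  ww_global_moran_p_value,
  ww_agreement_coefficient
)

# `test_data` is an illustrative sf data frame; metric sets are called
# like any other yardstick metric, returning one row per metric
metrics(test_data, truth = observed, estimate = predicted)
```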
That's why the "combined" functions were removed before this package was submitted; they don't work in a lot of places that users would expect them to be useful, and explaining the reason they work in a very different way than the rest of the package is pretty hard to communicate. Instead, all of the metrics provided by this package are pure yardstick metrics, without any of the weird edge cases. That means they're restricted to each returning a single type of statistic.
I've added documentation as described in (2) to these functions (see for instance https://github.com/mikemahoney218/waywiser/commit/333cf42cec3f6bddcd7f8b54150bc1f5dd8e365f#diff-45d2e91a37be2289564b4e1c987cbc8ac817ee874cc0ddcf19cfcdd8088c01feR6-R9). I could also add documentation about using `yardstick::metric_set()` to calculate both the test statistic and p-value at once, though I personally think it'd be better not to mention that; instead, the current documentation encourages people to use the spdep functions directly if they're looking to use p-values for purposes other than what I've described. This documentation (on using `metric_set()`) would probably look like the section on `metric_set()` that's in the Getting Started vignette.
Alternatively, I could remove the p-value functions, if you think that having them at all without a combination function is harmful. But I don't think it makes sense to add combination functions; they introduce too many weird edge cases and don't idiomatically fit into yardstick.
Many thanks @becarioprecario and @Nowosad for your useful reviews, and @mikemahoney218 for taking into account all the comments and suggestions to improve the package.
I would like to ask @becarioprecario and @Nowosad whether you are happy with the new version of the package, so that it can be approved, or whether you have additional comments.
@Paula-Moraga I am happy with the current version of the package.
Hi @becarioprecario, @Paula-Moraga, I just wanted to bump this thread: are there additional comments still to be resolved? Thank you!
@mikemahoney218 sorry for missing this. I am still concerned about not reporting the test statistic and the p-value together. Really, from a statistical point of view there is no point in reporting only one of them when making inference. You can have tiny p-values with really meaningless effects. I agree that not shooting yourself in the foot is most likely on the users' side, but still…
In any case, these two values can be extracted with the functions in the package, which is fine, I think. If you consider that this discussion is helpful you can include it somewhere in the documentation and/or vignettes.
@becarioprecario Not a problem -- thank you for taking the time to review the package! It's highly appreciated. I'll add a new section to the vignettes tomorrow and follow up here with the commit.
@becarioprecario I added a summary of this discussion to the "residual autocorrelation" vignette: https://github.com/mikemahoney218/waywiser/commit/c0860a1b974b9b0e2b5d55b990657d8a7706fb28
Thank you again for your time reviewing this package, it's been a real help.
@mikemahoney218 many thanks for adding that note. I think I have no more comments about the package. Thanks for contributing your package to the community!!
Many thanks @mikemahoney218 @becarioprecario @Nowosad for your time and work to improve the package. I am very pleased to approve it!
@ropensci-review-bot approve waywiser
Approved! Thanks @mikemahoney218 for submitting and @becarioprecario, @jakub_nowosad, @nowosad for your reviews! :grin:
To-dos:
- If you do not receive an invitation to the new repository, write a comment `@ropensci-review-bot invite me to ropensci/<package-name>`, which will re-send an invitation.
- After the transfer, write a comment `@ropensci-review-bot finalize transfer of <package-name>`, where `<package-name>` is the repo/package name. This will give you admin access back.
- If you already had a `pkgdown` website and are OK relying only on rOpenSci central docs building and branding, replace the current `pkgdown` website with a page redirecting to https://docs.ropensci.org/package_name.
- In your DESCRIPTION file, include the docs link in the `URL` field alongside the link to the GitHub repository, e.g.: `URL: https://docs.ropensci.org/foobar, https://github.com/ropensci/foobar`.
- Update the package's metadata by running `codemetar::write_codemeta()` in the root of your package.
- Your package will be installable with `install.packages("<package-name>", repos = "https://ropensci.r-universe.dev")` thanks to R-universe.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them `"rev"`-type contributors in the `Authors@R` field (with their consent).
Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @ropensci/blog-editors in your reply. They will get in touch about timing and can answer any questions.
We maintain an online book with our best practices and tips; this chapter starts the third section, which offers guidance for after onboarding (with advice on releases, package marketing, GitHub grooming). The guide also features CRAN gotchas. Please tell us what could be improved.
Last but not least, you can volunteer as a reviewer by filling out a short form.
@ropensci-review-bot invite me to ropensci/waywiser
Invitation sent!
@ropensci-review-bot finalize transfer of waywiser
Transfer completed.
The `waywiser` team is now owner of the repository, and the author has been invited to the team.
Date accepted: 2023-02-27
Submitting Author Name: Mike Mahoney
Submitting Author Github Handle: @mikemahoney218
Repository: https://github.com/mikemahoney218/waywiser
Version submitted:
Submission type: Stats
Badge grade: silver
Editor: @Paula-Moraga
Reviewers: @becarioprecario, @jakub_nowosad, @nowosad
Due date for @becarioprecario: 2023-02-04
Due date for @jakub_nowosad: 2023-02-06
Due date for @nowosad: 2023-02-06
Archive: TBD
Version accepted: TBD
Language: en
Scope
Please indicate which of our statistical package categories this package falls under. (Please check one appropriate box below):
Statistical Packages
Pre-submission Inquiry
General Information
Anyone fitting models to spatial data, particularly (but not exclusively) people working within the tidymodels ecosystem. This includes a number of domains, and we've already been using it in our modeling practice.
Paste your responses to our General Standard G1.1 here, describing whether your software is:
Please include hyperlinked references to all other relevant software.
The waywiser R package makes it easier to measure the performance of models fit to 2D spatial data by implementing a number of well-established assessment methods in a consistent, ergonomic toolbox; features include new yardstick metrics for measuring agreement and spatial autocorrelation, functions to assess model predictions across multiple scales, and methods to calculate the area of applicability of a model.
Relevant software implementing similar algorithms includes CAST for `ww_area_of_applicability()`. Several of the yardstick metrics implemented directly wrap spdep in a more consistent interface. Willmott's D is also implemented in hydroGOF. Other functions have (as far as I am aware) not been implemented elsewhere, such as `ww_multi_scale()`, which implements the procedure from Riemann et al. 2010, or `ww_agreement_coefficient()`, which implements metrics from Ji and Gallo 2006.

N/A
Badging
Silver
This is the primary aspect which I believe merits the silver status. The waywiser package implements routines which are useful for a wide variety of spatial models and integrates well with the tidymodels ecosystem, making it (hopefully!) of interdisciplinary interest.
Depending on what the editors think, I'd also potentially submit this for gold, based upon the following two aspects:
But I'm not familiar enough with the system to know if waywiser is likely to be in compliance with these two aspects, and am comfortable submitting for "silver" status if waywiser does not obviously meet both.
Technical checks
Confirm each of the following by checking the box.
- I have run `autotest` checks on the package, and ensured no tests fail. (Sorry, both the release and CRAN versions of autotest fail immediately on my machine with internal errors -- that is, from autotest itself and not from my package -- and therefore I have not been able to use it.)
- The `srr_stats_pre_submit()` function confirms this package may be submitted.
- The `pkgcheck()` function confirms this package may be submitted - alternatively, please explain reasons for any checks which your package is unable to pass.

This package:
Publication options
Code of conduct