Closed: alexpghayes closed this issue 6 years ago.
It might also be nice to recommend that standard errors are reported as .pred_std_error so that all prediction columns consistently start with .pred.
We should also provide recommendations to make sure that {outcome_name} in .pred_{outcome_name} and {class_level} in .pred_{class_level} are valid names in R.
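For example (just an illustration of the problem; make.names() is one way to coerce levels to syntactic names, not necessarily the recommendation):

# hypothetical class levels that are not syntactically valid R names
class_levels <- c("setosa", "not setosa", "virginica+versicolor")
make.names(paste0(".pred_", class_levels))
# gives ".pred_setosa", ".pred_not.setosa", ".pred_virginica.versicolor"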
Also need an argument that specifies when std_error / a measure of uncertainty is returned.
We should also provide a recommendation about whether uncertainty should be reported by default.
Sorry for the piecemeal nature of these comments.
Many older predict() methods return residual scales from prediction functions. This information should just be contained in the model itself and used implicitly if necessary, but never explicitly accessed through predict.
Also need an argument that specifies when std_error / a measure of uncertainty is returned.
Hmm. If we give it its own "type" argument then someone would need to call predict multiple times if they want this and the intervals. Maybe include it when someone asks for intervals?
We should also provide a recommendation about whether uncertainty should be reported by default.
I don't think that it should since it would imply to people that models have those measures (which they mostly do not).
Many older predict() methods return residual scales from prediction functions. This information should just be contained in the model itself and used implicitly if necessary, but never explicitly accessed through predict.
Example?
Hmm. If we give it its own "type" argument then someone would need to call predict multiple times if they want this and the intervals. Maybe include it when someone asks for intervals?
I was thinking a separate argument entirely. Playing around with an initial implementation that looks like this at the minute:
safe_predict.lm <- function(
  object,
  newdata,
  type = c(
    "response",
    "conf_int",
    "pred_int"
  ),
  se_fit = TRUE,
  level = 0.95,
  ...) {
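Just to make that concrete, here's a rough sketch of how the body might go (my own sketch, not the actual implementation; the interval column names and the mapping onto stats::predict.lm() are assumptions):

safe_predict.lm <- function(object,
                            newdata,
                            type = c("response", "conf_int", "pred_int"),
                            se_fit = TRUE,
                            level = 0.95,
                            ...) {
  type <- match.arg(type)

  # map the proposed types onto what stats::predict.lm() understands
  interval <- switch(
    type,
    response = "none",
    conf_int = "confidence",
    pred_int = "prediction"
  )

  raw <- predict(
    object,
    newdata = newdata,
    interval = interval,
    level = level,
    se.fit = se_fit
  )

  # with se.fit = TRUE, predict.lm() returns a list; otherwise a vector/matrix
  fit <- if (se_fit) raw$fit else raw

  if (type == "response") {
    out <- tibble::tibble(.pred = as.vector(fit))
  } else {
    out <- tibble::tibble(
      .pred       = fit[, "fit"],
      .pred_lower = fit[, "lwr"],
      .pred_upper = fit[, "upr"]
    )
  }

  if (se_fit) {
    out$.pred_std_error <- raw$se.fit
  }

  out
}

Called as, say, safe_predict.lm(fit, mtcars, type = "pred_int"), this would return a tibble with .pred, .pred_lower, .pred_upper, and .pred_std_error columns.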
Many older predict() methods return residual scales from prediction functions. This information should just be contained in the model itself and used implicitly if necessary, but never explicitly accessed through predict.
Suppose you wanted to manually construct prediction intervals for lm:
fit <- lm(hp ~ ., mtcars)
pred_list <- predict(fit, se.fit = TRUE)
str(pred_list)
#> List of 4
#> $ fit : Named num [1:32] 149 141 80 126 183 ...
#> ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
#> $ se.fit : num [1:32] 12 12.5 14.4 11.9 11.7 ...
#> $ df : int 21
#> $ residual.scale: num 26
# standard errors to use for confidence intervals
pred_list$se.fit
#> [1] 12.02061 12.51955 14.40414 11.91279 11.66151 14.57049 14.35787
#> [8] 15.11333 22.11393 16.29543 15.24373 13.78279 11.05496 11.65334
#> [15] 14.77545 14.27848 16.41118 14.25712 17.29307 15.32382 16.30941
#> [22] 11.62309 11.64295 15.38930 12.57646 10.01764 20.45875 18.03826
#> [29] 22.03205 15.97691 19.52634 14.98992
# standard errors to use in prediction intervals
with(pred_list, sqrt(se.fit^2 + residual.scale^2))
#> [1] 28.61830 28.83143 29.69834 28.57319 28.46934 29.77938 29.67593
#> [8] 30.04871 34.11068 30.66029 30.11451 29.40200 28.22631 28.46599
#> [15] 29.88020 29.63760 30.72197 29.62732 31.20196 30.15513 30.66772
#> [22] 28.45362 28.46174 30.18846 28.85619 27.83641 33.06165 31.62106
#> [29] 34.05765 30.49220 32.49293 29.98683
Created on 2018-09-05 by the reprex package (v0.2.0).
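Pushing that example one step further (my own continuation), those standard errors combine with the residual df already in the list to give the usual t-based intervals:

# continuing the example: assemble 95% prediction intervals by hand
pred_se <- with(pred_list, sqrt(se.fit^2 + residual.scale^2))
t_crit <- qt(0.975, df = pred_list$df)

lower <- pred_list$fit - t_crit * pred_se
upper <- pred_list$fit + t_crit * pred_se

# these should line up with what predict() itself produces
# (it warns that predictions on current data refer to future responses)
head(predict(fit, interval = "prediction"))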
One case where you see people using residual.scale is simulating from the posterior of a GAM (typically fit with mgcv). The mgcv predict method doesn't return residual.scale in a prediction list when se.fit = TRUE, but lots of the examples in MASS that BDR recommended as canonical references do.
With multiple types, does it need to call some sub-prediction function multiple times?
I see what you mean by residual scale. We'd basically have a(nother) function that incorporates that object (like you did).
(oops. didn't mean to close)
Ah. Defining the full list of acceptable types as a character vector makes it easy to use match.arg for input validation. For example:
f <- function(x = c("a", "b", "c"))
  match.arg(x)  # here "a" is the default value

f(x = "d")  # input validation
#> Error in match.arg(x): 'arg' should be one of "a", "b", "c"
f(x = c("b", "c"))
#> Error in match.arg(x): 'arg' must be of length 1
Created on 2018-09-05 by the reprex package (v0.2.0).
So getting different types of predictions still requires multiple calls to predict() (as I think it should).
For GLMs we can get confidence intervals at the linear predictor scale (stats::predict.glm() gives standard errors at this scale, for example), but we can also use the inverse link function to generate a confidence interval on the response scale. For example, as in https://stackoverflow.com/questions/14423325/confidence-intervals-for-predictions-from-logistic-regression.
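Roughly the pattern from that SO answer, sketched for a logistic regression (the model formula here is just an arbitrary example): build the interval on the link scale, then map it back through the inverse link.

# confidence interval for predicted probabilities from a logistic regression,
# constructed on the link scale and back-transformed
glm_fit <- glm(am ~ hp + wt, data = mtcars, family = binomial)

pred <- predict(glm_fit, type = "link", se.fit = TRUE)

crit <- qnorm(0.975)
link_lower <- pred$fit - crit * pred$se.fit
link_upper <- pred$fit + crit * pred$se.fit

# the inverse link (plogis for the logit link) gives the response-scale interval
resp_lower <- glm_fit$family$linkinv(link_lower)
resp_upper <- glm_fit$family$linkinv(link_upper)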
It may make sense to have a separate interval argument in addition to type. Pretty sure this comes up a bunch in GAM land, where you can only get uncertainty estimates on certain scales and then you spend a bunch of time transforming it to the scale you want.
It may make sense to have a separate interval argument in addition to type. Pretty sure this comes up a bunch in GAM land, where you can only get uncertainty estimates on certain scales and then you spend a bunch of time transforming it to the scale you want.
I think that 90% of the time, people want it on the response scale. Let's think about this for a while. I hate to add options that are only relevant to a handful of models. If a default can capture a large number of the use cases, then people could use type = "raw" (to get direct access to the underlying predict function) for the others. This came up in topepo/parsnip#50 with ranger's type = "oob".
I think that 90% of the time, people want it on the response scale.
True. That seems like a reasonable convention, and then let people doing fancy things dig around in the internals on their own.
Not quite following how type = "raw" is supposed to work. What does direct access mean?
For type = "raw", we would pass the data and any options and return the underlying model object's predictions without any post-processing.
Here's an example:
> library(parsnip)
> library(rsample)
> data("attrition")
>
>
> obj <-
+ rand_forest(mode = "classification", others = list(probability = FALSE)) %>%
+ fit(Attrition ~ ., data = attrition, engine = "ranger")
>
> predict(obj, newdata = attrition[1:3,-2])
# A tibble: 3 x 1
.pred_class
<fct>
1 Yes
2 No
3 Yes
>
> raw_res <- predict(
+ obj,
+ newdata = attrition[1:3,-2],
+ type = "raw",
+ opts = list(predict.all = TRUE)
+ )
>
> # Get the raw object back
> raw_res
Ranger prediction
Type: Classification
Sample size: 3
Number of independent variables: 30
> raw_res$prediction[, 1:4]
[,1] [,2] [,3] [,4]
[1,] 1 1 2 2
[2,] 1 1 1 1
[3,] 1 2 2 1
I'm following the notes from the prediction section fairly closely as I write tests for modeltests. Recommendations I think we should add:

- The newdata argument should default to NULL.
- predict should do something graceful when categorical predictors in newdata have novel levels not present in the training data. This should be carefully documented. Returning NA in these cases is one option (although imputation, etc., are also options); again I think we just want to remind people to document. (A minimal illustration of this failure mode follows the list.)
- type should default to a character vector of allowed types, so that the function signature makes the valid types explicit (as in the safe_predict.lm() sketch above).
- Predictions for a given row shouldn't depend on which other rows happen to be in newdata. Mostly this is a stats::poly() and splines::ns() trap that people can get caught in (reprex created on 2018-09-04 by the reprex package, v0.2.0). There's something similar for stats::poly(), although the details escape me at the moment.
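For the novel-levels point above, here's a minimal illustration of what happens by default with lm() (toy data, just to show the failure mode that "graceful" behavior needs to address):

train <- data.frame(y = c(1, 2, 3, 4), x = factor(c("a", "a", "b", "b")))
test  <- data.frame(x = factor("c"))

fit <- lm(y ~ x, data = train)
predict(fit, newdata = test)
# errors with: factor x has new levels c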
Questions:
- This SO response seems to suggest that GPs for classification do this? I imagine this sort of reporting might be useful for something like a neural net with softmax probabilities, where the softmax probabilities aren't really meaningful, so you might want to look at bootstrapped class probabilities to sanity check.
- What should the levels of the .pred_class column be? The same factor levels as the factor in the input dataset specified as the outcome? Should factor levels that never appear in the training data get dropped? Etc, etc.
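To make that last question concrete (a toy example of my own): if the outcome comes in with a level that never actually occurs, should .pred_class carry all the declared levels or only the observed ones?

# outcome supplied with a level ("maybe") that never actually occurs
outcome <- factor(c("yes", "no", "yes", "no"), levels = c("yes", "no", "maybe"))

levels(outcome)
# "yes" "no" "maybe"

levels(droplevels(outcome))
# "yes" "no"  (should .pred_class keep or drop the unused level?)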