Closed: alexpghayes closed this issue 6 years ago.
It might also be nice to recommend that standard errors are reported as .pred_std_error so that all prediction columns consistently start with .pred.
We should also provide recommendations to make sure that {outcome_name} in .pred_{outcome_name} and {class_level} in .pred_{class_level} are valid names in R.
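For example (just an illustration of the problem; make.names() is one way to coerce levels to syntactic names, not necessarily the recommendation):

# hypothetical class levels that are not syntactically valid R names
class_levels <- c("setosa", "not setosa", "virginica+versicolor")
make.names(paste0(".pred_", class_levels))
# gives ".pred_setosa", ".pred_not.setosa", ".pred_virginica.versicolor"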
Also need an argument that specifies when std_error / a measure of uncertainty is returned.
We should also provide a recommendation about whether uncertainty should be reported by default.
Sorry for the piecemeal nature of these comments.
Many older predict() methods return residual scales from prediction functions. This information should just be contained in the model itself and used implicitly if necessary, but never explicitly accessed through predict.
Also need an argument that specifies when std_error / a measure of uncertainty is returned.
Hmm. If we give it its own "type" argument then someone would need to call predict multiple times if they want this and the intervals. Maybe include it when someone asks for intervals?
We should also provide a recommendation about whether uncertainty should be reported by default.
I don't think that it should since it would imply to people that models have those measures (which they mostly do not).
Many older predict() methods return residual scales from prediction functions. This information should just be contained in the model itself and used implicitly if necessary, but never explicitly accessed through predict.
Example?
Hmm. If we give it its own "type" argument then someone would need to call predict multiple times if they want this and the intervals. Maybe include it when someone asks for intervals?
I was thinking a separate argument entirely. Playing around with an initial implementation that looks like this at the minute:
safe_predict.lm <- function(
  object,
  newdata,
  type = c(
    "response",
    "conf_int",
    "pred_int"
  ),
  se_fit = TRUE,
  level = 0.95,
  ...) {
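Just to make that concrete, here's a rough sketch of how the body might go (my own sketch, not the actual implementation; the interval column names and the mapping onto stats::predict.lm() are assumptions):

safe_predict.lm <- function(object,
                            newdata,
                            type = c("response", "conf_int", "pred_int"),
                            se_fit = TRUE,
                            level = 0.95,
                            ...) {
  type <- match.arg(type)

  # map the proposed types onto what stats::predict.lm() understands
  interval <- switch(
    type,
    response = "none",
    conf_int = "confidence",
    pred_int = "prediction"
  )

  raw <- predict(
    object,
    newdata = newdata,
    interval = interval,
    level = level,
    se.fit = se_fit
  )

  # with se.fit = TRUE, predict.lm() returns a list; otherwise a vector/matrix
  fit <- if (se_fit) raw$fit else raw

  if (type == "response") {
    out <- tibble::tibble(.pred = as.vector(fit))
  } else {
    out <- tibble::tibble(
      .pred       = fit[, "fit"],
      .pred_lower = fit[, "lwr"],
      .pred_upper = fit[, "upr"]
    )
  }

  if (se_fit) {
    out$.pred_std_error <- raw$se.fit
  }

  out
}

Called as, say, safe_predict.lm(fit, mtcars, type = "pred_int"), this would return a tibble with .pred, .pred_lower, .pred_upper, and .pred_std_error columns.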
Many older predict() methods return residual scales from prediction functions. This information should just be contained in the model itself and used implicitly if necessary, but never explicitly accessed through predict.
Suppose you wanted to manually construct prediction intervals for lm:
fit <- lm(hp ~ ., mtcars)
pred_list <- predict(fit, se.fit = TRUE)
str(pred_list)
#> List of 4
#> $ fit : Named num [1:32] 149 141 80 126 183 ...
#> ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
#> $ se.fit : num [1:32] 12 12.5 14.4 11.9 11.7 ...
#> $ df : int 21
#> $ residual.scale: num 26
# standard errors to use for confidence intervals
pred_list$se.fit
#> [1] 12.02061 12.51955 14.40414 11.91279 11.66151 14.57049 14.35787
#> [8] 15.11333 22.11393 16.29543 15.24373 13.78279 11.05496 11.65334
#> [15] 14.77545 14.27848 16.41118 14.25712 17.29307 15.32382 16.30941
#> [22] 11.62309 11.64295 15.38930 12.57646 10.01764 20.45875 18.03826
#> [29] 22.03205 15.97691 19.52634 14.98992
# standard errors to use in prediction intervals
with(pred_list, sqrt(se.fit^2 + residual.scale^2))
#> [1] 28.61830 28.83143 29.69834 28.57319 28.46934 29.77938 29.67593
#> [8] 30.04871 34.11068 30.66029 30.11451 29.40200 28.22631 28.46599
#> [15] 29.88020 29.63760 30.72197 29.62732 31.20196 30.15513 30.66772
#> [22] 28.45362 28.46174 30.18846 28.85619 27.83641 33.06165 31.62106
#> [29] 34.05765 30.49220 32.49293 29.98683
Created on 2018-09-05 by the reprex package (v0.2.0).
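Pushing that example one step further (my own continuation), those standard errors combine with the residual df already in the list to give the usual t-based intervals:

# continuing the example: assemble 95% prediction intervals by hand
pred_se <- with(pred_list, sqrt(se.fit^2 + residual.scale^2))
t_crit <- qt(0.975, df = pred_list$df)

lower <- pred_list$fit - t_crit * pred_se
upper <- pred_list$fit + t_crit * pred_se

# these should line up with what predict() itself produces
# (it warns that predictions on current data refer to future responses)
head(predict(fit, interval = "prediction"))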
One case where you see people using residual.scale is simulating from the posterior of a GAM (typically fit with mgcv). The mgcv predict method doesn't return residual.scale in a prediction list when se.fit = TRUE, but lots of the examples in MASS that BDR recommended as canonical references do.
With multiple types, does it need to call some sub-prediction function multiple times?
I see what you mean by residual scale. We'd basically have a(nother) function that incorporates that object (like you did).
(oops. didn't mean to close)
Ah. Defining the full list of acceptable types as a character vector makes it easy to use match.arg for input validation. For example:
f <- function(x = c("a", "b", "c"))
  match.arg(x)  # here "a" is the default value

f(x = "d")  # input validation
#> Error in match.arg(x): 'arg' should be one of "a", "b", "c"
f(x = c("b", "c"))
#> Error in match.arg(x): 'arg' must be of length 1
Created on 2018-09-05 by the reprex package (v0.2.0).
So getting different types of predictions still requires multiple calls to predict() (as I think it should).
For GLMs we can get confidence intervals at the linear predictor scale (stats::predict.glm() gives standard errors at this scale, for example), but we can also use the inverse link function to generate a confidence interval on the response scale. For example, as in https://stackoverflow.com/questions/14423325/confidence-intervals-for-predictions-from-logistic-regression.
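Roughly the pattern from that SO answer, sketched for a logistic regression (the model formula here is just an arbitrary example): build the interval on the link scale, then map it back through the inverse link.

# confidence interval for predicted probabilities from a logistic regression,
# constructed on the link scale and back-transformed
glm_fit <- glm(am ~ hp + wt, data = mtcars, family = binomial)

pred <- predict(glm_fit, type = "link", se.fit = TRUE)

crit <- qnorm(0.975)
link_lower <- pred$fit - crit * pred$se.fit
link_upper <- pred$fit + crit * pred$se.fit

# the inverse link (plogis for the logit link) gives the response-scale interval
resp_lower <- glm_fit$family$linkinv(link_lower)
resp_upper <- glm_fit$family$linkinv(link_upper)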
It may make sense to have a separate interval argument in addition to type. Pretty sure this comes up a bunch in GAM land, where you can only get uncertainty estimates on certain scales and then you spend a bunch of time transforming it to the scale you want.
It may make sense to have a separate interval argument in addition to type. Pretty sure this comes up a bunch in GAM land, where you can only get uncertainty estimates on certain scales and then you spend a bunch of time transforming it to the scale you want.
I think that 90% of the time, people want it on the response scale. Let's think about this for a while. I hate to add options that are only relevant to a handful of models. If a default can capture a large number of the use cases, then people could use type = "raw" (to get direct access to the underlying predict function) for the others. This came up in topepo/parsnip#50 with ranger's type = "oob".
I think that 90% of the time, people want it on the response scale.
True. That seems like a reasonable convention, and then let people doing fancy things dig around in the internals on their own.
Not quite following how type = "raw" is supposed to work. What does direct access mean?
For type = "raw", we would pass the data and any options and return the underlying model object's predictions without any post-processing.
Here's an example:
> library(parsnip)
> library(rsample)
> data("attrition")
>
>
> obj <-
+ rand_forest(mode = "classification", others = list(probability = FALSE)) %>%
+ fit(Attrition ~ ., data = attrition, engine = "ranger")
>
> predict(obj, newdata = attrition[1:3,-2])
# A tibble: 3 x 1
.pred_class
<fct>
1 Yes
2 No
3 Yes
>
> raw_res <- predict(
+ obj,
+ newdata = attrition[1:3,-2],
+ type = "raw",
+ opts = list(predict.all = TRUE)
+ )
>
> # Get the raw object back
> raw_res
Ranger prediction
Type: Classification
Sample size: 3
Number of independent variables: 30
> raw_res$prediction[, 1:4]
[,1] [,2] [,3] [,4]
[1,] 1 1 2 2
[2,] 1 1 1 1
[3,] 1 2 2 1
I'm following the notes from the prediction section fairly closely as I write tests for modeltests. Recommendations I think we should add:

- The newdata argument should default to NULL.
- predict should do something graceful when categorical predictors in newdata have novel levels not present in the training data. This should be carefully documented. Returning NA in these cases is one option (although imputation, etc., are also options); again I think we just want to remind people to document. (A minimal illustration of this failure mode follows the list.)
- type should default to a character vector of allowed types, so that the function signature makes the valid types explicit (as in the safe_predict.lm() sketch above).
- Predictions for a given row shouldn't depend on which other rows happen to be in newdata. Mostly this is a stats::poly() and splines::ns() trap that people can get caught in (reprex created on 2018-09-04 by the reprex package, v0.2.0). There's something similar for stats::poly(), although the details escape me at the moment.
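For the novel-levels point above, here's a minimal illustration of what happens by default with lm() (toy data, just to show the failure mode that "graceful" behavior needs to address):

train <- data.frame(y = c(1, 2, 3, 4), x = factor(c("a", "a", "b", "b")))
test  <- data.frame(x = factor("c"))

fit <- lm(y ~ x, data = train)
predict(fit, newdata = test)
# errors with: factor x has new levels c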
Questions:
- This SO response seems to suggest that GPs for classification do this? I imagine this sort of reporting might be useful for something like a neural net with softmax probabilities, where the softmax probabilities aren't really meaningful, so you might want to look at bootstrapped class probabilities to sanity check.
- What should the levels of the .pred_class column be? The same factor levels as the factor in the input dataset specified as the outcome? Should factor levels that never appear in the training data get dropped? Etc, etc.
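To make that last question concrete (a toy example of my own): if the outcome comes in with a level that never actually occurs, should .pred_class carry all the declared levels or only the observed ones?

# outcome supplied with a level ("maybe") that never actually occurs
outcome <- factor(c("yes", "no", "yes", "no"), levels = c("yes", "no", "maybe"))

levels(outcome)
# "yes" "no" "maybe"

levels(droplevels(outcome))
# "yes" "no"  (should .pred_class keep or drop the unused level?)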