Open EmilHvitfeldt opened 3 months ago
Originally posted in https://stackoverflow.com/questions/78169514/r-tidymodels-step-novel-does-not-work-when-combined-in-a-workflow-with-resampl.
Basically, `lm()` only keeps track of the factor levels that actually appear in the training data set, regardless of which levels the factor declares.
So when `step_novel()` adds the level `new` to the factor, `lm()` ignores it completely because no training row uses it, and prediction then fails when new data contain a novel level.
As far as I can tell, this is hardcoded behavior of `lm()`: https://github.com/SurajGupta/r-source/blob/a28e609e72ed7c47f6ddfbb86c85279a0750f0b7/src/library/stats/R/lm.R#L32
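To show that this comes from `lm()` itself rather than from parsnip or recipes, here is a minimal base-R sketch. The data frame and the level name `new` are made up for illustration; the point is only that `lm()` builds its model frame with `drop.unused.levels = TRUE`, so a declared but unobserved level never reaches `xlevels`.

```r
# Toy data: the factor declares three levels, but only "a" and "b" are observed.
train <- data.frame(
  y = c(1, 2, 3, 4),
  x = factor(c("a", "b", "a", "b"), levels = c("a", "b", "new"))
)

fit <- lm(y ~ x, data = train)

levels(train$x)   # all three declared levels, including "new"
fit$xlevels$x     # only the levels that appeared in the training rows

# Predicting on a row that uses the unobserved level fails inside model.frame().
new_data <- data.frame(x = factor("new", levels = c("a", "b", "new")))
res <- tryCatch(
  predict(fit, new_data),
  error = function(e) conditionMessage(e)
)
res
```

Here `res` holds the error message instead of a prediction, which matches what the reprex shows through parsnip.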
I don't know how much we can do about this, but we might be able to catch this earlier and throw a better error.
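One possible shape for an earlier check is sketched below. `check_novel_levels()` is a hypothetical helper, not an existing parsnip or recipes function; it assumes it receives the underlying `lm` object (for a parsnip fit, `lm_fit$fit`) and compares the new data against `xlevels` before `predict()` is called.

```r
# Hypothetical helper (not part of any package): fail early with a clearer
# message when new data contain factor levels that lm() did not record.
check_novel_levels <- function(fit, new_data) {
  for (var in names(fit$xlevels)) {
    # xlevels can be keyed by terms such as "factor(x)"; skip names that are
    # not plain columns of new_data.
    if (!var %in% names(new_data)) next

    seen <- fit$xlevels[[var]]
    observed <- unique(as.character(new_data[[var]]))
    unseen <- setdiff(observed[!is.na(observed)], seen)

    if (length(unseen) > 0) {
      stop(
        "Column `", var, "` has levels not seen when fitting: ",
        paste0("'", unseen, "'", collapse = ", "),
        ". `lm()` drops factor levels that are absent from the training data.",
        call. = FALSE
      )
    }
  }
  invisible(new_data)
}

# Toy example: "new" is declared as a level but never observed in training.
train <- data.frame(
  y = c(1, 2, 3, 4),
  x = factor(c("a", "b", "a", "b"), levels = c("a", "b", "new"))
)
fit <- lm(y ~ x, data = train)
new_data <- data.frame(x = factor("new", levels = c("a", "b", "new")))

msg <- tryCatch(
  check_novel_levels(fit, new_data),
  error = function(e) conditionMessage(e)
)
msg
```

This only changes the error message; the model still cannot produce a prediction for a level it has no coefficient for.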
Smaller reprex below:
    library(parsnip)
    library(dplyr)

    data("ames", package = "modeldata")

    ames_mini <- ames |>
      select(Sale_Price, Lot_Shape)

    ames_train <- ames_mini |>
      filter(Lot_Shape != "Regular")

    levels(ames_train$Lot_Shape)
    #> [1] "Regular"              "Slightly_Irregular"   "Moderately_Irregular"
    #> [4] "Irregular"

    ames_test <- ames_mini |>
      filter(Lot_Shape == "Regular")

    lm_spec <- linear_reg()

    lm_fit <- fit(lm_spec, Sale_Price ~ ., ames_train)

    lm_fit$fit$xlevels
    #> $Lot_Shape
    #> [1] "Slightly_Irregular"   "Moderately_Irregular" "Irregular"

    lm_fit |>
      predict(ames_test)
    #> Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels): factor Lot_Shape has new level Regular