rsangole closed this issue 2 years ago.
Thanks for the report! The problem here is that `po("fixfactors")` removes factor levels that were not seen during training. Apparently, in this particular CV split, the training set does not contain any missing values in the `embarked` column. While `po("imputeoor")` does impute the missing values during prediction by introducing the `.MISSING` level, `po("fixfactors")` removes that level again because it was not present during training.
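To make the mechanism concrete, here is a base-R sketch that only mimics the behaviour described above; it does not call mlr3pipelines. It assumes the training fold's `embarked` column contains only the levels `C`, `Q` and `S`, and that `po("fixfactors")` acts like re-encoding the column with those training levels.

```r
# Levels observed in the training fold's embarked column (assumed: no NAs there)
train_levels <- c("C", "Q", "S")

# At predict time, po("imputeoor") has already replaced the NA with ".MISSING"
embarked_pred <- factor(c("S", ".MISSING", "C"))

# Mimic po("fixfactors"): re-encode with the training levels only;
# the unseen ".MISSING" value becomes NA again
fixed <- factor(as.character(embarked_pred), levels = train_levels)

print(fixed)
anyNA(fixed)  # TRUE
```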
Those leftover missing values are exactly what `po("imputesample")` should handle, but in your code, `po("imputesample")` comes before `po("fixfactors")`. Instead, it should come afterwards:
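```r
# `poind` and `learner` are defined in the original reproducible example
gunion(list(poind, po("imputehist"))) %>>%
  po("featureunion") %>>%
  po("imputeoor") %>>%
  po("fixfactors") %>>%    # moved up: drop factor levels not seen in training first
  po("imputesample") %>>%  # then fill any NAs that fixfactors introduced
  po(learner) |>
  as_learner() -> graph_learner
```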
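Why the order matters can be sketched in base R with two hypothetical helpers, `fix_factors()` and `impute_sample()`. They are simplified stand-ins for `po("fixfactors")` and `po("imputesample")`, not their actual implementations, and the prediction data is assumed to have already passed through `po("imputeoor")`.

```r
set.seed(1)

# Training fold: embarked has no missing values, so ".MISSING" is never seen
train <- factor(c("S", "C", "S", "Q"))
train_levels <- levels(droplevels(train))

# Hypothetical stand-in for po("fixfactors"): keep only levels seen in training
fix_factors <- function(x, lvls) {
  factor(as.character(x), levels = lvls)
}

# Hypothetical stand-in for po("imputesample"): fill NAs by sampling training values
impute_sample <- function(x, pool) {
  miss <- is.na(x)
  x[miss] <- sample(as.character(pool), sum(miss), replace = TRUE)
  x
}

# Prediction data after po("imputeoor") has turned the NA into ".MISSING"
pred <- factor(c("S", ".MISSING", "C"))

# Original order: imputesample, then fixfactors (an NA survives)
wrong <- fix_factors(impute_sample(pred, train), train_levels)

# Corrected order: fixfactors, then imputesample (no NA left)
right <- impute_sample(fix_factors(pred, train_levels), train)

print(wrong)
print(right)
```

In the original order the sampler finds nothing to impute, because every NA has already been replaced by `.MISSING`; the NA only appears afterwards, when the unseen level is dropped. Fixing the levels first leaves the sampler an NA to fill.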
This makes this particular example run for me. Does it solve the issue for you?
Yep, that absolutely fixed it! 'Twas an oversight on my end; thanks for the correction and the help!
Description
Hello,
I'm trying to replicate the Titanic example from this post. However, I'm getting an error saying that one of the columns, Embarked, has missing values, despite building the po as posted. Could use some guidance: am I going wrong somewhere? I've put down a reproducible example below.
Cheers!
Reproducible example
Output
Session Info