yPennylane opened this issue 3 years ago
caret doesn't do anything to the factor levels (I went back and verified), so there's not much that I can do but track down the change:

The issue is within partykit. cforest() parses the data by passing it to ctree(), which in turn passes it to extree_data(). That function contains a line that redefines the factor levels to include only the levels observed in the training data. When new data are given to predict(), it errors out in that code block because of the difference in levels.
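For context, here is a small self-contained illustration of that mechanism in base R (lm() stands in for cforest(); this is not partykit's actual code): once a factor's levels are restricted to the values observed at training time, any other value becomes a "new level" when model.frame() is applied to new data.

```r
# Illustration of the mechanism only; not partykit's code.
train <- data.frame(y = c(1, 2, 3),
                    x = factor(c("a", "b", "b")))   # levels restricted to observed values: a, b
fit <- lm(y ~ x, data = train)

holdout <- data.frame(x = factor("c", levels = c("a", "b", "c")))
predict(fit, newdata = holdout)
#> Error in model.frame.default(...) : factor x has new levels c
```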
The authors of that package are very sharp and probably had a good reason for this. I would suggest contacting them to ask for advice. The maintainer has been very diligent in the past about issues (much better than I am).
Consider the following adapted cforest model (from the package partykit):
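Below is a minimal sketch of what such a custom model list might look like, following caret's documented interface for user-defined models; the object name cforest_partykit, the single mtry tuning parameter, and all internals are assumptions rather than the exact code used in this issue.

```r
library(caret)
library(partykit)

# Hypothetical custom caret model wrapping partykit::cforest(),
# written against caret's documented custom-model (list) interface.
cforest_partykit <- list(
  label   = "Conditional Inference Random Forest (partykit)",
  library = "partykit",
  type    = c("Classification", "Regression"),
  parameters = data.frame(parameter = "mtry",
                          class     = "numeric",
                          label     = "#Randomly Selected Predictors"),
  grid = function(x, y, len = NULL, search = "grid")
    data.frame(mtry = caret::var_seq(p = ncol(x), len = len)),
  fit = function(x, y, wts, param, lev, last, weights, classProbs, ...) {
    dat <- as.data.frame(x)
    dat$.outcome <- y
    partykit::cforest(.outcome ~ ., data = dat, mtry = param$mtry, ...)
  },
  predict = function(modelFit, newdata, preProc = NULL, submodels = NULL) {
    predict(modelFit, newdata = as.data.frame(newdata), type = "response")
  },
  prob = function(modelFit, newdata, preProc = NULL, submodels = NULL) {
    predict(modelFit, newdata = as.data.frame(newdata), type = "prob")
  },
  sort   = function(x) x[order(x$mtry), ],
  levels = function(x) levels(x$fitted[["(response)"]])  # assumes cforest() stores the response in $fitted
)
```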
This function is then used with caret's train() function. Without factor variables, or without cross-validation, it works fine. The problems appear when using factors as predictors together with repeatedcv, because not all factor levels are present in each fold, yet they still appear in the levels attribute.
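The call has roughly the following shape; the data below are synthetic placeholders with a many-level factor predictor, and whether a given resample actually misses a level depends on the random split.

```r
library(caret)

# Synthetic placeholder data (not the original data set): x1 has many levels,
# so individual training folds may not observe all of them.
set.seed(1)
dat <- data.frame(
  class = factor(sample(c("yes", "no"), 200, replace = TRUE)),
  x1    = factor(sample(letters, 200, replace = TRUE)),
  x2    = rnorm(200)
)

ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

# Non-formula interface, so the factor predictor reaches cforest() as a factor
fit <- train(x = dat[, c("x1", "x2")], y = dat$class,
             method    = cforest_partykit,   # custom model list sketched above
             tuneGrid  = data.frame(mtry = 1:2),
             trControl = ctrl)
```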
warnings()
1: predictions failed for Fold1.Rep1: mtry=1 Error in model.frame.default(object$predictf, data = newdata, na.action = na.pass, : factor class has new levels a, c, g, k, m, p, s, t
The aim is to identify the levels in each fold/repeat and assign only those levels that are present in the respective fold:
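A sketch of one way this per-fold level handling could be expressed, assuming the hypothetical cforest_partykit list from above: drop unused levels in fit() and re-level the holdout data in predict(). Note that categories unseen in a fold then become NA in the holdout data, so whether this behaves sensibly depends on how predict.cforest() handles those NAs.

```r
# Sketch only, building on the hypothetical cforest_partykit list above.
cforest_partykit$fit <- function(x, y, wts, param, lev, last, weights, classProbs, ...) {
  dat <- as.data.frame(x)
  fac <- vapply(dat, is.factor, logical(1))
  dat[fac] <- lapply(dat[fac], droplevels)        # keep only the levels present in this fold
  dat$.outcome <- y
  partykit::cforest(.outcome ~ ., data = dat, mtry = param$mtry, ...)
}

cforest_partykit$predict <- function(modelFit, newdata, preProc = NULL, submodels = NULL) {
  newdata   <- as.data.frame(newdata)
  train_dat <- modelFit$data                      # learning sample, assumed to be stored by cforest()
  for (nm in intersect(names(newdata), names(train_dat))) {
    if (is.factor(newdata[[nm]]))
      newdata[[nm]] <- factor(newdata[[nm]], levels = levels(train_dat[[nm]]))
  }
  predict(modelFit, newdata = newdata, type = "response")
}
```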
I tried to include the assignment of the right factor levels within the cforest_partykit function (at the # make consistent factor levels comment), but it seems to have no effect. How could I implement this in caret's train(), trainControl(), or createDataPartition() function?