zachmayer / caretEnsemble

caret models all the way down :turtle:

Explicit index for caretStack ensemble #237

Closed vkostyuk closed 6 years ago

vkostyuk commented 6 years ago

In the Brief Intro to caretEnsemble, it says

DO NOT use the trainControl object you used to fit the training models to fit the ensemble. The re-sampling indexes will be wrong

Why is this the case? I need a particular resampling scheme (specified using index and indexOut) due to the nature of the data, and I would like to use the same scheme for training the ensemble.

vkostyuk commented 6 years ago

After looking at what caretEnsemble:::makePredObsMatrix does, I think I can answer my own question. The dataset for ensemble fitting consists of the row-bound (stacked) test-set predictions of the models. In particular, if the union of the test sets is not the whole dataset (as was the case for my resampling scheme), there will be fewer rows in the ensembling dataset than in the original dataset, so the same resampling scheme can't be used.
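
To make the row-count argument concrete, here is a minimal base R sketch. It does not call caret or caretEnsemble. The fold definitions, the `ensemble_data` frame, and its column names are hypothetical stand-ins for a custom `index`/`indexOut` scheme and for the stacked held-out predictions, not the actual output of makePredObsMatrix.

```r
# Toy illustration in base R; no caret or caretEnsemble calls are made.
n <- 10  # rows in the original training data

# Hypothetical custom resampling scheme, in the style of the `index`
# (training rows) and `indexOut` (held-out rows) arguments.
index    <- list(Fold1 = 1:4, Fold2 = 1:6)
indexOut <- list(Fold1 = 5:6, Fold2 = 7:8)

# Rows that ever appear in a held-out set; rows 1-4, 9 and 10 never do.
held_out_rows <- sort(unique(unlist(indexOut)))

# Stand-in for the ensembling dataset: one row per held-out prediction.
ensemble_data <- data.frame(
  rowIndex = unlist(indexOut, use.names = FALSE),
  Resample = rep(names(indexOut), lengths(indexOut))
)

# The ensembling dataset has fewer rows than the original data.
fewer_rows <- nrow(ensemble_data) < n

# The original `index` refers to row numbers that do not exist in the
# ensembling dataset, so it cannot be reused there as-is.
out_of_range <- max(unlist(index)) > nrow(ensemble_data)

print(c(original = n, ensemble = nrow(ensemble_data)))
print(c(fewer_rows = fewer_rows, out_of_range = out_of_range))
```

In this toy case the ensembling data has 4 rows while the original `index` reaches up to row 6. Even the indices that are in range would point at different observations than they did in the original data, so a resampling scheme for the ensemble would have to be defined against the rows of the ensembling dataset itself.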