zachmayer / caretEnsemble

caret models all the way down :turtle:

averaging replicate predictions #132

Open topepo opened 9 years ago

topepo commented 9 years ago

For some of the resampling methods (e.g. bootstrap) the same sample could be held out multiple times. Right now, it looks like these results are treated as independent records when caretEnsemble finds its weights.
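The following is a small stdlib-only simulation of why this happens, not caret's resampling code. It assumes plain bootstrap resampling: each resample draws `n_rows` row indices with replacement, and any row not drawn is held out. Over many resamples, most rows end up held out (and predicted) more than once. The function name `out_of_bag_counts` is hypothetical.

```python
import random
from collections import Counter

def out_of_bag_counts(n_rows, n_resamples, seed=0):
    """Count how many bootstrap resamples hold out each row (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_resamples):
        # Draw n_rows indices with replacement: this is the in-bag (training) sample.
        in_bag = {rng.randrange(n_rows) for _ in range(n_rows)}
        # Rows never drawn are held out and receive a prediction in this resample.
        for row in range(n_rows):
            if row not in in_bag:
                counts[row] += 1
    return counts

counts = out_of_bag_counts(n_rows=10, n_resamples=25)
for row in range(10):
    # A row with a count above 1 contributes several holdout predictions.
    print(row, counts[row])
```

Each row is left out of a given bootstrap sample with probability of roughly 0.37, so with 25 resamples a typical row is held out several times. Those replicates all enter the stacked holdout data as separate records.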

You might think about using the rowIndex column that comes along with the holdout predictions to average the duplicate predictions. For factor predictions, this gets a little weird, since you would end up with a vote proportion for each class. I'm not even sure that it would affect the greedy method, but it might make a difference for the lm or enet combination rules.
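A minimal sketch of the suggested collapsing step follows. It assumes the holdout predictions have been reduced to `(rowIndex, prediction)` pairs: numeric predictions for regression, class labels for classification. Replicates sharing a `rowIndex` are averaged, or turned into per-class vote proportions. The tuple layout and the helper names `average_numeric` and `vote_proportions` are hypothetical, not part of caret or caretEnsemble.

```python
from collections import Counter, defaultdict

def average_numeric(holdout):
    """Average replicate numeric predictions that share a rowIndex."""
    sums = defaultdict(float)
    n = defaultdict(int)
    for row_index, pred in holdout:
        sums[row_index] += pred
        n[row_index] += 1
    return {r: sums[r] / n[r] for r in sums}

def vote_proportions(holdout, classes):
    """Turn replicate class predictions per rowIndex into per-class vote proportions."""
    votes = defaultdict(Counter)
    for row_index, label in holdout:
        votes[row_index][label] += 1
    # Classes that never received a vote for a row get proportion 0.0.
    return {
        r: {c: cnt[c] / sum(cnt.values()) for c in classes}
        for r, cnt in votes.items()
    }

# Rows 1 and 2 were each held out twice; row 3 only once.
reg = [(1, 2.0), (2, 5.0), (1, 4.0), (3, 1.0), (2, 7.0)]
print(average_numeric(reg))  # {1: 3.0, 2: 6.0, 3: 1.0}

cls = [(1, "yes"), (1, "no"), (1, "yes"), (2, "no")]
print(vote_proportions(cls, ["yes", "no"]))
```

After this step, each original training row contributes exactly one record to the combination model. For classification, the record holds a vector of vote proportions instead of a single label, which is the "a little weird" part mentioned above.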

zachmayer commented 9 years ago

Hmmm, that's a good point. I'll look into a good way to address this for both classification and regression.

One thought is that by including different records a different number of times, you are effectively weighting the records, a bit like drawing a bootstrap sample to determine the weights.
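The following sketch illustrates the weighting intuition. It makes a simplifying assumption: every replicate of a row carries identical predictions, so the only difference between rows is how often they appear. Under that assumption, a least-squares fit on the stacked replicates equals a weighted fit on the unique rows, with weights equal to the number of times each row was held out. An unweighted fit on the unique rows is what averaging would give. Plain least squares through the 2x2 normal equations stands in for an lm-style combination rule here. This is not caretEnsemble's code, and `ls_two_models` is a hypothetical helper.

```python
import math

def ls_two_models(rows, weights=None):
    """Least-squares weights (b1, b2) for y ~ b1*p1 + b2*p2, no intercept.

    rows: list of (p1, p2, y); weights: optional per-row weights (default 1).
    """
    if weights is None:
        weights = [1.0] * len(rows)
    data = list(zip(weights, rows))
    s11 = sum(w * p1 * p1 for w, (p1, p2, y) in data)
    s12 = sum(w * p1 * p2 for w, (p1, p2, y) in data)
    s22 = sum(w * p2 * p2 for w, (p1, p2, y) in data)
    s1y = sum(w * p1 * y for w, (p1, p2, y) in data)
    s2y = sum(w * p2 * y for w, (p1, p2, y) in data)
    det = s11 * s22 - s12 * s12
    # Solve the 2x2 normal equations with Cramer's rule.
    return ((s1y * s22 - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det)

# (model-1 prediction, model-2 prediction, observed y) for four unique rows.
unique_rows = [(1.0, 2.0, 2.5), (2.0, 1.0, 1.8), (3.0, 3.5, 3.4), (0.5, 1.5, 1.1)]
times_held_out = [3, 1, 2, 1]
# Stack each row once per time it was held out, as in the un-averaged data.
stacked = [row for row, k in zip(unique_rows, times_held_out) for _ in range(k)]

b_stacked = ls_two_models(stacked)
b_weighted = ls_two_models(unique_rows, weights=times_held_out)
b_averaged = ls_two_models(unique_rows)
print(b_stacked, b_weighted, b_averaged)
print(all(math.isclose(a, b) for a, b in zip(b_stacked, b_weighted)))
```

In practice, replicates of the same row usually differ, because each resample trains a different model. The stacked fit then is not exactly a reweighting, but the counts still act as implicit weights on each row.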