Closed: univ12 closed this issue 7 years ago
You use the out-of-sample predictions for the training set to train the ensemble. E.g. if you did cross-validation, you can use the out-of-sample predictions from each cross-validation fold.
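A minimal sketch of that idea, assuming scikit-learn and illustrative model choices (the base models, meta-model, and data below are stand-ins, not the package's actual defaults): `cross_val_predict` gives each training point a prediction from a model that never saw it during fitting, and those out-of-fold predictions become the meta-model's inputs.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_predict

X, y = make_regression(n_samples=200, n_features=5, random_state=0)

# Two illustrative base models (e.g. linear model, random forest).
base_models = [LinearRegression(), RandomForestRegressor(random_state=0)]

# Out-of-fold predictions: every row is predicted by a model fit on
# folds that exclude that row.
meta_features = np.column_stack(
    [cross_val_predict(m, X, y, cv=5) for m in base_models]
)

# The ensemble (meta-model) is trained on the stacked out-of-sample
# predictions, with the original y as the target.
meta_model = Ridge().fit(meta_features, y)
print(meta_features.shape)  # (200, 2): one column per base model
```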
And if we repeat the bootstrapping 25 times, we get 25 different models. How are those averaged?
You use the out-of-sample predictions from each of the 25 models. You stack them up, and then train a single model.
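With bootstrap resampling instead of cross-validation folds, the same principle can be sketched like this (illustrative code, not the package's internals): each of the 25 models predicts only the rows left out of its bootstrap sample, and those out-of-sample predictions are collected per row before the single meta-model is trained.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X, y = make_regression(n_samples=100, n_features=4, random_state=0)
n = len(X)

pred_sum = np.zeros(n)
pred_count = np.zeros(n)

for _ in range(25):
    # Sample with replacement; the rows not drawn are out-of-sample
    # ("out-of-bag") for this model.
    train_idx = rng.choice(n, size=n, replace=True)
    oob_mask = np.ones(n, dtype=bool)
    oob_mask[train_idx] = False

    model = DecisionTreeRegressor(random_state=0)
    model.fit(X[train_idx], y[train_idx])

    # Accumulate each model's predictions on the rows it never saw.
    pred_sum[oob_mask] += model.predict(X[oob_mask])
    pred_count[oob_mask] += 1

# Average out-of-sample prediction per row; a row drawn into all 25
# bootstrap samples would have count 0, which is vanishingly rare.
oob_pred = pred_sum / np.maximum(pred_count, 1)
print(oob_pred.shape)  # (100,)
```

A single meta-model would then be fit once on these stacked out-of-sample predictions (one column per base learner), exactly as in the cross-validation case.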
See also "stacked generalization" here: http://mlwave.com/kaggle-ensembling-guide/
Thanks for the link, that was interesting. However, I still do not quite understand it. So,
and then?
or
You only train the linear model once; you don't re-train it.
Now it's clear. So easy, thank you!
Hi, thanks for this nice package. From the documentation it is still unclear to me how this ensembling works, and I'm not good at reading the source code, so could you please help me understand the principle? From my understanding it is like this:
Let X be the data and y the outcome. Since bootstrap resampling is the default option, we sample with replacement from X. We now have a training and a test set, X[train] and X[test]. We then train models A and B (e.g. linear model, random forest) on the training data and predict on the test data. But how do we proceed from here? Do we feed these test-data predictions into a linear model with y[test] as the outcome? Do we then predict on the test data again? That would be strange, since we used those data to build the linear model. And if we repeat the bootstrapping 25 times, we get 25 different models. How are those averaged?
Thanks