shokru / mlfactor.github.io

Website dedicated to a book on machine learning for factor investing

Ensemble models #66

Open immortal678 opened 3 years ago

immortal678 commented 3 years ago

In example 11.1.2, I am not able to understand the calculation of the weights in the unconstrained ensemble. If it were based on the training MAEs of the models, the ordering would be RF > Pen reg > NN, and so on, but the weights are not in line with this. I understand it is related to the correlation between the techniques, but if you could please elaborate, I would highly appreciate it. Thanks in advance!

shokru commented 3 years ago

I copy-pasted the weights:

Pen_reg  -0.584393293
Tree     -0.074509616
RF        1.331785969
XGB      -0.001696782
NN        0.328813723

So indeed RF is way above all the others. The problem is that, because the weights sum to one, the ensemble must "short" other models in order to give more weight to RF. And Pen_reg is the chosen short leg. I guess one important driver which is not shown in the example is the variance of errors: I suppose the variance of the Pen_reg errors is higher than that of the NN errors, which is why Pen_reg is penalized (!) in the ensemble weights. I leave it to you to confirm that! (hopefully)
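A minimal sketch of the mechanism (illustrative Python, not the book's R code, and with made-up error variances and a made-up correlation level): minimizing the variance of the weighted ensemble error subject to weights summing to one has the same closed form as a minimum-variance portfolio, w* = Σ⁻¹1 / (1ᵀΣ⁻¹1). When model errors are highly correlated, this solution leverages the lowest-variance model above 1 and shorts the rest:

```python
import numpy as np

# Hypothetical error standard deviations and a common error correlation.
# These numbers are invented for illustration, not taken from the book.
models = ["Pen_reg", "Tree", "RF", "XGB", "NN"]
sigma = np.array([0.95, 0.85, 0.50, 0.80, 0.70])    # RF assumed most accurate
rho = 0.95                                           # errors highly correlated
corr = np.full((5, 5), rho) + (1.0 - rho) * np.eye(5)
cov = np.outer(sigma, sigma) * corr                  # error covariance matrix

# Weights minimizing ensemble error variance subject to sum(w) == 1:
# w* = cov^{-1} 1 / (1' cov^{-1} 1)
ones = np.ones(5)
w = np.linalg.solve(cov, ones)
w /= w.sum()

for m, wi in zip(models, w):
    print(f"{m:8s} {wi:+.4f}")
```

With these invented inputs, the pattern matches the thread: RF gets a weight above 1 and the high-variance models get negative weights, even though no model is "bad" in isolation.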

immortal678 commented 3 years ago

Hi, could you please give an outside reference for the optimized ensemble used in the book? I would highly appreciate it!

shokru commented 3 years ago

There is no outside reference. Ensembling (like ML in general) is a cooking recipe. In the book we test several recipes, and in this chapter they do not work too well, probably because the original models are too correlated (hence the y_tilde values tell the same story, so trying to learn from them as if they were different perspectives is bound to fail).
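The correlation point can be checked with a small simulation (illustrative Python, with invented correlation levels): for an equal-weight average of n models with equal error variance and common correlation rho, the ensemble's error variance shrinks by a factor of (1 + (n-1)rho)/n, so the gain nearly vanishes when rho is close to 1:

```python
import numpy as np

rng = np.random.default_rng(0)


def variance_ratio(rho, n_models=5, n_obs=10_000):
    # Simulate correlated model errors with unit variance and common
    # correlation rho, then compare the equal-weight ensemble's error
    # variance to that of a single model (< 1 means the ensemble helps).
    corr = np.full((n_models, n_models), rho) + (1.0 - rho) * np.eye(n_models)
    errs = rng.multivariate_normal(np.zeros(n_models), corr, size=n_obs)
    return errs.mean(axis=1).var() / errs[:, 0].var()


# Theoretical ratios: (1 + 4*0.10)/5 = 0.28 vs (1 + 4*0.95)/5 = 0.96
print(f"rho = 0.10 -> variance ratio ~ {variance_ratio(0.10):.2f}")  # large gain
print(f"rho = 0.95 -> variance ratio ~ {variance_ratio(0.95):.2f}")  # almost none
```

This is consistent with the remark above: when the models' predictions are nearly collinear, averaging (or optimizing over) them buys almost nothing.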

Sorry for the disappointment...