Closed vinkaga closed 7 years ago
This is most likely because of the RGF model, which does not accept seeds. Most (if not all) of the other models have repeatable results if you add a seed (which they probably already have in your params file).
You could remove that model for stability. Alternatively, you can use bagging: in every model you can add the parameter `bags:a_value_higher_than_1`
to perform bagging, which means running the same model with different seeds and averaging all the predictions for that model.
Bagging of 8 seems to work well.
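To illustrate, seed bagging can be sketched like this in Python with scikit-learn. This is not the library's own implementation of `bags:N`, just a minimal demonstration of the idea (the model choice and the synthetic data are assumptions for illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Train the same model several times with different random seeds
# and average the predictions, which smooths out seed-dependent noise.
X, y = make_regression(n_samples=200, n_features=10, random_state=0)
X_train, y_train = X[:150], y[:150]
X_test = X[150:]

n_bags = 8  # 8 bags tends to work well in practice
preds = np.zeros(len(X_test))
for seed in range(n_bags):
    model = GradientBoostingRegressor(random_state=seed)
    model.fit(X_train, y_train)
    preds += model.predict(X_test)
preds /= n_bags  # averaged predictions are more repeatable run-to-run
```

The averaged `preds` varies much less between runs than any single seeded model, at the cost of roughly `n_bags` times the training time.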
My 2 runs for Zillow on the same set of data resulted in a public score difference of 0.0002. While this is not a huge difference and is expected in typical stochastic processes, it is significant in this competition. More importantly, it makes step-by-step model improvement difficult due to noise.
Any suggestions on making runs more repeatable? I'm willing to accept longer computation time for better repeatability.