Train and predict methods are slow on QuantileRegressionForest

saattrupdan / doubt

Bringing back uncertainty to machine learning.

MIT License


Train and predict methods are slow on QuantileRegressionForest #27

Closed ThomasBourgeois closed 3 years ago

ThomasBourgeois commented 3 years ago

I'm using the predict method on around 35,000 samples with 100 estimators: it has been running for 5 minutes already and is still going, on 4 CPUs. The train method is extremely slow too: on 300,000 samples with 100 estimators, it took at least 2 hours. That is far from scikit-learn's speed.

Thanks for the lib though! But it's hard for me to use right now due to this slowness.

saattrupdan commented 3 years ago

Hi Thomas! Yeah that sounds quite suboptimal. Just to troubleshoot the issue, have you changed the default hyperparameters of the QuantileRegressionForest? What happens if you try setting min_samples_leaf=100 or max_leaf_nodes=100, say?

ThomasBourgeois commented 3 years ago

I did not change the defaults. Actually, the predict method took something like 3 hours to return a result. And in the end the quantiles were exactly the same, meaning I could not make it work: the upper and lower bounds were identical.

saattrupdan commented 3 years ago

Yeah, I think what happened there was the forest producing trees with a single element in each of their leaves, which means that the quantiles will all be trivial (i.e. identical). If you try changing one of the two arguments I mentioned above, then hopefully it should work! I guess the solution here could be just to change the defaults.
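To make the single-element-leaf point concrete, here is a minimal standard-library sketch. It is not the library's code: the helper names `quantile` and `leaf_interval` and the toy leaf contents are hypothetical, chosen only to show why a leaf holding one training target collapses every quantile to the same value.

```python
def quantile(values, q):
    """Linearly interpolated q-quantile (0 <= q <= 1) of a non-empty list."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)          # fractional index into the sorted values
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


def leaf_interval(leaf_targets, lower=0.05, upper=0.95):
    """Lower and upper quantile of the training targets stored in one leaf."""
    return quantile(leaf_targets, lower), quantile(leaf_targets, upper)


# Hypothetical leaves: one from a fully grown tree, one from a tree whose
# leaves are forced to keep several samples (e.g. via min_samples_leaf).
single_leaf = [3.2]
big_leaf = [1.0, 2.0, 3.0, 4.0, 5.0]

single_interval = leaf_interval(single_leaf)  # both bounds equal the single target
big_interval = leaf_interval(big_leaf)        # bounds differ, so the interval has width

print(single_interval)
print(big_interval)
```

With only one target in the leaf there is nothing to take a quantile over, so the lower and upper bounds coincide. Larger leaves are what give the interval any width.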

ThomasBourgeois commented 3 years ago

Oh OK, I thought the quantiles were computed from the distribution of the predictions of the different trees, not from the distribution of the values in the leaves.
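The two readings can be contrasted with a small standard-library sketch. Everything here is illustrative: the toy forest, the function names, and in particular the leaf-based scheme (per-tree leaf quantiles averaged over trees) are assumptions made for the example, not something confirmed from the library's source.

```python
def quantile(values, q):
    """Linearly interpolated q-quantile (0 <= q <= 1) of a non-empty list."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def mean(values):
    return sum(values) / len(values)


def interval_across_trees(leaves, lower=0.05, upper=0.95):
    """Quantiles over the point predictions (leaf means) of the individual trees."""
    preds = [mean(leaf) for leaf in leaves]
    return quantile(preds, lower), quantile(preds, upper)


def interval_within_leaves(leaves, lower=0.05, upper=0.95):
    """Quantiles taken inside each tree's leaf, then averaged over trees (assumed scheme)."""
    lows = [quantile(leaf, lower) for leaf in leaves]
    highs = [quantile(leaf, upper) for leaf in leaves]
    return mean(lows), mean(highs)


# Hypothetical forest of three fully grown trees: for one query point, each
# inner list holds the training targets in the leaf that the point lands in.
fully_grown = [[2.0], [3.0], [4.0]]

across = interval_across_trees(fully_grown)   # has width: the trees disagree
within = interval_within_leaves(fully_grown)  # zero width: each leaf holds one value

print(across)
print(within)
```

Under this assumed leaf-based scheme, single-element leaves give identical lower and upper bounds even though the trees disagree with each other, which matches the behaviour reported above.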