Improve Random Forest note

martinapugliese commented 7 years ago

improve the note, see comments here

martinapugliese commented 6 years ago

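bagging reduces the variance while keeping the bias roughly constant [1]
bagging is needed to decorrelate the trees, which would otherwise all be trained on the same data [1]
only $\sqrt{p}$ of the $p$ features are considered at each node split, preventing trees from all using the same features, so they are decorrelated [1]
the out-of-bag (OOB) error estimate behaves much like leave-one-out CV: each sample is scored only by the trees that did not see it during training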
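
To make the decorrelation points above concrete, here is a minimal sketch of the standard variance formula for an average of $B$ identically distributed trees, each with variance $\sigma^2$ and pairwise correlation $\rho$: $\rho\sigma^2 + \frac{1-\rho}{B}\sigma^2$. The function name and its arguments are hypothetical, chosen for illustration only, and the equal-variance, equal-correlation setting is an idealisation.

```python
def ensemble_variance(sigma2: float, rho: float, n_trees: int) -> float:
    """Variance of the average of n_trees identically distributed predictors.

    Assumes every tree has variance sigma2 and every pair of trees has
    correlation rho (an idealisation, used only for illustration).
    """
    if n_trees < 1:
        raise ValueError("n_trees must be at least 1")
    # First term: floor set by the correlation; second term: shrinks as 1/B.
    return rho * sigma2 + (1.0 - rho) / n_trees * sigma2


if __name__ == "__main__":
    # Compare uncorrelated, mildly correlated and strongly correlated trees.
    for rho in (0.0, 0.3, 0.9):
        values = [round(ensemble_variance(1.0, rho, b), 3) for b in (1, 10, 100)]
        print(f"rho={rho}: {values}")
```

With `rho = 0` the variance shrinks as $1/B$. With `rho > 0` it cannot fall below $\rho\sigma^2$ however many trees are added, which is why lowering the correlation between trees matters.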

martinapugliese commented 6 years ago

[ ] on sparse datasets
[ ] params to tune
[ ] pruning of trees
[ ] number of features: the "randomness" in the algorithm comes from considering, at each split in the learning process, only a random subset of the features (feature bagging) %TODO mmm, is this it? Without this, if a few features are strong predictors of the response variable, they would be selected by many trees, causing those trees to be correlated. The literature recommends, for a set of $p$ features, considering $\sqrt{p}$ of them at each split for classification and $p/3$ for regression %TODO cite and explain
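
As a possible illustration for the "number of features" item, the sketch below computes the rule-of-thumb subset size and draws the candidate features for one split. The function names are hypothetical, and rounding down with a minimum of one feature is an assumption rather than something stated in the note.

```python
import math
import random


def n_split_features(p: int, task: str) -> int:
    """Rule-of-thumb number of candidate features per split.

    Assumption: values are rounded down and never drop below 1.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    if task == "classification":
        m = math.isqrt(p)  # floor(sqrt(p))
    elif task == "regression":
        m = p // 3  # floor(p / 3)
    else:
        raise ValueError("task must be 'classification' or 'regression'")
    return max(1, m)


def sample_split_features(p: int, task: str, rng: random.Random) -> list[int]:
    """Draw the feature indices considered at one split, without replacement."""
    m = n_split_features(p, task)
    return sorted(rng.sample(range(p), m))


if __name__ == "__main__":
    rng = random.Random(0)
    # Each call mimics one node split: a fresh random subset every time.
    for _ in range(3):
        print(sample_split_features(16, "classification", rng))
```

Because a fresh subset is drawn at every split, a strong predictor is often absent from the candidates, which gives weaker features a chance to be chosen and keeps the trees from all looking alike.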

martinapugliese / tales-science-data