[ ] number of features : One source of "randomness" in the algorithm is that, at each split in the learning process, only a random subset of the features is considered as candidates (feature bagging). %TODO mmm, is this it?
The motivation is that, without this restriction, a few features that are strong predictors of the response variable would be selected by many trees, causing those trees to be correlated. Restricting each split to a random subset of features reduces that correlation. The literature recommends, for a set of $p$ features, using $\sqrt{p}$ of them at each split for classification and $p/3$ for regression. %TODO cite and explain
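
As a small worked example of the rule of thumb above, take $p = 16$ features. The rounding convention is an assumption here (rounding down to the nearest integer); the note itself does not say how non-integer values should be handled.
\[
  m_{\text{classification}} = \lfloor \sqrt{16} \rfloor = 4,
  \qquad
  m_{\text{regression}} = \lfloor 16/3 \rfloor = 5 .
\]
So, under that assumption, each split would consider 4 randomly chosen features out of 16 for classification and 5 out of 16 for regression; the symbol $m$ is introduced only for this illustration.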
%TODO improve the note, see comments here