[x] "Thus, in cases in which the class marginal
036 distributions seem inseparable, RF may be suboptimal." please clarify
[x] mention Rotation forest in par 2?
[x] "There is 074 therefore a gap that calls for the construction of a scalable 075 oblique decision forest classifier that can perform well in 076 the presence of an overwhelming number of irrelevant fea- 077 tures. " why doesn't Breiman's cover it?
[x] what's "to wit"?
[x] " All of 180 the oblique decision forest algorithms to date do not 181 impose a sparsity constraint on the candidate split di- 182 rections, which may cause such algorithms to under- 183 perform when the signal is compressive." except breiman, right?
[x] "It is apparent that RerF bares a resemblance to Forest-RC, 201
yet there is one key difference. In Forest-RC, the number 202 of nonzeros in each column of A is fixed to be the same 203 across A and for every split node, and its optimal value has 204 to be found through parameter selection. Our construction 205 circumvents selection of this parameter by randomizing the 206 number of nonzeros in each column of A. Furthermore, our 207 algorithm allows A to have columns of varying sparsity, 208 which may promote more diversity in the ensemble. To 209 our knowledge, this property does not exist in any of the 210 proposed random projection forest algorithms." also, our projections approximately preserve distance between points with high probability, whereas breiman's has no such guarantees.
[x] sparse parity is actually sparse noisy parity. not sure it has to be in the name, but should be in the text more explicitly.
[x] "For this simulation, p = 20,
262 p∗ = 3, and n = 1000." p, p*, and n need to be defined
[x] gotta change colors from 'rgb' to 'm', 'g', 'c', seem my LOL paper. i vote green is RerF
[x] "This is due to the fact
354 that generating random rotation matrices requires QR de-
355 compositions, which have a time complexity of O(p3)." maybe put that fact in the next section?
[x] "O(dn_k )" what is n_k?
[x] "To store the resulting RerF re-
380 quires O(Ln log n), because each node can be represented
381 by the indices of the coordinates that received a non-zero
382 element, and the expected number of such indices is O(1)." great! also talk about space requirements for RF.
[x] "The cost of initially sorting in RerF(r), 417 and of computing the means of the classes in RerF(d) are 418 negligible compared to the main cost above." you didn't introduce the (r) and (d) variants yet.
[x] "For example, rotation forests require running 421 a singular value decomposition at each node." why not explicitly state computational complexity of rotation forests?
[x] " Comparing panels B and G with A and F show
446 that RF and RerF are sensitive to rotations, while RotRF is
447 insensitive to rotations." i'd write that panels A-E show that RF and RotRF outperform RF in both untransformed and transformed representations for Sparse Parity. This makes sense because linear combinations of variables can have some information about the class, whereas the original features are all uninformative on their own. For the trunk simulations, panel G shows that RotRF is more robust to rotations (as it should be), but is fragile in the face of affine
Changes have been made according to your comments.
Regarding the random projection theory, Breiman's Forest-RC also preserves pairwise distances. Apparently the only requirement for doing so is that elements must be i.i.d. with mean 0 \cite{Li2006}.
About how the delta matrix is used: it's not appended to X. It's appended to A. The way I think about it is that it's just another set of projections to supplement the sparse projections in A.
I've also added p=1000 for RF and RerF in the last two panels of fig 2.
more to come.