imbs-hl / ranger

A Fast Implementation of Random Forests
http://imbs-hl.github.io/ranger/

[New Feature] gcForest - Deep Forest: Towards An Alternative to Deep Neural Networks #170

Closed coforfe closed 7 years ago

coforfe commented 7 years ago

Hello,

A few days ago an alternative to deep neural networks was presented, based on the new concept of a deep random forest. Given the maturity of ranger, it would be very nice to have this new approach available as an alternative within ranger.

References: Deep Forest: Towards An Alternative to Deep Neural Networks [https://arxiv.org/abs/1702.08835]

Thanks! Carlos.

hexhead commented 7 years ago

Has anyone seen an implementation? No software is mentioned to replicate the results.

thierrygosselin commented 7 years ago

LightGBM is getting there

Laurae2 commented 7 years ago

@hexhead You can try reproducing them using my implementation: https://github.com/Laurae2/Laurae. Its main issue is that the final model is massive, which is alleviated by external model storage (to keep RAM usage constant).

It would be strange if you did not get better performance than regular strong single models, because it is a stacking ensemble model.
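
For intuition, a cascade layer is one round of stacking: train forests, then append their out-of-fold class probabilities to the features that the next layer sees. Here is a minimal sketch with ranger (the helper name, fold count, and tree count are illustrative assumptions, not Laurae's actual code):

```r
library(ranger)

# One cascade (stacking) layer: append out-of-fold class probabilities
# from a probability forest to the feature matrix for the next layer.
add_cascade_layer <- function(X, y, num_trees = 500, n_folds = 5) {
  y <- factor(y)
  folds <- sample(rep(seq_len(n_folds), length.out = nrow(X)))
  oof <- matrix(NA_real_, nrow(X), nlevels(y))
  for (k in seq_len(n_folds)) {
    fit <- ranger(x = X[folds != k, , drop = FALSE], y = y[folds != k],
                  num.trees = num_trees, probability = TRUE)
    oof[folds == k, ] <- predict(fit, X[folds == k, , drop = FALSE])$predictions
  }
  cbind(X, oof)  # augmented features are the input to the next layer
}
```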

hexhead commented 7 years ago

@Laurae2 looks very nice! I will check it out.

Laurae2 commented 7 years ago

You can also test Deep Forest on MNIST here: https://github.com/Laurae2/Laurae/blob/master/demo/DeepForest_mnist.R (you will need to download the train/test CSVs separately). If you want to train on all the data, you need to write "N" when it asks about subsampling (there is a typo in the script's message).

Performance is average here because CNNs learn better from few observations than typical non-linear models do, NNs being a mix of linear and non-linear components. More data and stride = 1 should help Deep Forest (but that explodes CPU time); see the sketch below.
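
To make the stride remark concrete, here is a simplified sketch of multi-grained scanning: slide a window over each input, train a forest on the windowed features, and use its class probabilities as new features. This is an assumption-heavy simplification (one small forest per window position, whereas the paper pools all window instances into shared forests):

```r
library(ranger)

# Simplified multi-grained scanning over flattened inputs.
# X: numeric matrix (one row per example), y: class labels.
multi_grained_scan <- function(X, y, win = 100, stride = 1) {
  y <- factor(y)
  starts <- seq(1, ncol(X) - win + 1, by = stride)  # stride = 1 => most windows
  feats <- lapply(starts, function(i) {
    Xi <- X[, i:(i + win - 1), drop = FALSE]
    fit <- ranger(x = Xi, y = y, num.trees = 30, probability = TRUE)
    predict(fit, Xi)$predictions  # class probabilities for this window
  })
  do.call(cbind, feats)  # one block of probability features per window position
}
```

With stride = 1 the number of windows, and therefore the number of forests to train, is maximal, which is exactly where the CPU time explodes.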

In addition, the CNN is faster because I use Intel MKL. The CNN also had its hyperparameters pushed hard (good parameters for the CNN vs. parameters chosen for maximum speed, not accuracy, on Cascade Forest / Multi-Grained Scanning / Deep Forest). If someone could test on all the training data and report here, that would be great for comparison:

(screenshot: CNN vs. Deep Forest benchmark results on MNIST)

Ah, and yes, the file size explodes with that many models in Deep Forest (500+ MB, although I'm sure it compresses well).
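
The "external model storage" trick mentioned earlier can be sketched as follows; the layer count, file paths, and augmentation step are assumptions, but `compress = "xz"` illustrates why the serialized forests shrink considerably on disk:

```r
library(ranger)

# Keep RAM roughly constant: write each layer's forest to disk right after
# training and drop it from memory. X_aug (feature matrix) and y (factor
# labels) are assumed to exist.
for (layer in 1:8) {
  fit <- ranger(x = X_aug, y = y, num.trees = 500, probability = TRUE)
  saveRDS(fit, sprintf("cascade_layer_%02d.rds", layer), compress = "xz")
  p <- predict(fit, X_aug)$predictions
  colnames(p) <- paste0("layer", layer, "_", colnames(p))
  X_aug <- cbind(X_aug, p)  # features for the next layer
  rm(fit); gc()             # only the current layer ever lives in memory
}
```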

Laurae2 commented 7 years ago

You may find the model size results below insane. See https://github.com/Microsoft/LightGBM/issues/331#issuecomment-288394696 or the (identical) text below, for Cascade Forest only:

This is on the Adult dataset. I'm also getting the same issue you described for Adult: a better RF out of the box (86.57%) than their reported results (86.17%). I'm sure Deep Forest / boosting can easily push it even further (87.58%+).


Cascade Forest improvements (reported: means for the forests, without standard deviations):

| Layer | F1 (RF1) | F2 (RF2) | F3 (CRTF1) | F4 (CRTF2) | Avg Forest | Perf. |
|-------|----------|----------|------------|------------|------------|-------|
| 1 | 86.3972% | 86.4238% | 78.7237% | 77.9620% | 85.3142% | = |
| 2 | 86.4095% | 86.3972% | 86.4443% | 81.9299% | 86.5549% | + |
| 3 | 86.5262% | 86.4955% | 86.6286% | 83.9117% | 86.6900% | Best |
| 4 | 86.4976% | 86.4750% | 84.0305% | 83.7930% | 86.5856% | - |
| 5 | 86.3972% | 86.4198% | 83.7930% | 86.5651% | 86.5057% | - |
| 6 | 86.3645% | 86.3849% | 86.4566% | 80.6769% | 86.4996% | - |
| 7 | 86.3993% | 86.4300% | 80.6769% | 86.3993% | 86.4627% | - |
| 8 | 86.3993% | 86.4218% | 86.3440% | 86.3542% | 86.6163% | - |
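
The pattern above (gains up to layer 3, then degradation) is what layer-wise early stopping exploits: keep growing layers while validation accuracy improves, then roll back to the best layer. Below is a self-contained toy sketch of that loop with ranger on iris; the patience value is an assumption, and in-sample augmentation is used for brevity where a real implementation would use out-of-fold predictions:

```r
library(ranger)

# Toy cascade: grow layers, track validation accuracy, keep the best layer,
# and stop once accuracy has not improved for `patience` consecutive layers.
set.seed(1)
tr  <- sample(nrow(iris), 100)
Xtr <- as.matrix(iris[tr, 1:4]);  ytr <- iris$Species[tr]
Xva <- as.matrix(iris[-tr, 1:4]); yva <- iris$Species[-tr]
best_acc <- 0; best_layer <- 0; patience <- 2
for (layer in 1:8) {
  fit <- ranger(x = Xtr, y = ytr, num.trees = 200, probability = TRUE)
  pva <- predict(fit, Xva)$predictions
  acc <- mean(colnames(pva)[max.col(pva)] == yva)
  if (acc > best_acc) {
    best_acc <- acc; best_layer <- layer
  } else if (layer - best_layer >= patience) {
    break  # accuracy stopped improving; roll back to best_layer
  }
  ptr <- predict(fit, Xtr)$predictions            # in-sample, for brevity
  colnames(ptr) <- paste0("L", layer, "_", colnames(ptr))
  colnames(pva) <- colnames(ptr)
  Xtr <- cbind(Xtr, ptr); Xva <- cbind(Xva, pva)  # augment both sets identically
}
cat(sprintf("best layer: %d, validation accuracy: %.4f\n", best_layer, best_acc))
```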

Model size:

| Model | Size (bytes) | Size (readable) |
|----------|---------------|-----------------|
| 8 layers | 2,108,466,480 | 1.96 GB |
| 3 layers | 734,047,020 | 700 MB |

Yes, the 8-layer model size is 1.96 GB; that is not a typo.


Training time:

| Model | Iterations | Accuracy | Time (s) | Rank |
|-------|------------|----------|----------|------|
| Deep Forest (official paper) | see official paper | 86.17xx% | ? | 6th |
| xgboost (Random Forest mode) | full: 2000 | 86.5733% | 62.001 | 5th |
| xgboost (Boosted Trees mode) | best: 134, trained: 184 | 87.5868% | 5.737 | 1st |
| Cascade Forest | best: 3 layers, trained: 8 layers | 86.6900% | 1601.794 | 4th |
| Cascade Forest (stack: Random Forest) | full: 2000 | 86.7023% | 65.434 | 3rd |
| Cascade Forest (stack: Boosted Trees) | best: 99, trained: 149 | 87.2797% | 5.983 | 2nd |

The boosting speed is not a typo: it really is ~6 seconds, and it gives the best accuracy out of the box without any parameter tuning.
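
For reference, the "best: 134, trained: 184" pattern in the table is standard early stopping with 50 rounds of patience. Here is a sketch of that untuned boosted-trees baseline with the R xgboost package, assuming Adult is already encoded as numeric matrices `X_train` / `X_valid` with 0/1 label vectors `y_train` / `y_valid`:

```r
library(xgboost)

# Untuned boosted-trees baseline with early stopping on a validation set.
dtrain <- xgb.DMatrix(X_train, label = y_train)
dvalid <- xgb.DMatrix(X_valid, label = y_valid)
fit <- xgb.train(
  params = list(objective = "binary:logistic", eval_metric = "error"),
  data = dtrain, nrounds = 2000,
  watchlist = list(valid = dvalid),
  early_stopping_rounds = 50  # stop 50 rounds after the best iteration
)
```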

mnwright commented 7 years ago

We probably won't add this to ranger. Please reopen if needed.