A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Benchmark entry for Rborist 0-1.1. This version can be built from the GitHub source but is not yet available on CRAN.
Current performance measurements on a 4-core AMD desktop with 8 GB of RAM:
10^5 rows: 75 seconds, no swapping.
10^6 rows: 805 seconds, with swapping.
10^7 rows and 32-core performance TBD.
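As a rough reading of the two timings above, the following Python sketch computes how run time grew relative to row count. The variable names are illustrative only and are not part of the benchmark scripts. Because the 10^6 run swapped, the resulting figure mixes algorithmic cost with paging overhead and should not be read as the scaling behavior of Rborist itself.

```python
import math

# Timings copied from the measurements above (rows -> seconds).
# Only two data points exist, so this is an estimate, not a fit.
timings = {10**5: 75.0, 10**6: 805.0}

(n_small, t_small), (n_large, t_large) = sorted(timings.items())

row_ratio = n_large / n_small    # how many times more rows
time_ratio = t_large / t_small   # how many times longer the run took

# Slope on a log-log plot: 1.0 would mean time grows linearly with rows.
scaling_exponent = math.log(time_ratio) / math.log(row_ratio)

print(f"rows x{row_ratio:.0f} -> time x{time_ratio:.2f}")
print(f"empirical scaling exponent: {scaling_exponent:.3f}")
```

An exponent slightly above 1 here means the larger run took a little more than ten times as long for ten times the rows. A repeat of the 10^6 run on a machine with enough memory to avoid swapping would be needed to separate the two effects.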