run first for 10 trees (res1 = training time [s], res2 = AUC):
0.1m res1: Double = 85.617916949 res2: Double = 0.7111801947131664 observed 2 partitions for train (auto)
1m res1: Double = 117.134682544 res2: Double = 0.7281780861344342 observed 12 partitions
10m res1: Double = 857.679993995 res2: Double = 0.7307362798392467 observed 32 partitions (r4.8xlarge 32 cores)
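(For context, res1/res2 above are the training time and the test AUC as they come back in the spark-shell. Below is a minimal sketch of that kind of measurement with the Spark ML API; file paths, column names and parameters are assumptions for illustration, not the actual benchmark code, which is linked at the bottom.)

```scala
// Sketch of the res1/res2 measurement (assumed setup; paths, columns and
// parameters are illustrative, not the exact benchmark code)
import org.apache.spark.ml.classification.GBTClassifier
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator

val train = spark.read.parquet("train-0.1m.parquet").cache()   // assumed: features already
val test  = spark.read.parquet("test.parquet").cache()         // assembled into "features"
train.count(); test.count()                                     // materialize caches before timing

val gbt = new GBTClassifier()
  .setLabelCol("label").setFeaturesCol("features")
  .setMaxIter(10)       // number of trees (10 here, 100 in the later runs)
  .setMaxDepth(10)

val t0 = System.nanoTime
val model = gbt.fit(train)
val res1 = (System.nanoTime - t0) / 1e9    // training time in seconds

val res2 = new BinaryClassificationEvaluator()
  .setMetricName("areaUnderROC")
  .evaluate(model.transform(test))         // test AUC
```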
size | LightGBM (100 trees) [s] | Spark 10 trees [s] | adjusted ratio
---|---|---|---
0.1m | 2.4 | 85 | 354
1m | 5.2 | 117 | 225
10m | 42 | 857 | 204
(ratio is adjusted, assuming 100 trees takes 10x the time of 10 trees)
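For example, on 0.1m: 85 s × 10 / 2.4 s ≈ 354, i.e. Spark's 10-tree time scaled to 100 trees and divided by LightGBM's 100-tree time.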
100 trees:
0.1m res1: Double = 1020.154942769 res2: Double = 0.7212927860682342
1m res1: Double = 1383.33980618 res2: Double = 0.7484692273842806
10m res1: Double = 8390.772561576 res2: Double = 0.7553002016056503
size | time lgbm [s] | time spark [s] | ratio | AUC lgbm | AUC spark
---|---|---|---|---|---
0.1m | 2.4 | 1020 | 425 | 0.730 | 0.721
1m | 5.2 | 1380 | 265 | 0.764 | 0.748
10m | 42 | 8390 | 200 | 0.774 | 0.755
old results (2015-2016) summary for comparison:
random forest:
max_depth = 20, n_trees = 500
tool | size | time [s] | RAM [GB] | AUC
---|---|---|---|---
h2o | 1M | 600 | 5 | 0.755 |
spark | 1M | 2000 | 130 | 0.714 (bug) |
xgboost | 1M | 170 | 2 | 0.753 |
GBM:
learn_rate = 0.1, max_depth = 6, n_trees = 300
tool | size | time [s] | RAM [GB] | AUC
---|---|---|---|---
h2o | 1M | 60 | 2 | 0.743 |
spark | 1M | 6000 | 30 | 0.738 |
xgboost | 1M | 45 | 1 | 0.745 |
various numbers of trees and RAM usage:
10M recs (2GB RDD)
1 tree (depth 10): res1: Double = 92.995553313 res2: Double = 0.7124803198872883, RAM 112GB
10 trees (depth 10): res1: Double = 828.706118911 res2: Double = 0.7308451726164384, RAM 125GB
100 trees (depth 10): res1: Double = 8068.992507948 res2: Double = 0.7552995954933053, RAM 235GB
1 tree (depth 1): res1: Double = 70.316245739 res2: Double = 0.6346946512846476, RAM 110GB
trees | depth | 100M time [s] | 100M AUC | 100M RAM [GB] | 10M time [s] | 10M AUC | 10M RAM [GB]
---|---|---|---|---|---|---|---
1 | 1 | 1150 | 0.634 | 620 | 70 | 0.635 | 110
1 | 10 | 1350 | 0.712 | 620 | 90 | 0.712 | 112
10 | 10 | 7850 | 0.731 | 780 | 830 | 0.731 | 125
100 | 10 | crash (OOM) | - | >960 (OOM) | 8070 | 0.755 | 230
100M ran on: x1e.8xlarge (32 cores, 1 NUMA, 960GB RAM)
10M ran on: r4.8xlarge (32 cores, 1 NUMA, 240GB RAM)
Spark multicore scaling (and NUMA):
server | cores | partitions | time [s] | speedup vs 1c |
---|---|---|---|---|
r4.8xlarge | 32 | 32 | 830 | 7.9 |
r4.8xlarge | 16 | 16 | 810 | 8.1 |
r4.8xlarge | 8 | 8 | 1040 | 6.3 |
r4.8xlarge | 2 | 2(m) | 3520 | 1.9 |
r4.8xlarge | 1 | 1(m) | 6560 | 1.0 |
r4.16xlarge | 64 | 63(a) | 680 | 9.6 |
r4.8xlarge | 32 | 64(m) | 830 | 7.9 |
(Spark does not slow down on multi-socket/NUMA servers; it's already so slow that NUMA is not a factor)
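(Sketch of how the cores/partitions in the scaling table can be pinned on a single node, assuming spark-shell runs; the (m) rows presumably come from an explicit repartition and the (a) row from Spark's automatic partitioning. This is an illustration under those assumptions, not the exact benchmark invocation; the actual scripts are in the repo linked below.)

```scala
// Launch assumption: spark-shell --master local[8] --driver-memory 200G
// pins Spark to 8 cores on the single node; partition counts marked (m)
// in the table would then be set explicitly on the training data.
val train8 = spark.read.parquet("train-10m.parquet")
  .repartition(8)    // one partition per core for the 8-core run
  .cache()
train8.count()       // materialize the cache so I/O is not included in the timed fit
```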
https://github.com/szilard/GBM-perf/tree/master/wip-testing/spark