szilard / GBM-perf

Performance of various open source GBM implementations
MIT License

xgboost on old / low-end laptop vs Spark cluster #24

Open szilard opened 5 years ago

szilard commented 5 years ago

laptop:

CPU i5-5200U (2 physical cores, 4 threads with HT), 8GB RAM, Windows

R CRAN xgboost nthread=1 (to use 1 CPU core)

10M records, 10 trees

runs in 71 sec, AUC 0.726; 2nd run 73 sec

szilard commented 5 years ago

for comparison, c5.2xlarge with 8 cores, 16GB RAM (c5.xlarge with 8GB RAM is not enough):

10M records, `taskset -c 0` (to use 1 CPU core)
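
For anyone driving the benchmark from Python rather than the shell, the single-core restriction can be approximated from inside the process. This is a minimal sketch, not what was used in this thread: `pin_to_one_core` is a hypothetical helper name, it relies on `os.sched_setaffinity`, which Python only exposes on some platforms (Linux among them), and it should be called before the training library starts its worker threads.

```python
import os

def pin_to_one_core():
    """Restrict the current process to one CPU, similar to `taskset -c <cpu>`.

    Returns the resulting affinity set, or None where the OS does not
    expose sched_setaffinity (e.g. macOS, Windows).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)   # CPUs this process may currently use
    cpu = min(allowed)                  # usually 0, matching `taskset -c 0`
    os.sched_setaffinity(0, {cpu})      # pid 0 means "the calling process"
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    print(pin_to_one_core())
```

Picking the lowest allowed CPU instead of hard-coding 0 avoids an error in containers whose CPU set does not include core 0.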

100 trees: xgboost runs 237 sec, AUC 0.755; lightgbm 203 sec, AUC 0.774

10 trees: xgboost 38 sec, AUC 0.726; lightgbm 29 sec, AUC 0.743

Note: xgboost on c5.* (1 core) takes 38 sec vs 72 sec on the old laptop, so the laptop is 1.9x slower.
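
The 1.9x figure can be re-derived from the reported times. A small sketch, using only numbers from the comments above (72 sec is taken as the average of the 71 and 73 sec laptop runs; the variable names are just for illustration):

```python
# Wall-clock times in seconds, copied from the runs reported above
# (10M records, 10 trees, 1 core).
laptop_xgboost = 72    # i5-5200U, average of the 71 s and 73 s runs
c5_xgboost = 38        # c5.2xlarge
c5_lightgbm = 29       # c5.2xlarge

def ratio(slower, faster):
    """How many times longer `slower` takes than `faster`."""
    return slower / faster

laptop_vs_c5 = ratio(laptop_xgboost, c5_xgboost)
xgb_vs_lgbm = ratio(c5_xgboost, c5_lightgbm)

print(f"laptop vs c5 (xgboost): {laptop_vs_c5:.1f}x")      # 1.9x
print(f"xgboost vs lightgbm (c5): {xgb_vs_lgbm:.1f}x")     # 1.3x
```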

szilard commented 5 years ago

Spark 10 trees, depth 10:

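| size | system | nodes | cores | partitions | time [s] | AUC | RAM [GB] | total RAM [GB] |
|------|--------|-------|-------|------------|----------|-----|----------|----------------|
| 10M | local | r4.8xl | 32 | 32 | 830 | 0.731 | 125 | 240 |
| 10M | Cluster_1 | r4.8xl | 32 | 64 | 1180 | 0.731 | 73 | 240 |
| 10M | Cluster_10 | r4.8xl | 320 | 320 (m) | 330 | 0.73 | | 2400 |
| 100M | local | x1e.8xl | 32 | | 7850 | 0.731 | 780 | 960 |
| 100M | Cluster_10 | r4.8xl | 320 | 585 | 1825 | 0.731 | 10*72 | 2400 |

szilard commented 5 years ago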

10 trees, depth 10:

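| lib | size | hw | cores | time [s] | AUC | RAM [GB] | total RAM [GB] |
|-----|------|----|-------|----------|-----|----------|----------------|
| lightgbm | 10M | c5.2xl | 1 (m) | 29 | 0.743 | | 16 |
| xgboost | 10M | c5.2xl | 1 (m) | 38 | 0.726 | | 16 |
| xgboost | 10M | i5-5200U | 1 (m) | 71 | 0.726 | 6 | 8 |
| lightgbm | 10M | r4.8xl | 16 (m) | 7 | 0.743 | 4 | 240 |
| lightgbm | 100M | r4.8xl | 16 (m) | 60 | 0.743 | 13(d)+5 | 240 |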
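
To put the Spark and single-machine numbers side by side, the sketch below divides the fastest Spark time for each data size by each single-machine time. All times are copied from the two tables in this thread (10 trees, depth 10); the dictionary layout and the `spark_slowdowns` helper are only illustrative. The ratios ignore the small AUC differences between the systems.

```python
# Times in seconds for 10 trees, depth 10, copied from the two tables above.
spark = {
    "10M": {"local": 830, "Cluster_1": 1180, "Cluster_10": 330},
    "100M": {"local": 7850, "Cluster_10": 1825},
}
single_machine = {
    "10M": {
        "lightgbm c5.2xl 1 core": 29,
        "xgboost c5.2xl 1 core": 38,
        "xgboost i5-5200U 1 core": 71,
        "lightgbm r4.8xl 16 cores": 7,
    },
    "100M": {"lightgbm r4.8xl 16 cores": 60},
}

def spark_slowdowns(size):
    """Ratio of the fastest Spark time to each single-machine time for one data size."""
    best_spark = min(spark[size].values())
    return {name: best_spark / t for name, t in single_machine[size].items()}

for size in spark:
    for name, factor in spark_slowdowns(size).items():
        print(f"{size}: fastest Spark run takes {factor:.1f}x as long as {name}")
```

On these numbers, even the 320-core Spark cluster (330 sec) takes about 4.6x as long as xgboost on one core of the old laptop (71 sec) for 10M records, and about 30x as long as lightgbm on a single r4.8xl for 100M records.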

xgboost/lightgbm: the 100-tree runtime is <10x the 10-tree runtime, and RAM usage is the same as for 10 trees.

Spark: the 100-tree runtime is 10x the 10-tree runtime, and RAM usage is higher than for 10 trees.
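
The "<10x" claim for xgboost/lightgbm can be checked against the c5.2xlarge single-core times reported earlier in the thread. A minimal sketch using only those four numbers (the `scaling_factor` helper is illustrative):

```python
# Times in seconds on c5.2xlarge, 1 core, 10M records, from the earlier comment.
times = {
    "xgboost": {10: 38, 100: 237},
    "lightgbm": {10: 29, 100: 203},
}

def scaling_factor(lib):
    """Runtime of 100 trees divided by runtime of 10 trees."""
    return times[lib][100] / times[lib][10]

for lib in times:
    factor = scaling_factor(lib)
    print(f"{lib}: 100 trees take {factor:.1f}x the 10-tree time (linear would be 10x)")
```

This gives about 6.2x for xgboost and 7.0x for lightgbm. The thread does not show the per-tree Spark timings behind the 10x figure, so no equivalent check is included for Spark.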