szilard / GBM-perf

Performance of various open source GBM implementations

CPU Single threaded performance #22

Open szilard opened 5 years ago

szilard commented 5 years ago

This might be relevant for training lots of models (100s, 1000s, ...) on smaller data: when running them in parallel, 1 model per CPU core would probably be the most efficient, provided the data is small and all the datasets (or, if training on the same data, multiple copies of it) fit in RAM.
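
A minimal sketch of that pattern in R with xgboost (illustrative only, not the repo's benchmark code; the dataset list, model parameters, and core count are placeholders):

```r
## Train many single-threaded xgboost models in parallel, 1 model per core.
library(parallel)
library(xgboost)

## hypothetical: 8 small datasets that together fit in RAM
dlist <- lapply(1:8, function(i) {
  x <- matrix(rnorm(1e5 * 10), ncol = 10)
  y <- as.numeric(rowSums(x[, 1:3]) + rnorm(1e5) > 0)
  xgb.DMatrix(x, label = y)
})

models <- mclapply(dlist, function(d) {
  xgb.train(params = list(objective = "binary:logistic",
                          max_depth = 10, eta = 0.1,
                          nthread = 1),            ## single-threaded training
            data = d, nrounds = 100)
}, mc.cores = 8)                                   ## 1 model per CPU core
```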

szilard commented 5 years ago

using c5.xlarge to get a higher-frequency CPU (4 vCPUs, i.e. 2 physical cores, leaving some resources to the EC2 hypervisor if needed; also 8GB RAM)

A c5.18xlarge (72 vCPUs, 144GB RAM) could run 36 such models in parallel on physical cores if data+train does not use more than 4GB/run; one could also test running 72 models in parallel (measuring the effect of hyperthreading on speed/throughput) if data+train can be confined to 2GB/run.
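
The sizing above is just cores vs RAM; a quick R restatement (the 4GB/run footprint is the assumption here):

```r
## How many models can run in parallel: bounded by physical cores and by RAM.
phys_cores <- parallel::detectCores(logical = FALSE)  ## 36 on c5.18xlarge
ram_gb     <- 144
per_run_gb <- 4                                       ## assumed data+train footprint
n_parallel <- min(phys_cores, ram_gb %/% per_run_gb)  ## -> 36
```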

szilard commented 5 years ago

https://github.com/szilard/GBM-perf/tree/master/wip-testing/single_thread

szilard commented 5 years ago

0.1m:

| tool     | time [s] | AUC       |
|----------|----------|-----------|
| h2o      | 24.46    | 0.702228  |
| xgboost  | 3.823    | 0.7324224 |
| lightgbm | 3.816    | 0.7298355 |
| catboost | 27.113   | 0.7225903 |
| Rgbm     | 13.661   | 0.7190915 |

1m:

| tool     | time [s] | AUC       |
|----------|----------|-----------|
| h2o      | 128.121  | 0.7623496 |
| xgboost  | 26.692   | 0.7494959 |
| lightgbm | 20.393   | 0.7636987 |
| catboost | 273.306  | 0.7402029 |
| Rgbm     | 233.34   | 0.7373496 |

RAM usage at 1m: h2o 0.6GB, xgb 1.0GB, lgbm 1.0GB, catboost 2.0GB, Rgbm 0.8GB

szilard commented 5 years ago
|          | time [s] (0.1m) | AUC (0.1m) | time [s] (1m) | AUC (1m) |
|----------|-----------------|------------|---------------|----------|
| h2o      | 24.5            | 0.702      | 128.1         | 0.762    |
| xgboost  | 3.8             | 0.732      | 26.7          | 0.749    |
| lightgbm | 3.8             | 0.730      | 20.4          | 0.764    |
| catboost | 27.1            | 0.723      | 273.3         | 0.740    |
| Rgbm     | 13.7            | 0.719      | 233.3         | 0.737    |

szilard commented 5 years ago

combined with previous results on c5.9xlarge (18 threads) https://github.com/szilard/GBM-perf/issues/13#issue-439317694 :

c5.xlarge, 1 thread (times in seconds):

| tool     | 0.1m | 1m    | 0.1m→1m |
|----------|------|-------|---------|
| h2o      | 24.5 | 128.1 | 5.2     |
| xgboost  | 3.8  | 26.7  | 7.0     |
| lightgbm | 3.8  | 20.4  | 5.3     |
| catboost | 27.1 | 273.3 | 10.1    |
| Rgbm     | 13.7 | 233.3 | 17.1    |

c5.9xlarge, 18 threads (times in seconds):

| tool     | 0.1m | 1m   | 0.1m→1m |
|----------|------|------|---------|
| h2o      | 8.7  | 14.1 | 1.6     |
| xgboost  | 3.2  | 10.8 | 3.3     |
| lightgbm | 2.0  | 4.3  | 2.2     |
| catboost | 4.6  | 33.9 | 7.4     |

Speedup, 1 → 18 threads:

| tool     | 0.1m | 1m  |
|----------|------|-----|
| h2o      | 2.8  | 9.1 |
| xgboost  | 1.2  | 2.5 |
| lightgbm | 1.9  | 4.7 |
| catboost | 5.9  | 8.1 |
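
The ratio columns above are elementwise quotients of the timing rows; e.g. the 1 → 18 thread speedups at 0.1m can be checked in R (numbers copied from the tables):

```r
t_1thr  <- c(h2o = 24.5, xgboost = 3.8, lightgbm = 3.8, catboost = 27.1)  ## c5.xlarge, 1 thread
t_18thr <- c(h2o = 8.7,  xgboost = 3.2, lightgbm = 2.0, catboost = 4.6)   ## c5.9xlarge, 18 threads
round(t_1thr / t_18thr, 1)
##      h2o  xgboost lightgbm catboost
##      2.8      1.2      1.9      5.9
```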
szilard commented 5 years ago

numbers inside the red area are time in seconds, outside are ratios:

[screenshot, 2019-05-11: combined timing table, times in seconds inside the red area, ratios outside]
Laurae2 commented 5 years ago

Hardware/Software: https://github.com/szilard/GBM-perf/issues/12

hist xgboost, 1 model (training time in seconds, NT = N threads):

| Size | 1T       | 9T      | 18T     | 36T     | 70T      |
|------|----------|---------|---------|---------|----------|
| 0.1M | 3.062    | 2.735   | 3.407   | 10.515  | 56.710   |
| 1M   | 23.293   | 12.524  | 12.929  | 25.980  | 96.465   |
| 10M  | 220.200  | 86.092  | 70.121  | 106.479 | 271.683  |
| 100M | 2373.128 | 858.772 | 675.223 | 756.661 | 1142.271 |

LightGBM, 1 model (training time in seconds, NT = N threads):

| Size | 1T       | 9T      | 18T     | 36T     | 70T     |
|------|----------|---------|---------|---------|---------|
| 0.1M | 2.983    | 1.801   | 2.389   | 3.266   | 5.943   |
| 1M   | 15.919   | 5.363   | 4.891   | 5.568   | 10.487  |
| 10M  | 180.748  | 53.689  | 48.260  | 47.033  | 53.234  |
| 100M | 1930.816 | 578.734 | 560.296 | 507.580 | 507.627 |
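
A sketch of how such a thread-scaling run can be scripted in R (not the actual benchmark code; data and model parameters are placeholders; nthread and num_threads are the real thread knobs in xgboost and LightGBM):

```r
## Time xgboost (hist) and LightGBM training at several thread counts.
library(xgboost)
library(lightgbm)

x <- matrix(rnorm(1e5 * 10), ncol = 10)
y <- as.numeric(rowSums(x[, 1:3]) + rnorm(1e5) > 0)
dxgb <- xgb.DMatrix(x, label = y)

for (nt in c(1, 9, 18, 36, 70)) {
  t_xgb <- system.time(
    xgb.train(params = list(objective = "binary:logistic",
                            tree_method = "hist", nthread = nt),
              data = dxgb, nrounds = 100)
  )[["elapsed"]]
  dlgb <- lgb.Dataset(x, label = y)   ## fresh handle per run
  t_lgb <- system.time(
    lgb.train(params = list(objective = "binary", num_threads = nt),
              data = dlgb, nrounds = 100)
  )[["elapsed"]]
  cat(sprintf("%2d threads: xgb %.1fs  lgbm %.1fs\n", nt, t_xgb, t_lgb))
}
```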
szilard commented 5 years ago

Concurrent usage (training many models on the same hardware at the same time, to measure e.g. throughput) will be studied by @Laurae2 (with some of my involvement) here: https://github.com/Laurae2/ml-perf/issues/3