Using c5.xlarge to get a higher-frequency CPU (4 vCPUs, i.e. 2 physical cores, leaving some resources for the EC2 hypervisor if needed; also 8GB RAM).

A c5.18xlarge (72 vCPUs, 36 physical cores, 144GB RAM) could run 36 such models in parallel on physical cores if data+training does not use more than 4GB/run; one could also test running 72 models in parallel (and measure the effect of hyperthreading on speed/throughput) if data+training can be confined to 2GB/run.
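A minimal sketch of a single-threaded run (not the exact benchmark script: the synthetic data and the 100 trees / depth 10 / eta 0.1 settings here are just placeholders for the airline data and parameters used in this repo):

```r
suppressMessages(library(xgboost))

# synthetic stand-in for the benchmark data
set.seed(123)
n <- 1e5; p <- 10
X <- matrix(rnorm(n * p), ncol = p)
y <- as.numeric(X[, 1] + rnorm(n) > 0)
dtrain <- xgb.DMatrix(X, label = y)

# restrict training to 1 thread, as in the c5.xlarge runs below
system.time(
  md <- xgb.train(
    params = list(objective = "binary:logistic", tree_method = "hist",
                  max_depth = 10, eta = 0.1, nthread = 1),
    data = dtrain, nrounds = 100
  )
)
```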
Raw results (time [s], AUC):

0.1m | time [s] | AUC |
---|---|---|
h2o | 24.46 | 0.702228 |
xgboost | 3.823 | 0.7324224 |
lightgbm | 3.816 | 0.7298355 |
catboost | 27.113 | 0.7225903 |
Rgbm | 13.661 | 0.7190915 |

1m | time [s] | AUC |
---|---|---|
h2o | 128.121 | 0.7623496 |
xgboost | 26.692 | 0.7494959 |
lightgbm | 20.393 | 0.7636987 |
catboost | 273.306 | 0.7402029 |
Rgbm | 233.34 | 0.7373496 |

RAM usage at 1m: h2o 0.6GB, xgboost 1.0GB, lightgbm 1.0GB, catboost 2.0GB, Rgbm 0.8GB
Summary (rounded):

c5.xlarge 1 thread | 0.1m time [s] | 0.1m AUC | 1m time [s] | 1m AUC |
---|---|---|---|---|
h2o | 24.5 | 0.702 | 128.1 | 0.762 |
xgboost | 3.8 | 0.732 | 26.7 | 0.749 |
lightgbm | 3.8 | 0.730 | 20.4 | 0.764 |
catboost | 27.1 | 0.723 | 273.3 | 0.740 |
Rgbm | 13.7 | 0.719 | 233.3 | 0.737 |
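The AUC columns are the usual area under the ROC curve on a held-out set; a minimal sketch of how one can compute it in R with the ROCR package (`phat` and `y_test` are hypothetical predicted probabilities and 0/1 labels, not the benchmark's data):

```r
suppressMessages(library(ROCR))

auc_score <- function(y_test, phat) {
  # area under the ROC curve for predicted probabilities phat vs 0/1 labels
  performance(prediction(phat, y_test), "auc")@y.values[[1]]
}

# toy example with made-up predictions
set.seed(1)
y_test <- rbinom(1000, 1, 0.2)
phat   <- 0.5 * runif(1000) + 0.5 * y_test * runif(1000)
auc_score(y_test, phat)
```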
Combined with the previous results on c5.9xlarge (18 threads) from https://github.com/szilard/GBM-perf/issues/13#issue-439317694 :
c5.xlarge 1 thread | 0.1m [s] | 1m [s] | 0.1m->1m ratio |
---|---|---|---|
h2o | 24.5 | 128.1 | 5.2 |
xgboost | 3.8 | 26.7 | 7.0 |
lightgbm | 3.8 | 20.4 | 5.3 |
catboost | 27.1 | 273.3 | 10.1 |
Rgbm | 13.7 | 233.3 | 17.1 |
c5.9xlarge 18 threads | 0.1m [s] | 1m [s] | 0.1m->1m ratio |
---|---|---|---|
h2o | 8.7 | 14.1 | 1.6 |
xgboost | 3.2 | 10.8 | 3.3 |
lightgbm | 2.0 | 4.3 | 2.2 |
catboost | 4.6 | 33.9 | 7.4 |
1->18 threads speedup | 0.1m | 1m | |
---|---|---|---|
h2o | 2.8 | 9.1 | |
xgboost | 1.2 | 2.5 | |
lightgbm | 1.9 | 4.7 | |
catboost | 5.9 | 8.1 |
In the tables above, the 0.1m and 1m columns of the first two tables are times in seconds; the 0.1m->1m columns and the 1->18 threads table are speedup ratios.
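For example, the 1->18 threads speedup is simply the single-threaded time divided by the 18-thread time (1m numbers taken from the tables above):

```r
# 1m training times [s] from the tables above
t_1thread  <- c(h2o = 128.1, xgboost = 26.7, lightgbm = 20.4, catboost = 273.3)
t_18thread <- c(h2o = 14.1,  xgboost = 10.8, lightgbm = 4.3,  catboost = 33.9)

round(t_1thread / t_18thread, 1)
#>      h2o  xgboost lightgbm catboost
#>      9.1      2.5      4.7      8.1
```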
Hardware/Software: https://github.com/szilard/GBM-perf/issues/12
xgboost (hist), 1 model, times in seconds:
Size | Time (1T) | Time (9T) | Time (18T) | Time (36T) | Time (70T) |
---|---|---|---|---|---|
0.1M | 3.062 | 2.735 | 3.407 | 10.515 | 56.710 |
1M | 23.293 | 12.524 | 12.929 | 25.980 | 96.465 |
10M | 220.200 | 86.092 | 70.121 | 106.479 | 271.683 |
100M | 2373.128 | 858.772 | 675.223 | 756.661 | 1142.271 |
LightGBM, 1 model, times in seconds:
Size | Time (1T) | Time (9T) | Time (18T) | Time (36T) | Time (70T) |
---|---|---|---|---|---|
0.1M | 2.983 | 1.801 | 2.389 | 3.266 | 5.943 |
1M | 15.919 | 5.363 | 4.891 | 5.568 | 10.487 |
10M | 180.748 | 53.689 | 48.260 | 47.033 | 53.234 |
100M | 1930.816 | 578.734 | 560.296 | 507.580 | 507.627 |
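The thread counts above are controlled via each library's threading parameter; a minimal sketch for LightGBM in R (synthetic data; `num_leaves`/`learning_rate` values are illustrative, not necessarily the benchmark's exact settings):

```r
suppressMessages(library(lightgbm))

set.seed(123)
n <- 1e5; p <- 10
X <- matrix(rnorm(n * p), ncol = p)
y <- as.numeric(X[, 1] + rnorm(n) > 0)

# time training with different numbers of threads
for (nt in c(1, 9, 18, 36)) {
  dtrain <- lgb.Dataset(X, label = y)
  t <- system.time(
    md <- lgb.train(
      params = list(objective = "binary", num_leaves = 512,
                    learning_rate = 0.1, num_threads = nt),
      data = dtrain, nrounds = 100, verbose = -1
    )
  )["elapsed"]
  cat(sprintf("num_threads = %2d: %.1f s\n", nt, t))
}
```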
Concurrent usage (training many models on the same hardware at the same time, e.g. to measure throughput) will be studied by @Laurae2 (with some of my involvement) here: https://github.com/Laurae2/ml-perf/issues/3
This is relevant when training lots of models (100s, 1000s...) on smaller data: running them in parallel with 1 model per CPU core is probably the most efficient approach if the data is small and all the datasets (or, if training on the same data, multiple copies of it) fit in RAM.
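A minimal sketch of that concurrent setup (one single-threaded xgboost model per core via `parallel::mclapply`; synthetic data, and 36 is just the physical core count of c5.18xlarge):

```r
suppressMessages({ library(parallel); library(xgboost) })

set.seed(42)
n <- 1e5; p <- 10
X <- matrix(rnorm(n * p), ncol = p)
y <- as.numeric(X[, 1] + rnorm(n) > 0)

train_one <- function(i) {
  dtrain <- xgb.DMatrix(X, label = y)   # each forked worker builds its own DMatrix
  system.time(
    xgb.train(
      params = list(objective = "binary:logistic", tree_method = "hist",
                    max_depth = 10, eta = 0.1, nthread = 1),   # 1 thread per model
      data = dtrain, nrounds = 100
    )
  )["elapsed"]
}

n_cores <- 36   # physical cores on c5.18xlarge
times <- unlist(mclapply(seq_len(n_cores), train_one, mc.cores = n_cores))
summary(times)  # per-model wall time when all cores train concurrently
```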