szilard opened 5 years ago
2020-09-09 UPDATE: xgboost and lightgbm have improved their multi-core scaling, and the NUMA slow-down has been mitigated:
compare vs:
e.g.:
NUMA issue:
old:
new:
multicore scaling:
old:
new:
Re-run with all tools (+h2o, ++catboost):
Same results with the 1- and 2-core runs removed and then rescaled, to better see what's going on for many cores:
Speedups for 2,4,8,16 physical cores (no HT and no NUMA):
Speedup from 1 to 16 cores is:
size | h2o | xgboost | lightgbm | catboost |
---|---|---|---|---|
0.1M | 3 | 6.5 | 1.5 | 3.5 |
1M | 8 | 6.5 | 4 | 6 |
10M | 24 | 5 | 7.5 | 8 |
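For reading the table above, here is a minimal sketch of how a speedup factor relates to parallel efficiency. It assumes speedup is defined as the 1-core runtime divided by the 16-core runtime, which the table heading suggests but does not spell out. The function names and the dictionary layout are made up for illustration; the numbers are copied from the table.

```python
# Speedup factors from 1 to 16 cores, copied from the table above.
SPEEDUP_1_TO_16 = {
    "0.1M": {"h2o": 3, "xgboost": 6.5, "lightgbm": 1.5, "catboost": 3.5},
    "1M": {"h2o": 8, "xgboost": 6.5, "lightgbm": 4, "catboost": 6},
    "10M": {"h2o": 24, "xgboost": 5, "lightgbm": 7.5, "catboost": 8},
}


def speedup(runtime_base, runtime_parallel):
    """Assumed definition: runtime on 1 core / runtime on n cores."""
    return runtime_base / runtime_parallel


def parallel_efficiency(speedup_factor, n_cores):
    """Fraction of ideal linear scaling (1.0 means perfect scaling)."""
    return speedup_factor / n_cores


def efficiency_table(speedups, n_cores=16):
    """Convert a {size: {tool: speedup}} table into parallel efficiencies."""
    return {
        size: {tool: parallel_efficiency(s, n_cores) for tool, s in row.items()}
        for size, row in speedups.items()
    }


if __name__ == "__main__":
    for size, row in efficiency_table(SPEEDUP_1_TO_16).items():
        cells = ", ".join(f"{tool}: {eff:.2f}" for tool, eff in row.items())
        print(f"{size}: {cells}")
```

An efficiency of 0.5 means half of ideal linear scaling on 16 cores; values above 1.0 (h2o at 10M) mean the measured speedup exceeds the core count.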
runtime/size:
AUC:
NUMA+HT effect (combined):
64 cores (2 sockets, each with 16 physical cores + 16 HT) vs 16 physical cores on 1 socket:
Points below the red line mean that the run on 64 cores is slower than on 16 cores.
size | h2o | xgboost | lightgbm | catboost |
---|---|---|---|---|
0.1M | -40% | -50% | -70% | 15% |
1M | -15% | -2% | -60% | -20% |
10M | 25% | 35% | -20% | 10% |
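The exact formula behind these percentages is not given here. One plausible reading, shown as a sketch only, is the relative change in training speed when going from 16 to 64 cores, so that negative values mean 64 cores is slower. The function name and the runtimes in the example are hypothetical, not measurements from this benchmark.

```python
def relative_speed_change_pct(runtime_16, runtime_64):
    """Assumed definition: change in speed (1/runtime) from 16 to 64 cores, in percent.

    Negative result: the 64-core run is slower than the 16-core run.
    """
    return (runtime_16 / runtime_64 - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical runtimes in seconds, for illustration only.
    print(relative_speed_change_pct(10.0, 20.0))  # 64-core run takes twice as long
    print(relative_speed_change_pct(10.0, 8.0))   # 64-core run is faster
```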
This redoes the older analysis from: https://github.com/szilard/GBM-multicore
New code here: https://github.com/szilard/GBM-perf/tree/master/analysis/multicore