szilard / GBM-perf

Performance of various open source GBM implementations
MIT License

CPU multicore and NUMA - updated results from old GBM-multicore repo #29

Open szilard opened 5 years ago

szilard commented 5 years ago

Redoing old stuff from: https://github.com/szilard/GBM-multicore

New code here: https://github.com/szilard/GBM-perf/tree/master/analysis/multicore

szilard commented 5 years ago

Results:

https://htmlpreview.github.io/?https://github.com/szilard/GBM-perf/blob/master/analysis/multicore/results/res.html

szilard commented 4 years ago

2020-09-09 UPDATE: xgboost and lightgbm have improved in multi-core scaling, and the NUMA slow-down has been mitigated:

https://htmlpreview.github.io/?https://github.com/szilard/GBM-perf/blob/master/analysis/multicore/results-update2020sept/res-new.html

compare vs:

https://htmlpreview.github.io/?https://github.com/szilard/GBM-perf/blob/master/analysis/multicore/results-update2020sept/res-old.html

e.g.:

NUMA issue:

old:

Screen Shot 2020-09-09 at 10 38 33 AM

new:

Screen Shot 2020-09-09 at 10 38 45 AM

multicore scaling:

old:

Screen Shot 2020-09-09 at 10 41 31 AM

new:

Screen Shot 2020-09-09 at 10 41 10 AM

szilard commented 4 years ago

Re-run with all tools (+h2o, ++catboost):

https://htmlpreview.github.io/?https://github.com/szilard/GBM-perf/blob/master/analysis/multicore/results-update2020sept-2/res.html

Screen Shot 2020-09-13 at 2 39 08 AM

Same plot with the results for 1 and 2 cores removed and then rescaled, to better see what's going on at higher core counts:

Screen Shot 2020-09-13 at 2 39 22 AM

Speedups for 2,4,8,16 physical cores (no HT and no NUMA):

Screen Shot 2020-09-13 at 2 40 22 AM

Speedup from 1 to 16 cores is:

size   h2o   xgboost   lightgbm   catboost
0.1M   3     6.5       1.5        3.5
1M     8     6.5       4          6
10M    24    5         7.5        8
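
To put the speedups in the table above in perspective, here is a minimal sketch that converts them to parallel efficiency (speedup divided by the number of cores, so 100% would be ideal linear scaling). The numbers are copied from the table; the names `speedup` and `parallel_efficiency` are made up for this illustration and are not part of the benchmark code.

```python
# Speedups from 1 to 16 physical cores, copied from the table above.
CORES = 16

speedup = {
    "0.1M": {"h2o": 3.0, "xgboost": 6.5, "lightgbm": 1.5, "catboost": 3.5},
    "1M": {"h2o": 8.0, "xgboost": 6.5, "lightgbm": 4.0, "catboost": 6.0},
    "10M": {"h2o": 24.0, "xgboost": 5.0, "lightgbm": 7.5, "catboost": 8.0},
}


def parallel_efficiency(s, cores=CORES):
    """Fraction of ideal linear scaling achieved: speedup / number of cores."""
    return s / cores


# Same layout as the table: one row per data size, one entry per tool.
efficiency = {
    size: {tool: parallel_efficiency(s) for tool, s in row.items()}
    for size, row in speedup.items()
}

for size, row in efficiency.items():
    print(size, "  ".join(f"{tool}={e:.0%}" for tool, e in row.items()))
```

Anything above 100% here just reflects a reported speedup larger than the core count (h2o at 10M); the sketch does not say why that happens.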
szilard commented 4 years ago

runtime/size:

Screen Shot 2020-09-13 at 4 11 46 AM
szilard commented 4 years ago

AUC:

Screen Shot 2020-09-13 at 4 12 51 AM
szilard commented 4 years ago

NUMA+HT effect (combined):

64 cores (2 sockets, each with 16 physical cores + 16 HT) vs 16 physical cores on 1 socket:

Screen Shot 2020-09-13 at 6 51 55 AM

Below the red line means that the run on 64 cores is slower than on 16 cores

size   h2o    xgboost   lightgbm   catboost
0.1M   -40%   -50%      -70%       15%
1M     -15%   -2%       -60%       -20%
10M    25%    35%       -20%       10%
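
For anyone wanting to reproduce a "16 physical cores on 1 socket" vs "all 64 cores" comparison like the one above, one common way on Linux is to restrict the process to a fixed set of CPU ids before training. This is only a sketch under assumptions: it is not necessarily how the runs in this thread were set up, the helper names `parse_cpu_list` and `pin_to_cpus` are made up here, and the mapping of CPU ids to physical cores and sockets is machine-specific.

```python
import os


def parse_cpu_list(spec):
    """Parse a Linux-style CPU list such as "0-15" or "0-3,8" into a set of ints."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_cpus(spec):
    """Restrict the current process to the given CPUs (Linux only).

    Threads started afterwards inherit the restriction.
    """
    cpus = parse_cpu_list(spec)
    os.sched_setaffinity(0, cpus)  # pid 0 means the current process
    return os.sched_getaffinity(0)


# Hypothetical usage: "0-15" is ASSUMED to be the 16 physical cores of socket 0.
# Check the real layout first (for example with `lscpu -e`), since HT siblings
# and the second socket may be numbered differently on your machine.
# pin_to_cpus("0-15")
```

Call `pin_to_cpus` before the GBM library starts its worker threads, and set the library's thread count to match the number of CPUs you pinned to.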