CPU multicore and NUMA - updated results from old GBM-multicore repo

szilard / GBM-perf

Performance of various open source GBM implementations

MIT License


szilard opened this issue 5 years ago

szilard commented 5 years ago

szilard commented 5 years ago

Results:

szilard commented 4 years ago

2020-09-09 UPDATE: xgboost and lightgbm have improved their multi-core scaling, and the NUMA slowdown has been mitigated:

Compare with:

e.g.:

NUMA issue:

old:

[Screenshot (2020-09-09): NUMA issue, old results]

new:

[Screenshot (2020-09-09): NUMA issue, new results]

multicore scaling:

old:

[Screenshot (2020-09-09): multi-core scaling, old results]

new:

[Screenshot (2020-09-09): multi-core scaling, new results]

szilard commented 4 years ago

Re-run with all tools (now including h2o and catboost):

Same, with the results for 1 and 2 cores removed and the plot rescaled, to better show what happens at higher core counts:

Speedups for 2, 4, 8 and 16 physical cores (no hyperthreading, no NUMA):
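Restricting a run to N physical cores on a single socket (no HT, no NUMA), as in these measurements, can be sketched in Python as below. This is a hypothetical helper, not the script used to produce these results: it assumes Linux, reads the CPU topology from `lscpu --parse=CPU,CORE,SOCKET`, and keeps one logical CPU per physical core so that HT siblings and the second socket are excluded.

```python
import subprocess


def read_topology():
    """Run lscpu (Linux, util-linux) and return its parsable CPU,CORE,SOCKET output."""
    return subprocess.run(
        ["lscpu", "--parse=CPU,CORE,SOCKET"],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_lscpu(text):
    """Parse lscpu parsable output into (cpu, core, socket) integer tuples."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # lscpu prefixes its header/comment lines with '#'
        cpu, core, socket = (int(x) for x in line.split(",")[:3])
        rows.append((cpu, core, socket))
    return rows


def physical_cpus_on_socket(rows, socket=0, n=None):
    """Return one logical CPU id per physical core on `socket` (HT siblings dropped)."""
    seen_cores = set()
    chosen = []
    for cpu, core, sock in sorted(rows):  # ascending logical CPU id
        if sock != socket or core in seen_cores:
            continue  # other socket, or a hyperthread sibling of a core already chosen
        seen_cores.add(core)
        chosen.append(cpu)
    return chosen if n is None else chosen[:n]
```

On Linux, the resulting list could then be passed to `os.sched_setaffinity(0, cpus)` before the training threads are started, with the library's own thread count (e.g. `nthread` in xgboost, `num_threads` in lightgbm) set to the same number of cores.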

Speedup from 1 to 16 cores is:

size (rows)	h2o	xgboost	lightgbm	catboost
0.1M	3	6.5	1.5	3.5
1M	8	6.5	4	6
10M	24	5	7.5	8
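To read these speedups against ideal linear scaling, they can be converted to parallel efficiency (speedup divided by the number of cores). The short Python sketch below does this for the 16-core column; the numbers are copied from the table above, and the `speedup`/`efficiency` helpers are illustrative, not part of the benchmark code.

```python
# Speedup from 1 to 16 physical cores, copied from the table above.
SPEEDUP_16 = {
    "0.1M": {"h2o": 3, "xgboost": 6.5, "lightgbm": 1.5, "catboost": 3.5},
    "1M": {"h2o": 8, "xgboost": 6.5, "lightgbm": 4, "catboost": 6},
    "10M": {"h2o": 24, "xgboost": 5, "lightgbm": 7.5, "catboost": 8},
}


def speedup(t1, tn):
    """Speedup of an n-core run: runtime on 1 core divided by runtime on n cores."""
    return t1 / tn


def efficiency(s, n):
    """Parallel efficiency: 1.0 means perfectly linear scaling on n cores."""
    return s / n


for size, row in SPEEDUP_16.items():
    # Efficiency of each tool at this data size, rounded for display.
    print(size, {tool: round(efficiency(s, 16), 2) for tool, s in row.items()})
```

For example, xgboost at 0.1M gives 6.5 / 16 ≈ 0.41, while h2o at 10M gives 24 / 16 = 1.5, i.e. above 1.0 (superlinear relative to 1 core).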

szilard commented 4 years ago

runtime/size:

szilard commented 4 years ago

AUC:

szilard commented 4 years ago

NUMA+HT effect (combined):

64 logical cores (2 sockets, each with 16 physical cores plus 16 hyperthreads) vs. 16 physical cores on 1 socket:

Below the red line means it is slower on 64 cores than on 16 cores (negative values in the table below):

size (rows)	h2o	xgboost	lightgbm	catboost
0.1M	-40%	-50%	-70%	15%
1M	-15%	-2%	-60%	-20%
10M	25%	35%	-20%	10%
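The table can be scanned for the configurations where going from 16 physical cores to 64 logical cores hurt. The sketch below assumes, as stated above, that a negative value means slower on 64 cores; since the exact formula behind the percentages is not given, it relies only on the sign. The values are copied from the table, and `slower_on_64` is a hypothetical helper name.

```python
# Change from 16 physical cores (1 socket) to 64 logical cores (2 sockets + HT),
# in percent, copied from the table above. Assumption: negative = slower on 64 cores.
NUMA_HT_CHANGE = {
    "0.1M": {"h2o": -40, "xgboost": -50, "lightgbm": -70, "catboost": 15},
    "1M": {"h2o": -15, "xgboost": -2, "lightgbm": -60, "catboost": -20},
    "10M": {"h2o": 25, "xgboost": 35, "lightgbm": -20, "catboost": 10},
}


def slower_on_64(table):
    """Return (size, tool) pairs whose change is negative, i.e. slower on 64 cores."""
    return [
        (size, tool)
        for size, row in table.items()
        for tool, pct in row.items()
        if pct < 0
    ]


for size, tool in slower_on_64(NUMA_HT_CHANGE):
    print(f"{tool} at {size}: {NUMA_HT_CHANGE[size][tool]}%")
```

Under this reading, 8 of the 12 cells are slower on 64 cores, and lightgbm is slower at every data size.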