szilard / GBM-perf

Performance of various open source GBM implementations

CPU Single threaded performance #22

Open szilard opened 5 years ago

szilard commented 5 years ago

This might be relevant for training lots of models (100s, 1000s, ...) on smaller data: when running them in parallel, 1 model per CPU core would probably be the most efficient, provided the data is small and all the datasets (or, if training on the same data, multiple copies of it) fit in RAM.
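
A minimal sketch of that pattern in R with xgboost (illustrative only, not the repo's benchmark code; the dataset list, model parameters, and core count are placeholders):

```r
## Train many single-threaded xgboost models in parallel, 1 model per core.
library(parallel)
library(xgboost)

## hypothetical: 8 small datasets that together fit in RAM
dlist <- lapply(1:8, function(i) {
  x <- matrix(rnorm(1e5 * 10), ncol = 10)
  y <- as.numeric(rowSums(x[, 1:3]) + rnorm(1e5) > 0)
  xgb.DMatrix(x, label = y)
})

models <- mclapply(dlist, function(d) {
  xgb.train(params = list(objective = "binary:logistic",
                          max_depth = 10, eta = 0.1,
                          nthread = 1),            ## single-threaded training
            data = d, nrounds = 100)
}, mc.cores = 8)                                   ## 1 model per CPU core
```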

szilard commented 5 years ago

using c5.xlarge to get a higher-frequency CPU (4 vCPUs, i.e. 2 physical cores, leaving some resources to the EC2 hypervisor if needed; also 8GB RAM)

A c5.18xlarge (72 vCPUs, 144GB RAM) could run 36 such models in parallel on physical cores if data+train does not use more than 4GB/run; one could also test running 72 models in parallel (measuring the effect of hyperthreading on speed/throughput) if data+train can be confined to 2GB/run.
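
The sizing above is just cores vs RAM; a quick R restatement (the 4GB/run footprint is the assumption here):

```r
## How many models can run in parallel: bounded by physical cores and by RAM.
phys_cores <- parallel::detectCores(logical = FALSE)  ## 36 on c5.18xlarge
ram_gb     <- 144
per_run_gb <- 4                                       ## assumed data+train footprint
n_parallel <- min(phys_cores, ram_gb %/% per_run_gb)  ## -> 36
```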

szilard commented 5 years ago

https://github.com/szilard/GBM-perf/tree/master/wip-testing/single_thread

szilard commented 5 years ago

0.1m:

| tool     | time [s] | AUC       |
|----------|----------|-----------|
| h2o      | 24.46    | 0.702228  |
| xgboost  | 3.823    | 0.7324224 |
| lightgbm | 3.816    | 0.7298355 |
| catboost | 27.113   | 0.7225903 |
| Rgbm     | 13.661   | 0.7190915 |

1m:

| tool     | time [s] | AUC       |
|----------|----------|-----------|
| h2o      | 128.121  | 0.7623496 |
| xgboost  | 26.692   | 0.7494959 |
| lightgbm | 20.393   | 0.7636987 |
| catboost | 273.306  | 0.7402029 |
| Rgbm     | 233.34   | 0.7373496 |

RAM usage at 1m: h2o 0.6GB, xgb 1.0GB, lgbm 1.0GB, catboost 2.0GB, Rgbm 0.8GB

szilard commented 5 years ago
|          | time [s] (0.1m) | AUC (0.1m) | time [s] (1m) | AUC (1m) |
|----------|-----------------|------------|---------------|----------|
| h2o      | 24.5            | 0.702      | 128.1         | 0.762    |
| xgboost  | 3.8             | 0.732      | 26.7          | 0.749    |
| lightgbm | 3.8             | 0.730      | 20.4          | 0.764    |
| catboost | 27.1            | 0.723      | 273.3         | 0.740    |
| Rgbm     | 13.7            | 0.719      | 233.3         | 0.737    |

szilard commented 5 years ago

combined with previous results on c5.9xlarge (18 threads) https://github.com/szilard/GBM-perf/issues/13#issue-439317694 :

c5.xlarge, 1 thread (times in seconds):

| tool     | 0.1m | 1m    | 0.1m→1m |
|----------|------|-------|---------|
| h2o      | 24.5 | 128.1 | 5.2     |
| xgboost  | 3.8  | 26.7  | 7.0     |
| lightgbm | 3.8  | 20.4  | 5.3     |
| catboost | 27.1 | 273.3 | 10.1    |
| Rgbm     | 13.7 | 233.3 | 17.1    |

c5.9xlarge, 18 threads (times in seconds):

| tool     | 0.1m | 1m   | 0.1m→1m |
|----------|------|------|---------|
| h2o      | 8.7  | 14.1 | 1.6     |
| xgboost  | 3.2  | 10.8 | 3.3     |
| lightgbm | 2.0  | 4.3  | 2.2     |
| catboost | 4.6  | 33.9 | 7.4     |

Speedup, 1 → 18 threads:

| tool     | 0.1m | 1m  |
|----------|------|-----|
| h2o      | 2.8  | 9.1 |
| xgboost  | 1.2  | 2.5 |
| lightgbm | 1.9  | 4.7 |
| catboost | 5.9  | 8.1 |
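
The ratio columns above are elementwise quotients of the timing rows; e.g. the 1 → 18 thread speedups at 0.1m can be checked in R (numbers copied from the tables):

```r
t_1thr  <- c(h2o = 24.5, xgboost = 3.8, lightgbm = 3.8, catboost = 27.1)  ## c5.xlarge, 1 thread
t_18thr <- c(h2o = 8.7,  xgboost = 3.2, lightgbm = 2.0, catboost = 4.6)   ## c5.9xlarge, 18 threads
round(t_1thr / t_18thr, 1)
##      h2o  xgboost lightgbm catboost
##      2.8      1.2      1.9      5.9
```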
szilard commented 5 years ago

numbers inside the red area are time in seconds, outside are ratios:

[screenshot, 2019-05-11: combined timing table, times in seconds inside the red area, ratios outside]
Laurae2 commented 5 years ago

Hardware/Software: https://github.com/szilard/GBM-perf/issues/12

hist xgboost, 1 model (training time in seconds, NT = N threads):

| Size | 1T       | 9T      | 18T     | 36T     | 70T      |
|------|----------|---------|---------|---------|----------|
| 0.1M | 3.062    | 2.735   | 3.407   | 10.515  | 56.710   |
| 1M   | 23.293   | 12.524  | 12.929  | 25.980  | 96.465   |
| 10M  | 220.200  | 86.092  | 70.121  | 106.479 | 271.683  |
| 100M | 2373.128 | 858.772 | 675.223 | 756.661 | 1142.271 |

LightGBM, 1 model (training time in seconds, NT = N threads):

| Size | 1T       | 9T      | 18T     | 36T     | 70T     |
|------|----------|---------|---------|---------|---------|
| 0.1M | 2.983    | 1.801   | 2.389   | 3.266   | 5.943   |
| 1M   | 15.919   | 5.363   | 4.891   | 5.568   | 10.487  |
| 10M  | 180.748  | 53.689  | 48.260  | 47.033  | 53.234  |
| 100M | 1930.816 | 578.734 | 560.296 | 507.580 | 507.627 |
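
A sketch of how such a thread-scaling run can be scripted in R (not the actual benchmark code; data and model parameters are placeholders; nthread and num_threads are the real thread knobs in xgboost and LightGBM):

```r
## Time xgboost (hist) and LightGBM training at several thread counts.
library(xgboost)
library(lightgbm)

x <- matrix(rnorm(1e5 * 10), ncol = 10)
y <- as.numeric(rowSums(x[, 1:3]) + rnorm(1e5) > 0)
dxgb <- xgb.DMatrix(x, label = y)

for (nt in c(1, 9, 18, 36, 70)) {
  t_xgb <- system.time(
    xgb.train(params = list(objective = "binary:logistic",
                            tree_method = "hist", nthread = nt),
              data = dxgb, nrounds = 100)
  )[["elapsed"]]
  dlgb <- lgb.Dataset(x, label = y)   ## fresh handle per run
  t_lgb <- system.time(
    lgb.train(params = list(objective = "binary", num_threads = nt),
              data = dlgb, nrounds = 100)
  )[["elapsed"]]
  cat(sprintf("%2d threads: xgb %.1fs  lgbm %.1fs\n", nt, t_xgb, t_lgb))
}
```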
szilard commented 5 years ago

Concurrent usage (training many models on the same hardware at the same time, to measure e.g. throughput) will be studied by @Laurae2 (with some of my involvement) here: https://github.com/Laurae2/ml-perf/issues/3