hcho3 / xgboost-fast-hist-perf-lab

Deeper look into performance of tree_method='hist' for multi-core CPUs

build_hist deals with too few data #11

Closed Laurae2 closed 5 years ago

Laurae2 commented 5 years ago

build_hist (https://github.com/hcho3/xgboost-fast-hist-perf-lab/blob/master/src/build_hist.cc) deals with far too little data, so it is very difficult to parallelize unless we get rid of the inner loop. That loop takes most of the work and should be limited to a maximum of 2 threads: https://github.com/hcho3/xgboost-fast-hist-perf-lab/blob/master/src/build_hist.cc#L37-L42

A simple modification in main (https://github.com/hcho3/xgboost-fast-hist-perf-lab/blob/master/src/main.cc#L68-L70) gives 10x the performance on 36 threads (this does not solve the problem: it is still SLOWER than 2 threads). However, synchronization between build_hist and main then becomes necessary...

@hcho3 Is a modification of main.cc a valid operation?

Laurae2 commented 5 years ago

After simply parallelizing the outer loop in main.cc instead of the inner loop in build_hist.cc:

[4 images]
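For readers without the repository at hand, the idea of moving the parallel region from the inner loop to the outer loop can be sketched roughly as follows. This is only an illustration, not the actual change made to main.cc: `GradPair`, `Hist`, `BuildHistSerial`, and `BuildAllHists` are hypothetical names, and the sketch assumes each outer iteration writes to its own histogram, which sidesteps the synchronization mentioned above rather than solving it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; names and layout are assumptions, not the repo's types.
struct GradPair {
  double grad;
  double hess;
};
using Hist = std::vector<GradPair>;

// One small unit of work: accumulate the gradient pairs of the given rows
// into a single histogram. Kept serial on purpose, on the assumption that
// each call sees too few rows to amortize the cost of a parallel region.
void BuildHistSerial(const std::vector<int>& rows,
                     const std::vector<int>& bin_of_row,
                     const std::vector<GradPair>& gpair,
                     Hist* hist) {
  for (int r : rows) {
    GradPair& cell = (*hist)[bin_of_row[r]];
    cell.grad += gpair[r].grad;
    cell.hess += gpair[r].hess;
  }
}

// Outer-loop parallelism: one iteration per histogram to build.
// Each iteration writes only to hists[i], so no locking is needed here.
// Without -fopenmp the pragma is ignored and the loop runs serially.
void BuildAllHists(const std::vector<std::vector<int>>& rows_per_hist,
                   const std::vector<int>& bin_of_row,
                   const std::vector<GradPair>& gpair,
                   std::vector<Hist>* hists) {
  const int num_hists = static_cast<int>(rows_per_hist.size());
  #pragma omp parallel for schedule(dynamic)
  for (int i = 0; i < num_hists; ++i) {
    BuildHistSerial(rows_per_hist[i], bin_of_row, gpair, &(*hists)[i]);
  }
}
```

If several outer iterations had to update the same histogram, this sketch would race; per-thread buffers merged afterwards, or atomic updates, would be needed in that case.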

hcho3 commented 5 years ago

@Laurae2 Yes, you may change main.cc.

Laurae2 commented 5 years ago

Recently optimized to 0.8 seconds on 36 threads; a new issue describing it is incoming.