Laurae2 closed this issue 5 years ago.
After simply parallelizing the outer loop in main.cc instead of the inner loop in build_hist.cc:
@Laurae2 Yes, you may change main.cc.
Recently optimized this to 0.8 seconds on 36 threads; a new issue describing it is incoming.
build_hist (https://github.com/hcho3/xgboost-fast-hist-perf-lab/blob/master/src/build_hist.cc) processes far too little data per call to be parallelized effectively, unless we get rid of the inner loop, which does most of the work and should be limited to at most 2 threads: https://github.com/hcho3/xgboost-fast-hist-perf-lab/blob/master/src/build_hist.cc#L37-L42
A simple modification in main (https://github.com/hcho3/xgboost-fast-hist-perf-lab/blob/master/src/main.cc#L68-L70) yields 10x the performance on 36 threads (this does not solve the problem: it is still SLOWER than 2 threads). However, synchronization between build_hist and main then becomes necessary...
@hcho3 Is a modification of main.cc a valid operation?