Closed Laurae2 closed 5 years ago
After the enhancements of #11, I am noticing that build_hist is heavily latency-bound when running in parallel.
Specifically, this line just eats all the CPU time in parallel: https://github.com/hcho3/xgboost-fast-hist-perf-lab/blob/master/src/build_hist.cc#L12
Multiple solutions from there:
According to VTune:
In general:
The memory latency issue on the data allocation:
With a loop instead of fill.
gcc:
icc:
Found a workaround; a new issue describing it is incoming. However, I had to give up keeping the BuildHist variable private: it is now public to the main thread.
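For readers following along, the "loop instead of fill" comparison mentioned in this thread can be sketched roughly as below. This is only an illustration under assumptions, not the code from the repository: `GradStat`, `ResetWithFill`, `ResetWithLoop`, and `IsZeroed` are hypothetical names, and the bin layout (two doubles per bin) is assumed. Both variants do the same work, so any timing difference would come from how the compiler (gcc or icc here) lowers each form.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one histogram bin (gradient and hessian sums).
struct GradStat {
  double sum_grad = 0.0;
  double sum_hess = 0.0;
};

// Variant 1: reset every bin with std::fill.
void ResetWithFill(std::vector<GradStat>& hist) {
  std::fill(hist.begin(), hist.end(), GradStat{});
}

// Variant 2: reset every bin with an explicit loop over the raw buffer.
void ResetWithLoop(std::vector<GradStat>& hist) {
  GradStat* data = hist.data();
  const std::size_t n = hist.size();
  for (std::size_t i = 0; i < n; ++i) {
    data[i].sum_grad = 0.0;
    data[i].sum_hess = 0.0;
  }
}

// Helper: returns true when every bin holds zero sums.
bool IsZeroed(const std::vector<GradStat>& hist) {
  for (const GradStat& bin : hist) {
    if (bin.sum_grad != 0.0 || bin.sum_hess != 0.0) return false;
  }
  return true;
}
```

To compare the two, one would time each function on a histogram of realistic size per thread and inspect the generated assembly for each compiler; the sketch makes no claim about which variant is faster.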