hist method of XGBoost scales poorly on multi-core CPUs: a demo script

Currently, the hist tree-growing algorithm (tree_method=hist) of XGBoost scales poorly on multi-core CPUs: for some datasets, performance deteriorates as the number of threads is increased.
This issue was discovered through @Laurae2's Gradient Boosting Benchmark.
To make things easier for contributors, I went ahead and isolated the performance bottleneck. The vast majority of the time (> 95%) is spent in a stage known as gradient histogram construction. This repository isolates this stage so that it is easy to fix and improve.
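For orientation, a common way to parallelize this stage is sketched below with OpenMP. This is not XGBoost's code: the `GradPair` struct, the `BuildHistParallel` function, the one-bin-per-row layout and the per-thread-buffer strategy are all assumptions chosen for illustration. Each thread sums into a private histogram, so no atomics are needed, and the private copies are added together at the end. That final reduction costs time and memory proportional to (number of threads) x (number of bins), which is one plausible reason extra threads may not pay off.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical gradient pair: first-order (grad) and second-order (hess) gradient.
struct GradPair {
  double grad;
  double hess;
};

// Row i contributes gpair[i] to bin bin_index[i]. Returns a histogram of n_bins sums.
std::vector<GradPair> BuildHistParallel(const std::vector<std::uint32_t>& bin_index,
                                        const std::vector<GradPair>& gpair,
                                        std::size_t n_bins) {
  assert(bin_index.size() == gpair.size());
  int n_threads = 1;
#ifdef _OPENMP
  n_threads = omp_get_max_threads();
#endif
  // One zero-initialized private histogram per thread avoids write conflicts.
  std::vector<std::vector<GradPair>> local(
      static_cast<std::size_t>(n_threads), std::vector<GradPair>(n_bins));
  const long long n_rows = static_cast<long long>(bin_index.size());

#pragma omp parallel num_threads(n_threads)
  {
    int tid = 0;
#ifdef _OPENMP
    tid = omp_get_thread_num();
#endif
    std::vector<GradPair>& hist = local[static_cast<std::size_t>(tid)];
    // Rows are split statically across threads; each thread writes only its own buffer.
#pragma omp for schedule(static)
    for (long long i = 0; i < n_rows; ++i) {
      const std::size_t row = static_cast<std::size_t>(i);
      GradPair& bin = hist[bin_index[row]];
      bin.grad += gpair[row].grad;
      bin.hess += gpair[row].hess;
    }
  }

  // Reduction: add up the per-thread histograms (cost grows with n_threads * n_bins).
  std::vector<GradPair> result(n_bins);
  for (const std::vector<GradPair>& hist : local) {
    for (std::size_t b = 0; b < n_bins; ++b) {
      result[b].grad += hist[b].grad;
      result[b].hess += hist[b].hess;
    }
  }
  return result;
}
```

Compiled without `-fopenmp`, the pragmas are ignored and the function runs on a single thread, which makes it easy to check the parallel result against a sequential one.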
Compile the script by running CMake:
mkdir build
cd build
cmake ..
make
Download record.tar.bz2 into the same directory.
Extract it by running tar xvf record.tar.bz2.
Run the script:
# Usage: ./perflab record/ [number of threads]
./perflab record/ 36
Running with different numbers of threads should produce the following performance trend:
The script reads from record.tar.bz2, which was generated from the Bosch dataset. Its job is to compute histograms of gradient pairs, where each histogram bin is a partial sum of gradient pairs.
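A minimal sequential sketch of this computation is shown below; it is an illustration, not the script's implementation. It assumes binary logistic loss, where the gradient pair of row i is g_i = p_i - y_i and h_i = p_i (1 - p_i) with p_i the sigmoid of the raw prediction. It also assumes a hypothetical dense layout in which every row stores one global bin id per feature, with the bin ranges of different features not overlapping, and it sums only the rows that belong to the current tree node (passed as `rows`). The names `GradPair`, `LogisticGradPair` and `BuildHist` are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical gradient pair: first-order (grad) and second-order (hess) gradient.
struct GradPair {
  double grad;
  double hess;
};

// Logistic loss for a label y in {0, 1} and a raw margin score:
// p = sigmoid(margin), grad = p - y, hess = p * (1 - p).
GradPair LogisticGradPair(double y, double margin) {
  const double p = 1.0 / (1.0 + std::exp(-margin));
  return GradPair{p - y, p * (1.0 - p)};
}

// Assumed layout: index[r * n_features + f] is the global bin id of row r, feature f.
// Each bin ends up holding the sum of gradient pairs of the node's rows that fall in it.
std::vector<GradPair> BuildHist(const std::vector<std::uint32_t>& index,
                                std::size_t n_features,
                                const std::vector<GradPair>& gpair,
                                const std::vector<std::size_t>& rows,
                                std::size_t n_total_bins) {
  assert(index.size() == gpair.size() * n_features);
  std::vector<GradPair> hist(n_total_bins);  // all bins start at zero
  for (std::size_t r : rows) {
    const GradPair g = gpair[r];
    // A row adds the same gradient pair to one bin per feature.
    for (std::size_t f = 0; f < n_features; ++f) {
      GradPair& bin = hist[index[r * n_features + f]];
      bin.grad += g.grad;
      bin.hess += g.hess;
    }
  }
  return hist;
}
```

For example, with two features whose bins are {0, 1} and {2, 3}, a row stored as (0, 3) adds its gradient pair to bin 0 and to bin 3. The inner loop reads bins in a data-dependent order, so memory access, not arithmetic, tends to dominate this kind of loop.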
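Some background: for each training example (X_i, y_i), the gradient pair (g_i, h_i), i.e. the first- and second-order gradients of the loss, is a pair of double values that quantify the distance between the true label y_i and the predicted label yhat_i.

By default, the 'Release' build type is used, with flags -O3 -DNDEBUG.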
For profiling, you may want to add debug symbols by choosing the 'RelWithDebInfo' build type instead:
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
This build type uses the flags -O2 -g -DNDEBUG.
For full control over the compilation flags, set CMAKE_CXX_FLAGS_RELEASE:
cmake -DCMAKE_CXX_FLAGS_RELEASE="-O3 -g -DNDEBUG -march=native" ..
Here, we compile with the flags -O3 -g -DNDEBUG -march=native.
You can check whether the flags are applied by running make VERBOSE=1 and looking for them in the C++ compilation lines:
/usr/bin/c++ -I/home/ubuntu/xgboost-fast-hist-perf-lab/include -O3 -g -DNDEBUG -march=native -fopenmp -std=gnu++11 -o CMakeFiles/perflab.dir/src/main.cc.o -c /home/ubuntu/xgboost-fast-hist-perf-lab/src/main.cc