To make it easier to reproduce my numbers and to get new ones in the future and/or on other hardware, I made a separate Dockerfile for this:
https://github.com/szilard/GBM-perf/tree/master/analysis/xgboost_cpu_by_version
You'll need to set the CPU core ids for the first socket, excluding hyperthreaded cores (e.g. 0-15 on r4.16xlarge, which has 2 sockets with 16 cores + 16 hyperthreads each), and the xgboost version:
VER=v1.2.0
CORES_1SO_NOHT=0-15 ## set physical core ids on first socket, no hyperthreading
sudo docker build --build-arg CACHE_DATE=$(date +%Y-%m-%d) --build-arg VER=$VER -t gbmperf_xgboost_cpu_ver .
sudo docker run --rm -e CORES_1SO_NOHT=$CORES_1SO_NOHT gbmperf_xgboost_cpu_ver
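To figure out which core ids to put in CORES_1SO_NOHT on a given machine, you can parse the `lscpu -p` topology output: the first CPU id listed for each physical core is the real core, later ids for the same core are hyperthread siblings. A minimal sketch (the sample data below is hypothetical; on a real box pipe `lscpu -p=CPU,CORE,SOCKET` directly into the filter):

```shell
# Hypothetical sample of `lscpu -p=CPU,CORE,SOCKET` output from a small
# 2-socket hyperthreaded machine; replace with the real lscpu output.
sample='# comment
0,0,0
1,1,0
2,0,1
3,1,1
4,0,0
5,1,0
6,0,1
7,1,1'

# Keep only socket 0, and only the first CPU id seen per core
# (the physical core, not its HT sibling), then join with commas.
printf '%s\n' "$sample" | grep -v '^#' | \
  awk -F, '$3 == 0 && !seen[$2]++ { print $1 }' | paste -sd, -
# prints 0,1
```

On contiguous layouts the result can be written as a range (e.g. 0-15 on r4.16xlarge), which is the form used above.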
It might be worth running the script several times; the training times on all cores usually show somewhat higher variability, though I'm not sure whether that's due to the virtualization environment (EC2) or to NUMA.
Discussion continued here: https://github.com/dmlc/xgboost/issues/3810#issuecomment-694715060
xgboost has improved significantly in multicore scaling and NUMA behavior.
Runtimes by version on r4.16xlarge (2 sockets, 16 cores + 16 hyperthreads each) on 1, 16 and 64 cores, 1M rows: