h2oai / h2o4gpu

H2Oai GPU Edition
Apache License 2.0

Optimize xgboost random forest for GPU #412

Open pseudotensor opened 6 years ago

pseudotensor commented 6 years ago

Use Nvidia MPS and tune the per-kernel core count to its per-kernel optimum (I found earlier that kernels don't need all of the cores). MPS will then allow non-serial (i.e. parallel) overlap of kernels and use the cores more efficiently. This works when the kernels are not memory bound, or when there is a lot of memory contention one wants to hide. I found earlier that GPU xgboost spends a lot of time in memory stalls (70% of the time!), so we should be able to run 3X the kernels and get a 3X faster random forest (i.e. build 3 trees in parallel). With the kernel core count reduction on top of that, we might be able to squeeze out more and get roughly 5X performance.
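A minimal sketch of the idea, assuming an MPS control daemon is already running on the target GPU and using xgboost's `gpu_hist` tree method; the 3 concurrent workers and the 40% active-thread cap are illustrative guesses, not tuned values:

```python
# Sketch: overlap several xgboost GPU trainings as MPS clients.
# Assumes MPS was started beforehand, e.g.:
#   export CUDA_VISIBLE_DEVICES=0
#   nvidia-cuda-mps-control -d
import os
import multiprocessing as mp

import numpy as np
import xgboost as xgb


def train_one(seed):
    # Cap each client's SM share (Volta+ MPS resource provisioning) so MPS
    # can overlap kernels from different clients instead of serializing them.
    # 40 is an assumed value; tune per kernel.
    os.environ["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = "40"
    X = np.random.rand(100_000, 50).astype(np.float32)
    y = np.random.randint(0, 2, size=100_000)
    clf = xgb.XGBClassifier(
        tree_method="gpu_hist",  # GPU histogram builder
        n_estimators=100,
        random_state=seed,
    )
    clf.fit(X, y)
    return seed


if __name__ == "__main__":
    # Three concurrent MPS clients, one per process.
    with mp.get_context("spawn").Pool(3) as pool:
        print(pool.map(train_one, range(3)))
```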

@RAMitchell

pseudotensor commented 6 years ago

This also means we can overlap multiple XGBoost runs and complete about 3X as many models for normal GBM. Useful for DAI, where there are typically 8 models and only (say) 2-3 GPUs: those models could then run in parallel instead of sequentially, giving a nice boost.
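A sketch of that DAI-style packing, assuming 2 GPUs with MPS enabled on each; the 8 model configs and the 2-workers-per-GPU choice are illustrative assumptions (memory headroom decides the real limit):

```python
# Sketch: run several independent XGBoost models concurrently on a small
# pool of GPUs (e.g. 8 models on 2 GPUs) instead of sequentially.
import os
import multiprocessing as mp

import numpy as np
import xgboost as xgb

N_GPUS = 2  # assumed GPU count


def train_model(job):
    idx, params = job
    # Pin each worker process to one GPU; MPS on that GPU lets several such
    # workers overlap their kernels rather than queue behind each other.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(idx % N_GPUS)
    X = np.random.rand(50_000, 20).astype(np.float32)
    y = np.random.rand(50_000).astype(np.float32)
    xgb.train(
        {"tree_method": "gpu_hist", **params},
        xgb.DMatrix(X, label=y),
        num_boost_round=200,
    )
    return idx


if __name__ == "__main__":
    jobs = [(i, {"max_depth": d}) for i, d in enumerate([4, 6, 8, 10, 4, 6, 8, 10])]
    with mp.get_context("spawn").Pool(2 * N_GPUS) as pool:
        pool.map(train_model, jobs)
```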