szilard / GBM-perf

Performance of various open source GBM implementations
MIT License

Run on p4d.24xlarge with A100-SXM4-40GB GPU #48

Closed by szilard 5 months ago

szilard commented 3 years ago

CUDA 11.2.0

h2o:

 [1] "water.exceptions.H2OModelBuilderIllegalArgumentException: Illegal argument(s) for XGBoost model: XGBoost_model_R_1611054589962_1.  Details: ERRR on field: _backend: GPU backend (gpu_id: 0) is not functional. Check CUDA_PATH and/or GPU installation.\n"

XGBoost:

xgboost [11:11:59] WARNING: /xgboost/src/learner.cc:222: No visible GPU is found, setting `gpu_id` to -1
Error in xgb.iter.update(bst$handle, dtrain, iteration - 1, obj) :
  [11:11:59] /xgboost/src/gbm/gbtree.cc:511: Check failed: common::AllVisibleGPUs() >= 1 (0 vs. 1) : No visible GPU is found for XGBoost.

LightGBM:

lightgbm Error in lgb.last_error() : api error: No OpenCL device found
Error in initialize(...) : lgb.Booster: cannot create Booster handle
Calls: cat ... system.time -> lgb.train -> <Anonymous> -> initialize

catboost:

catboost Error in catboost.train(learn_pool = dx_train, test_pool = NULL, params = params) :
  catboost/cuda/cuda_lib/cuda_base.h:281: CUDA error 802: system not yet initialized
Calls: cat -> system.time -> catboost.train

nvidia-smi -i 0
Tue Jan 19 11:09:30 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04    Driver Version: 460.27.04    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB      On   | 00000000:10:1C.0 Off |                    0 |
| N/A   28C    P0    41W / 400W |      0MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
szilard commented 3 years ago

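Needs fabric-manager (the NVIDIA Fabric Manager service) running, because the p4d.24xlarge is a multi-GPU machine. Without it, nvidia-smi still lists the GPU, but CUDA cannot initialize, which matches the "CUDA error 802: system not yet initialized" from catboost above and would explain the "no visible GPU" errors from the other libraries.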
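
A quick way to tell this state apart from a missing driver is to call the CUDA driver's `cuInit` directly and look at the return code. The sketch below is only an illustration, not part of the benchmark scripts: the helper names `probe_cuda` and `explain_cuinit` are made up here, the library name `libcuda.so.1` assumes a Linux install, and the hint about the `nvidia-fabricmanager` service is the usual remedy on NVSwitch machines rather than something verified on this instance.

```python
import ctypes

# Result codes from the CUDA driver API (CUresult).
CUDA_SUCCESS = 0
CUDA_ERROR_NO_DEVICE = 100
CUDA_ERROR_SYSTEM_NOT_READY = 802  # reported as "system not yet initialized"


def probe_cuda(libname="libcuda.so.1"):
    """Call cuInit(0) and return its result code, or None if the
    driver library cannot be loaded (hypothetical helper)."""
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return None
    lib.cuInit.argtypes = [ctypes.c_uint]
    lib.cuInit.restype = ctypes.c_int
    return lib.cuInit(0)


def explain_cuinit(code):
    """Turn a cuInit result code into a short hint (hypothetical helper)."""
    if code is None:
        return "libcuda not found: is the NVIDIA driver installed?"
    if code == CUDA_SUCCESS:
        return "CUDA driver initialized"
    if code == CUDA_ERROR_NO_DEVICE:
        return "no CUDA-capable device detected"
    if code == CUDA_ERROR_SYSTEM_NOT_READY:
        # Assumption: on NVSwitch multi-GPU machines this usually means
        # the fabric manager daemon is not running.
        return ("system not ready (802): on multi-GPU NVSwitch machines, "
                "check that the nvidia-fabricmanager service is running")
    return f"cuInit failed with code {code}"


if __name__ == "__main__":
    print(explain_cuinit(probe_cuda()))
```

If this prints the 802 hint while nvidia-smi looks healthy, the GPU libraries themselves are probably fine and the problem sits at the system level.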