h2oai / h2o4gpu


Low GPU utilization during Random Forest predict #843

Open StargazerAlex opened 4 years ago

StargazerAlex commented 4 years ago

Environment (for bugs)

Description

I want to use the Random Forest Classifier for predictions on a large amount of data, but the prediction phase takes an oddly long time and shows very low GPU utilization. Here are the parameters I used for training:

import h2o4gpu

model = h2o4gpu.RandomForestClassifier(
    n_estimators = 100, criterion = "gini",
    max_depth = 8, min_samples_split = 2, min_samples_leaf = 1,
    min_weight_fraction_leaf = 0, max_features = "auto",
    max_leaf_nodes = None, min_impurity_decrease = 0,
    min_impurity_split = None, bootstrap = True, oob_score = False,
    n_jobs = -1, random_state = None, verbose = 0, warm_start = False,
    class_weight = None, subsample = 1, colsample_bytree = 1,
    num_parallel_tree = 1, tree_method = "gpu_hist", n_gpus = -1,
    predictor = "gpu_predictor", backend = "h2o4gpu")

model.fit(x_train, y_train)
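
The arrays x_train and y_train are not shown above; for a self-contained reproduction, synthetic data along the following lines could be used (the sample counts, feature counts, and class counts are placeholders, not my actual data):

# Hypothetical data setup so the snippet above is self-contained;
# the sizes below are placeholders, not the data from this report.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100_000, n_features=50,
                           n_informative=20, n_classes=10,
                           n_clusters_per_class=1, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)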

The training works pretty well. It is comparatively fast and consistently uses around 80% of the GPU (measured with nvidia-smi).
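For reference, a rough sketch of how GPU utilization can be polled while fit() runs; I only used nvidia-smi directly, so the polling helper below is an illustration rather than my exact measurement setup:

# Sketch: poll nvidia-smi in a background thread while fit() runs.
# Assumes nvidia-smi is on PATH; adjust the query for multi-GPU setups.
import subprocess
import threading
import time

def poll_gpu(stop_event, interval=1.0):
    while not stop_event.is_set():
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True)
        print("GPU utilization:", out.stdout.strip(), "%")
        time.sleep(interval)

stop = threading.Event()
threading.Thread(target=poll_gpu, args=(stop,), daemon=True).start()
model.fit(x_train, y_train)
stop.set()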

y_pred = model.predict(x_test)

The prediction, however, only utilizes about 4% of the GPU, and only for a fraction of the time it takes to run one iteration (across 10 samples), while it mostly seems to run on the CPU, with one core constantly at 100%. For a class size of 2 it takes around 0.4 seconds; for 10 classes it takes 3.4 seconds. Running the same prediction on purely CPU-based scikit-learn is faster, at only 0.1 seconds.
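For completeness, a sketch of how such a predict-time comparison can be measured; the 10-sample batch and the scikit-learn configuration below are assumptions rather than my exact setup:

# Sketch of a predict-time comparison between h2o4gpu and scikit-learn.
# The 10-sample batch and the scikit-learn parameters are assumptions.
import time
from sklearn.ensemble import RandomForestClassifier as SKRandomForestClassifier

start = time.perf_counter()
model.predict(x_test[:10])          # h2o4gpu model trained above
print("h2o4gpu predict:", time.perf_counter() - start, "s")

sk_model = SKRandomForestClassifier(n_estimators=100, max_depth=8, n_jobs=-1)
sk_model.fit(x_train, y_train)

start = time.perf_counter()
sk_model.predict(x_test[:10])
print("scikit-learn predict:", time.perf_counter() - start, "s")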

Is this a general problem with tree-based predictions, or am I doing something wrong?

Thanks a lot in advance!