Not sure; maybe you can ask your users to re-run my code (I have a Dockerfile), run it on other GPUs, or look at the code and suggest improvements.
One thing I noticed is that I was using one-hot encoding (OHE) in the benchmark for lightgbm. I added code using the categorical_feature = ... argument instead, and that speeds things up a bit on GPU (for large data), while on CPU, OHE and cat.enc. run at the same speed.
Code and details here: https://github.com/szilard/GBM-perf/issues/38
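For reference, a minimal sketch of the two encodings on the train set (just illustrative; column names follow the airline-delay data, and the exact OHE formula here is an assumption, not necessarily what the benchmark repo uses):

suppressMessages({
  library(data.table)
  library(Matrix)
  library(lightgbm)
})

d <- fread("train.csv", showProgress=FALSE)
y <- ifelse(d$dep_delayed_15min=="Y", 1, 0)

## OHE: expand the categorical columns into a sparse 0/1 design matrix
X_ohe <- sparse.model.matrix(dep_delayed_15min ~ . - 1, data = d)
dlgb_ohe <- lgb.Dataset(data = X_ohe, label = y)

## cat.enc.: keep the columns as integer codes and mark them as categorical later
d$dep_delayed_15min <- y
d_wrules <- lgb.convert_with_rules(d)
cols_cats <- names(d_wrules$rules)
p <- ncol(d) - 1
dlgb_enc <- lgb.Dataset(data = as.matrix(d_wrules$data[, 1:p]), label = y)
## ...then pass categorical_feature = cols_cats to lgb.train (full code below)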
My setup is:
CPU:
RUN git clone --recursive https://github.com/microsoft/LightGBM && \
cd LightGBM && Rscript build_r.R
GPU:
RUN apt-get install -y libboost-dev libboost-system-dev libboost-filesystem-dev ocl-icd-opencl-dev opencl-headers clinfo
RUN mkdir -p /etc/OpenCL/vendors && \
echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd ## otherwise lightgm segfaults at runtime (compiles fine without it)
RUN git clone --recursive https://github.com/microsoft/LightGBM && \
cd LightGBM && sed -i "s/use_gpu <- FALSE/use_gpu <- TRUE/" R-package/src/install.libs.R && Rscript build_r.R
on EC2 (CPU: r4.8xlarge // GPU: p3.8xlarge with Tesla V100) with Ubuntu 20.04, CUDA 11.0, R 4.0
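Before running the benchmark, a quick sanity check that the GPU build actually works (just a sketch on synthetic data; any small dataset will do):

library(lightgbm)

set.seed(123)
x <- matrix(rnorm(100 * 10), ncol = 10)
y <- rbinom(100, 1, 0.5)
dtrain <- lgb.Dataset(data = x, label = y)

## if the OpenCL ICD file above is missing, this is where the segfault shows up
md <- lgb.train(data = dtrain,
                objective = "binary",
                nrounds = 5, num_leaves = 31, learning_rate = 0.1,
                device = "gpu",
                verbose = 1)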
and code:
suppressMessages({
  library(data.table)
  library(ROCR)
  library(lightgbm)
  library(Matrix)
})

set.seed(123)

d_train <- fread("train.csv", showProgress=FALSE)
d_test <- fread("test.csv", showProgress=FALSE)

## encode the categorical columns as integer codes on train+test together, so the codes match
d_all <- rbind(d_train, d_test)
d_all$dep_delayed_15min <- ifelse(d_all$dep_delayed_15min=="Y", 1, 0)
d_all_wrules <- lgb.convert_with_rules(d_all)
d_all <- d_all_wrules$data
cols_cats <- names(d_all_wrules$rules)

## split back into train/test
d_train <- d_all[1:nrow(d_train)]
d_test <- d_all[(nrow(d_train)+1):(nrow(d_train)+nrow(d_test))]

p <- ncol(d_all) - 1
dlgb_train <- lgb.Dataset(data = as.matrix(d_train[, 1:p]), label = d_train$dep_delayed_15min)

## train and print the elapsed time
cat(system.time({
  md <- lgb.train(data = dlgb_train,
                  objective = "binary",
                  nrounds = 100, num_leaves = 512, learning_rate = 0.1,
                  categorical_feature = cols_cats,
                  verbose = 0)
})[[3]], " ", sep="")

## score the test set and print the AUC
phat <- predict(md, data = as.matrix(d_test[, 1:p]))
rocr_pred <- prediction(phat, d_test$dep_delayed_15min)
cat(performance(rocr_pred, "auc")@y.values[[1]], "\n")
and for GPU:
## identical to the CPU run, except for device = "gpu"
md <- lgb.train(data = dlgb_train,
                objective = "binary",
                nrounds = 100, num_leaves = 512, learning_rate = 0.1,
                categorical_feature = cols_cats,
                device = "gpu",
                verbose = 0)
I changed the benchmark (Dockerfiles, results in README) to use lightgbm with cat.enc. instead of OHE.
Regarding the GPU, @guolinke, I have some analysis of the GPU utilization patterns for all 4 libraries; see here:
https://github.com/szilard/GBM-perf/issues/11
and summary here:
https://github.com/szilard/GBM-perf#gpu-utilization-patterns
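For reproducing those measurements, a minimal sketch of one way to record utilization from R while training runs (the scripts in the linked issue may differ; gpu_util.csv is just a placeholder path):

## log GPU utilization once per second to a CSV while training runs
system(paste("nvidia-smi --query-gpu=timestamp,utilization.gpu,utilization.memory",
             "--format=csv -l 1 > gpu_util.csv &"))

## ... run the lgb.train(..., device = "gpu") call here ...

## stop the logger and inspect gpu_util.csv
system("pkill -f 'nvidia-smi --query-gpu'")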
Thank you so much @szilard. I think LightGBM may not be able to utilize the GPU well on this data; it is even slower than LightGBM on CPU. I have received some feedback from LightGBM users recently, and they report that the GPU version is faster.