szilard / GBM-perf

Performance of various open source GBM implementations

[GPU] Result of LightGBM GPU seems strange #37

Closed by guolinke 3 years ago

guolinke commented 3 years ago

The LightGBM GPU result is even slower than LightGBM on CPU.

I recently received feedback from LightGBM users, and they report that the GPU version is faster.

szilard commented 3 years ago

Not sure. Maybe you can ask your users to re-run my code (I have a Dockerfile), run it on other GPUs, or look at the code and suggest improvements.

One thing I noticed is that I was using one-hot encoding (OHE) in the benchmark for lightgbm. I added code using the categorical_feature = ... parameter, and that speeds things up a little on GPU (for large data):

[image: Screen Shot 2020-09-12 at 9 19 35 AM]

but on CPU, OHE and categorical encoding (cat.enc.) run at the same speed:

[image: Screen Shot 2020-09-12 at 9 20 44 AM]

Code and details here:

szilard commented 3 years ago

My setup is (CPU build first, then GPU build):


RUN git clone --recursive && \
    cd LightGBM && Rscript build_r.R


RUN apt-get install -y libboost-dev libboost-system-dev libboost-filesystem-dev ocl-icd-opencl-dev opencl-headers clinfo
RUN mkdir -p /etc/OpenCL/vendors && \
    echo "" > /etc/OpenCL/vendors/nvidia.icd   ## otherwise lightgbm segfaults at runtime (compiles fine without it)
RUN git clone --recursive && \
    cd LightGBM && sed -i "s/use_gpu <- FALSE/use_gpu <- TRUE/"  R-package/src/install.libs.R && Rscript build_r.R

on EC2 (CPU: r4.8xlarge // GPU: p3.8xlarge with Tesla V100) with Ubuntu 20.04, CUDA 11.0, R 4.0

and code:



library(data.table)   # fread
library(ROCR)         # prediction, performance
library(lightgbm)

d_train <- fread("train.csv", showProgress=FALSE)
d_test <- fread("test.csv", showProgress=FALSE)

d_all <- rbind(d_train, d_test)
d_all$dep_delayed_15min <- ifelse(d_all$dep_delayed_15min=="Y",1,0)

d_all_wrules <- lgb.convert_with_rules(d_all)       
d_all <- d_all_wrules$data
cols_cats <- names(d_all_wrules$rules) 

d_train <- d_all[1:nrow(d_train)]
d_test <- d_all[(nrow(d_train)+1):(nrow(d_train)+nrow(d_test))]

p <- ncol(d_all)-1
dlgb_train <- lgb.Dataset(data = as.matrix(d_train[,1:p]), label = d_train$dep_delayed_15min)

# print the elapsed training time in seconds
cat(system.time({
  md <- lgb.train(data = dlgb_train, 
            objective = "binary", 
            nrounds = 100, num_leaves = 512, learning_rate = 0.1, 
            categorical_feature = cols_cats,
            verbose = 0)
})[[3]]," ",sep="")

phat <- predict(md, data = as.matrix(d_test[,1:p]))
rocr_pred <- prediction(phat, d_test$dep_delayed_15min)
cat(performance(rocr_pred, "auc")@y.values[[1]],"\n")

and for GPU:

  md <- lgb.train(data = dlgb_train, 
            objective = "binary", 
            nrounds = 100, num_leaves = 512, learning_rate = 0.1, 
            categorical_feature = cols_cats,
            device = "gpu",
            verbose = 0)

szilard commented 3 years ago

I changed the benchmark (Dockerfiles, results in README) to use lightgbm with categorical encoding (cat.enc.) instead of OHE.
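
For readers unfamiliar with the two encodings discussed in this thread, here is a minimal base-R sketch of the difference. It is not the benchmark code: the `carrier` column and its values are made up for illustration. The integer codes are roughly the kind of representation one would expect to pass to lightgbm together with categorical_feature, while OHE expands each level into its own 0/1 column.

```r
# Hypothetical toy column (not taken from the benchmark data)
carrier <- c("AA", "UA", "DL", "AA", "DL")
f <- factor(carrier)              # levels are sorted: AA, DL, UA

# One-hot encoding (OHE): one 0/1 column per distinct level
ohe <- model.matrix(~ f - 1)
colnames(ohe) <- levels(f)

# Integer (categorical) encoding: a single column of level codes
codes <- as.integer(f)

print(dim(ohe))   # 5 rows, 3 columns
print(codes)      # 1 3 2 1 2
```

With many high-cardinality columns, OHE makes the feature matrix much wider, whereas the integer encoding keeps one column per original feature. That difference in width is one plausible reason the two encodings can behave differently on GPU.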

szilard commented 3 years ago

Regarding the GPU, @guolinke, I have some analysis of the GPU utilization patterns for all 4 libs, see here:

and summary here:

guolinke commented 3 years ago

Thank you so much, @szilard. I think LightGBM may not be able to utilize the GPU well on this data.