szilard / GBM-perf

Performance of various open source GBM implementations
MIT License

LightGBM weird multi-core scaling #43

Closed · szilard closed this issue 3 years ago

szilard commented 3 years ago

On 10M rows c5.metal lightgbm is 2.5x faster on 2 cores vs 1 core. Any idea @guolinke why?

10:lightgbm:1:0::210.265:0.792273
10:lightgbm:1:0::210.205:0.792273
10:lightgbm:1:0::210.163:0.792273
...
10:lightgbm:2:0-1::84.973:0.792273
10:lightgbm:2:0-1::85.043:0.792273
10:lightgbm:2:0-1::84.996:0.792273

so 1 core ~210 sec, 2 cores ~85 sec!

suppressMessages({
library(data.table)
library(ROCR)
library(lightgbm)
library(Matrix)
})

set.seed(123)

d_train <- fread("train.csv", showProgress=FALSE)
d_test <- fread("test.csv", showProgress=FALSE)

d_all <- rbind(d_train, d_test)
d_all$dep_delayed_15min <- ifelse(d_all$dep_delayed_15min=="Y",1,0)

d_all_wrules <- lgb.convert_with_rules(d_all)       
d_all <- d_all_wrules$data
cols_cats <- names(d_all_wrules$rules) 

d_train <- d_all[1:nrow(d_train)]
d_test <- d_all[(nrow(d_train)+1):(nrow(d_train)+nrow(d_test))]

p <- ncol(d_all)-1
dlgb_train <- lgb.Dataset(data = as.matrix(d_train[,1:p]), label = d_train$dep_delayed_15min)

cat(system.time({
  md <- lgb.train(data = dlgb_train, 
            objective = "binary", 
            nrounds = 100, num_leaves = 512, learning_rate = 0.1, 
            categorical_feature = cols_cats,
            verbose = 0)
})[[3]],":",sep="")

phat <- predict(md, data = as.matrix(d_test[,1:p]))
rocr_pred <- prediction(phat, d_test$dep_delayed_15min)
cat(performance(rocr_pred, "auc")@y.values[[1]],"\n")
The data files are fetched in the Dockerfile:

RUN wget https://s3.amazonaws.com/benchm-ml--main/train-0.1m.csv && \
    wget https://s3.amazonaws.com/benchm-ml--main/train-1m.csv && \
    wget https://s3.amazonaws.com/benchm-ml--main/train-10m.csv && \
    wget https://s3.amazonaws.com/benchm-ml--main/test.csv

guolinke commented 3 years ago

@szilard In version 3.0.0, LightGBM implements two different algorithms for tree learning: one is better for single-threaded training, the other for multi-threaded. Before training it runs a small test of both and picks the faster one, so this behavior is possible. If you enable LightGBM's log output, the [Warning] messages will report which algorithm was chosen.
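The automatic choice can also be pinned manually. A minimal sketch, assuming LightGBM >= 3.0, where the `force_col_wise` and `force_row_wise` parameters select the histogram-building strategy explicitly (the parameter values here mirror the benchmark script above; the choice of `force_col_wise` is illustrative):

```r
library(lightgbm)

# Config fragment: fix the tree-learning strategy up front instead of
# letting LightGBM auto-test both at startup (LightGBM >= 3.0).
params <- list(
  objective = "binary",
  num_leaves = 512,
  learning_rate = 0.1,
  force_col_wise = TRUE   # or force_row_wise = TRUE; set at most one
)

# verbose = 1 keeps the [Info]/[Warning] log lines that report which
# strategy is in use (with auto-choice, they also report the test overhead)
md <- lgb.train(params = params, data = dlgb_train, nrounds = 100, verbose = 1)
```

With the strategy forced, the per-run timing no longer includes the small pre-training test, which can matter when benchmarking short runs.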

szilard commented 3 years ago

Thank you so much @guolinke for solving the "mystery". It was also weird that I saw this behavior only on c5 instances but not, for example, on r4. Now it's all clear, thanks for clarifying. 💯