microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

[R-package] null model when running demo code #3145

Closed. ghost closed this issue 4 years ago.

ghost commented 4 years ago

Hi everyone,

I'm using the lightgbm package on my Windows 10 computer, RStudio 3.6.3. I also asked how to install lightgbm for R a few weeks ago, and with help it was installed successfully (I guess). But when I run the given lightgbm demo (bank dataset, #887) it gives me a model with type: NULL, length: 0, size: 0 B, value: NULL (empty). When I run lightgbm on my 10 years of monthly univariate time series data, it also gives the same empty output. I don't know the reason for this issue; could it have been installed incorrectly, or did something else go wrong?

Thank you

jameslamb commented 4 years ago

@mgzde Can you please provide the following information:

  1. How did you install the LightGBM R package? Describe this in as much detail as possible, including the version or commit hash used.
  2. What is the exact code you ran?
ghost commented 4 years ago

I've installed git and CMake (3.16.6). I've installed Visual Studio Code (April 2020 version). (Actually, I want to use lightgbm in RStudio, so I have RStudio 3.6.3.) I've also installed Rtools35 and added C:\Program Files\Rtools\bin to PATH. Then I've run the following:

git clone --recursive https://github.com/microsoft/LightGBM
cd LightGBM
Rscript build_r.R

[Screenshots (73)–(77)]

Then I've tried in R with the following code and outputs for the agaricus data: [Screenshots (111)–(119)]. At the end it says: 'There is no applicable method for predict (applied to an object of class "NULL")'.

After that I've tried with the bank dataset: [Screenshots (120)–(122)]. And it gives me an error: Error in lgb.dump(model, num_iteration = 1) : lgb.save: booster should be an ‘lgb.Booster’. Also, its model information is: [Screenshot (124)].

jameslamb commented 4 years ago

Thank you @mgzde, I will take a look in a few hours and see if I can replicate this.

In the future, please paste logs as text and code in code blocks instead of screenshots. That makes it easier for me to copy your code and run it (with screenshots, I will have to re-type it by hand), and makes it more likely that other people who face the same issue as you will find this issue from search engines.

If you've never used the formatting options in GitHub comments, you might explore https://help.github.com/en/github/writing-on-github/basic-writing-and-formatting-syntax

ghost commented 4 years ago

I'm so sorry for this. I'm really new to GitHub, so I don't know the formatting. I can write the code again as text instead of screenshots now.

ghost commented 4 years ago

#LIGHTGBM
library(lightgbm)
library(methods)

# We load in the agaricus dataset
# In this example, we are aiming to predict whether a mushroom is edible
data(agaricus.train, package = "lightgbm")
data(agaricus.test, package = "lightgbm")
train <- agaricus.train
test <- agaricus.test

# The loaded data is stored in sparseMatrix, and label is a numeric vector in {0,1}
class(train$label)
class(train$data)

#--------------------Basic Training using lightgbm----------------
# This is the basic usage of lightgbm: you can put matrix in data field
# Note: we are putting in sparse matrix here, lightgbm naturally handles sparse input
# Use sparse matrix when your feature is sparse (e.g. when you are using one-hot encoding vector)
print("Training lightgbm with sparseMatrix")
bst <- lightgbm(
    data = train$data
    , label = train$label
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , objective = "binary"
)

# Alternatively, you can put in dense matrix, i.e. basic R-matrix
print("Training lightgbm with Matrix")
bst <- lightgbm(
    data = as.matrix(train$data)
    , label = train$label
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , objective = "binary"
)

# You can also put in lgb.Dataset object, which stores label, data and other metadata needed for advanced features
print("Training lightgbm with lgb.Dataset")
dtrain <- lgb.Dataset(
    data = train$data
    , label = train$label
)
bst <- lightgbm(
    data = dtrain
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , objective = "binary"
)

# Verbose = 0,1,2
print("Train lightgbm with verbose 0, no message")
bst <- lightgbm(
    data = dtrain
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , objective = "binary"
    , verbose = 0L
)

print("Train lightgbm with verbose 1, print evaluation metric")
bst <- lightgbm(
    data = dtrain
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , nthread = 2L
    , objective = "binary"
    , verbose = 1L
)

print("Train lightgbm with verbose 2, also print information about tree")
bst <- lightgbm(
    data = dtrain
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , nthread = 2L
    , objective = "binary"
    , verbose = 2L
)

# You can also specify data as file path to a LibSVM/TSV/CSV format input
# Since we do not have this file with us, the following lines are just for illustration
# bst <- lightgbm(
#     data = "agaricus.train.svm"
#     , num_leaves = 4L
#     , learning_rate = 1.0
#     , nrounds = 2L
#     , objective = "binary"
# )

#--------------------Basic prediction using lightgbm--------------
# You can do prediction using the following line
# You can put in Matrix, sparseMatrix, or lgb.Dataset
pred <- predict(bst, test$data)
err <- mean(as.numeric(pred > 0.5) != test$label)
print(paste("test-error=", err))

#--------------------Save and load models-------------------------
# Save model to binary local file
lgb.save(bst, "lightgbm.model")

# Load binary model to R
bst2 <- lgb.load("lightgbm.model")
pred2 <- predict(bst2, test$data)

# pred2 should be identical to pred
print(paste("sum(abs(pred2-pred))=", sum(abs(pred2 - pred))))

#--------------------Advanced features ---------------------------
# To use advanced features, we need to put data in lgb.Dataset
dtrain <- lgb.Dataset(data = train$data, label = train$label, free_raw_data = FALSE)
dtest <- lgb.Dataset.create.valid(dtrain, data = test$data, label = test$label)

#--------------------Using validation set-------------------------
# valids is a list of lgb.Dataset, each of them is tagged with name
valids <- list(train = dtrain, test = dtest)

# To train with valids, use lgb.train, which contains more advanced features
# valids allows us to monitor the evaluation result on all data in the list
print("Train lightgbm using lgb.train with valids")
bst <- lgb.train(
    data = dtrain
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , valids = valids
    , nthread = 2L
    , objective = "binary"
)

# We can change evaluation metrics, or use multiple evaluation metrics
print("Train lightgbm using lgb.train with valids, watch logloss and error")
bst <- lgb.train(
    data = dtrain
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , valids = valids
    , eval = c("binary_error", "binary_logloss")
    , nthread = 2L
    , objective = "binary"
)

# lgb.Dataset can also be saved using lgb.Dataset.save
lgb.Dataset.save(dtrain, "dtrain.buffer")

# To load it in, simply call lgb.Dataset
dtrain2 <- lgb.Dataset("dtrain.buffer")
bst <- lgb.train(
    data = dtrain2
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , valids = valids
    , nthread = 2L
    , objective = "binary"
)

# information can be extracted from lgb.Dataset using getinfo
label <- getinfo(dtest, "label")
pred <- predict(bst, test$data)
err <- as.numeric(sum(as.integer(pred > 0.5) != label)) / length(label)
print(paste("test-error=", err))

At the end it says: 'There is no applicable method for predict (applied to an object of class "NULL")'.

After that I've tried with the bank dataset:

library(lightgbm)
library(data.table)
data(bank, package = "lightgbm")
str(bank)

# We are dividing the dataset into two: one train, one validation
bank_train <- bank[1:4000, ]
bank_test <- bank[4001:4521, ]
head(bank_train)
head(bank_test)

bank_rules <- lgb.prepare_rules(data = bank_train)
bank_train <- bank_rules$data
bank_test <- lgb.prepare_rules(data = bank_test, rules = bank_rules$rules)$data
str(bank_test)

# Remove 1 to label because it must be between 0 and 1
bank_train$y <- bank_train$y - 1
bank_test$y <- bank_test$y - 1

# Data input to LightGBM must be a matrix, without the label
my_data_train <- as.matrix(bank_train[, 1:16, with = FALSE])
my_data_test <- as.matrix(bank_test[, 1:16, with = FALSE])

# Creating the LightGBM dataset with categorical features
# The categorical features can be passed to lgb.train to not copy and paste a lot
dtrain <- lgb.Dataset(data = my_data_train,
                      label = bank_train$y)
dtest <- lgb.Dataset(data = my_data_test,
                     label = bank_test$y)

# We can now train a model
model <- lgb.train(list(objective = "binary",
                        metric = "l2",
                        min_data = 1,
                        learning_rate = 0.1,
                        min_data = 0,
                        min_hessian = 1,
                        max_depth = 2,
                        categorical_feature = c(2, 3, 4, 5, 7, 8, 9, 11, 16)),
                   dtrain,
                   100,
                   valids = list(train = dtrain, valid = dtest))

# Try to find split_feature: 11
# If you find it, it means it used a categorical feature in the first tree
lgb.dump(model, num_iteration = 1)

# Remove 1 to label because it must be between 0 and 1
bank$y <- bank$y - 1

# Data input to LightGBM must be a matrix, without the label
my_data <- as.matrix(bank[, 1:16, with = FALSE])

# Creating the LightGBM dataset with categorical features
# The categorical features must be indexed like in R (1-indexed, not 0-indexed)
lgb_data <- lgb.Dataset(data = my_data,
                        label = bank$y,
                        categorical_feature = c(2, 3, 4, 5, 7, 8, 9, 11, 16))

# We can now train a model
model <- lgb.train(list(objective = "binary",
                        metric = "l2",
                        min_data = 1,
                        learning_rate = 0.1,
                        min_data = 0,
                        min_hessian = 1,
                        max_depth = 2),
                   lgb_data,
                   100,
                   valids = list(train = lgb_data))

# Try to find split_feature: 2
# If you find it, it means it used a categorical feature in the first tree
lgb.dump(model, num_iteration = 1)

And it gives me an error:

Error in lgb.dump(model, num_iteration = 1) : lgb.save: booster should be an ‘lgb.Booster’

Also, its model information is:

Type: NULL
Length: 0
Size: 0 B
Value: NULL (empty)

jameslamb commented 4 years ago

No problem @mgzde! Thanks for copying in the code. I really do recommend you read https://help.github.com/en/github/writing-on-github/basic-writing-and-formatting-syntax when you have the time.

The # character has a special meaning on GitHub, which is what was causing some of your code comments to render as titles. I copied your comment and reformatted it below the line here. If you click ... then Edit on my comment, you'll be able to see the raw text I typed and how the formatting works.


#LIGHTGBM
library(lightgbm)
library(methods)

### We load in the agaricus dataset
### In this example, we are aiming to predict whether a mushroom is edible
data(agaricus.train, package = "lightgbm")
data(agaricus.test, package = "lightgbm")
train <- agaricus.train 
test <- agaricus.test 

### The loaded data is stored in sparseMatrix, and label is a numeric vector in {0,1}
class(train$label)
class(train$data)

#--------------------Basic Training using lightgbm----------------
### This is the basic usage of lightgbm you can put matrix in data field
### Note: we are putting in sparse matrix here, lightgbm naturally handles sparse input
### Use sparse matrix when your feature is sparse (e.g. when you are using one-hot encoding vector)
print("Training lightgbm with sparseMatrix")
bst <- lightgbm(
    data = train$data
    , label = train$label
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , objective = "binary"
)

### Alternatively, you can put in dense matrix, i.e. basic R-matrix
print("Training lightgbm with Matrix")
bst <- lightgbm(
    data = as.matrix(train$data)
    , label = train$label
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , objective = "binary"
)

### You can also put in lgb.Dataset object, which stores label, data and other metadata needed for advanced features
print("Training lightgbm with lgb.Dataset")
dtrain <- lgb.Dataset(
    data = train$data
    , label = train$label
)
bst <- lightgbm(
    data = dtrain
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , objective = "binary"
)

### Verbose = 0,1,2
print("Train lightgbm with verbose 0, no message")
bst <- lightgbm(
    data = dtrain
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , objective = "binary"
    , verbose = 0L
)

print("Train lightgbm with verbose 1, print evaluation metric")
bst <- lightgbm(
    data = dtrain
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , nthread = 2L
    , objective = "binary"
    , verbose = 1L
)

print("Train lightgbm with verbose 2, also print information about tree")
bst <- lightgbm(
    data = dtrain
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , nthread = 2L
    , objective = "binary"
    , verbose = 2L
)

### You can also specify data as file path to a LibSVM/TSV/CSV format input
### Since we do not have this file with us, the following line is just for illustration
### bst <- lightgbm(
###     data = "agaricus.train.svm"
###     , num_leaves = 4L
###     , learning_rate = 1.0
###     , nrounds = 2L
###     , objective = "binary"
### )

#--------------------Basic prediction using lightgbm--------------
### You can do prediction using the following line
### You can put in Matrix, sparseMatrix, or lgb.Dataset
pred <- predict(bst, test$data)
err <- mean(as.numeric(pred > 0.5) != test$label)
print(paste("test-error=", err))

###--------------------Save and load models-------------------------
# Save model to binary local file
lgb.save(bst, "lightgbm.model")

### Load binary model to R
bst2 <- lgb.load("lightgbm.model")
pred2 <- predict(bst2, test$data)

### pred2 should be identical to pred
print(paste("sum(abs(pred2-pred))=", sum(abs(pred2 - pred))))

#--------------------Advanced features ---------------------------
### To use advanced features, we need to put data in lgb.Dataset
dtrain <- lgb.Dataset(data = train$data, label = train$label, free_raw_data = FALSE)
dtest <- lgb.Dataset.create.valid(dtrain, data = test$data, label = test$label)

#--------------------Using validation set-------------------------
### valids is a list of lgb.Dataset, each of them is tagged with name
valids <- list(train = dtrain, test = dtest)

### To train with valids, use lgb.train, which contains more advanced features
### valids allows us to monitor the evaluation result on all data in the list
print("Train lightgbm using lgb.train with valids")
bst <- lgb.train(
    data = dtrain
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , valids = valids
    , nthread = 2L
    , objective = "binary"
)

### We can change evaluation metrics, or use multiple evaluation metrics
print("Train lightgbm using lgb.train with valids, watch logloss and error")
bst <- lgb.train(
    data = dtrain
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , valids = valids
    , eval = c("binary_error", "binary_logloss")
    , nthread = 2L
    , objective = "binary"
)

### lgb.Dataset can also be saved using lgb.Dataset.save
lgb.Dataset.save(dtrain, "dtrain.buffer")

### To load it in, simply call lgb.Dataset
dtrain2 <- lgb.Dataset("dtrain.buffer")
bst <- lgb.train(
    data = dtrain2
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , valids = valids
    , nthread = 2L
    , objective = "binary"
)

### information can be extracted from lgb.Dataset using getinfo
label <- getinfo(dtest, "label")
pred <- predict(bst, test$data)
err <- as.numeric(sum(as.integer(pred > 0.5) != label)) / length(label)
print(paste("test-error=", err))`

At the end, it says:

'There is no applicable method for predict (applied to an object of class "NULL")'

After that I've tried in bank dataset;

library(lightgbm)
library(data.table)
data(bank, package = "lightgbm")
str(bank)

### We are dividing the dataset into two: one train, one validation
bank_train <- bank[1:4000, ]
bank_test <- bank[4001:4521, ]
head(bank_train)
head(bank_test)

bank_rules <- lgb.prepare_rules(data = bank_train)
bank_train <- bank_rules$data
bank_test <- lgb.prepare_rules(data = bank_test, rules = bank_rules$rules)$data
str(bank_test)

### Remove 1 to label because it must be between 0 and 1
bank_train$y <- bank_train$y - 1
bank_test$y <- bank_test$y - 1

### Data input to LightGBM must be a matrix, without the label
my_data_train <- as.matrix(bank_train[, 1:16, with = FALSE])
my_data_test <- as.matrix(bank_test[, 1:16, with = FALSE])

### Creating the LightGBM dataset with categorical features
### The categorical features can be passed to lgb.train to not copy and paste a lot
dtrain <- lgb.Dataset(data = my_data_train,
                      label = bank_train$y)
dtest <- lgb.Dataset(data = my_data_test,
                     label = bank_test$y)

# We can now train a model
model <- lgb.train(list(objective = "binary",
                        metric = "l2",
                        min_data = 1,
                        learning_rate = 0.1,
                        min_data = 0,
                        min_hessian = 1,
                        max_depth = 2,
                        categorical_feature = c(2, 3, 4, 5, 7, 8, 9, 11, 16)),
                   dtrain,
                   100,
                   valids = list(train = dtrain, valid = dtest))

### Try to find split_feature: 11
### If you find it, it means it used a categorical feature in the first tree
lgb.dump(model, num_iteration = 1)

### Remove 1 to label because it must be between 0 and 1
bank$y <- bank$y - 1

### Data input to LightGBM must be a matrix, without the label
my_data <- as.matrix(bank[, 1:16, with = FALSE])

### Creating the LightGBM dataset with categorical features
### The categorical features must be indexed like in R (1-indexed, not 0-indexed)
lgb_data <- lgb.Dataset(data = my_data,
                        #label = bank$y,
                        categorical_feature = c(2, 3, 4, 5, 7, 8, 9, 11, 16))

### We can now train a model
model <- lgb.train(list(objective = "binary",
                        metric = "l2",
                        min_data = 1,
                        learning_rate = 0.1,
                        min_data = 0,
                        min_hessian = 1,
                        max_depth = 2),
                   lgb_data,
                   100,
                   valids = list(train = lgb_data))

### Try to find split_feature: 2
### If you find it, it means it used a categorical feature in the first tree
lgb.dump(model, num_iteration = 1)

And it gives me an error:

Error in lgb.dump(model, num_iteration = 1) : lgb.save: booster should be an ‘lgb.Booster’

Also, its model information is:

Type: NULL
Length: 0
Size: 0 B
Value: NULL (empty)

jameslamb commented 4 years ago

I'll try to reproduce this tonight and let you know what I find! Thanks again for using LightGBM and reporting the issue.

jameslamb commented 4 years ago

Sorry for the delay! I was able to test out this code today. All details of my investigation are given below.

For the first block of code, using agaricus: I was not able to reproduce the issue you are seeing. Please try re-running this code from a clean R session.
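As a sanity check, here is a minimal sketch you could run in a fresh session, using only the demo data that ships with the package; on a healthy installation it should print "lgb.Booster" rather than NULL:

library(lightgbm)

# Load the demo data that ships with the package
data(agaricus.train, package = "lightgbm")

# Train a tiny model; on a working installation this returns an lgb.Booster
bst <- lightgbm(
    data = agaricus.train$data
    , label = agaricus.train$label
    , nrounds = 2L
    , objective = "binary"
)

# Should print "lgb.Booster"; printing "NULL" would reproduce the problem you described
class(bst)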

For the second block of code, using bank: the code beginning with # Try to find split_feature: 11 is incorrect. It passes a data.table that still has string columns into lgb.Dataset(), which expects only numeric input. That is why you see warnings like "NAs introduced by coercion".

It looks like you copied most of the bank code from the categorical features demo in our documentation, but missed an important line:

bank <- lgb.prepare(data = bank)

That function takes in a data.table / data.frame that may have character or factor columns, and makes sure that you get back one with only numeric columns.
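For illustration, a minimal sketch of how the start of that second block would look with the missing step added (keeping the same column indices as in your code):

library(lightgbm)
library(data.table)
data(bank, package = "lightgbm")

# The missing step: lgb.prepare() converts character/factor columns
# into numeric codes, which lgb.Dataset() requires
bank <- lgb.prepare(data = bank)

# All columns should now be numeric
str(bank)

# Remove 1 from the label so that it is in {0, 1}
bank$y <- bank$y - 1

# The matrix passed to lgb.Dataset() now contains no strings, so the
# "NAs introduced by coercion" warnings should no longer appear
my_data <- as.matrix(bank[, 1:16, with = FALSE])
lgb_data <- lgb.Dataset(data = my_data,
                        label = bank$y,
                        categorical_feature = c(2, 3, 4, 5, 7, 8, 9, 11, 16))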

Please let me know if this solves the issue. If it does not, please try to reproduce the problem you're seeing with a smaller amount of code, in a new R session.


> Actually, I want to use lightgbm in RStudio, so I have RStudio 3.6.3

I assumed that this means you're using R version 3.6.3. (RStudio's most recent version is 1.3.959.)
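If you want to double-check which is which, a quick sketch (rstudioapi is an assumption on my part; it is not part of LightGBM, but it is available when running inside RStudio):

# Version of the R interpreter (this is what matters for the lightgbm package)
R.version.string

# Version of the RStudio IDE itself, via the rstudioapi package
rstudioapi::versionInfo()$version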

My environment for testing the code you provided:

Installation Results

To install, I ran

git clone --recursive https://github.com/microsoft/LightGBM
cd LightGBM
Rscript build_r.R

Installation succeeded.

install logs (click me) ``` * checking for file 'C:/Users/James/repos/LightGBM/lightgbm_r/DESCRIPTION' ... OK * preparing 'lightgbm': * checking DESCRIPTION meta-information ... OK * cleaning src * checking for LF line-endings in source and make files and shell scripts * checking for empty or unneeded directories WARNING: directory 'lightgbm/src/compute' is empty * looking to see if a 'data/datalist' file should be added * building 'lightgbm_2.3.2.tar.gz' * installing to library 'C:/Users/James/Documents/R/win-library/3.6' * installing *source* package 'lightgbm' ... ** using staged installation ** libs installing via 'install.libs.R' to C:/Users/James/Documents/R/win-library/3.6/00LOCK-lightgbm/00new/lightgbm Trying 'Visual Studio 16 2019' -- Selecting Windows SDK version 10.0.18362.0 to target Windows 10.0.17763. -- The C compiler identification is MSVC 19.24.28319.0 -- The CXX compiler identification is MSVC 19.24.28319.0 -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.24.28314/bin/Hostx64/x64/cl.exe -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.24.28314/bin/Hostx64/x64/cl.exe -- works -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Detecting C compile features -- Detecting C compile features - done -- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.24.28314/bin/Hostx64/x64/cl.exe -- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.24.28314/bin/Hostx64/x64/cl.exe -- works -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Detecting CXX compile features -- Detecting CXX compile features - done -- R version passed into FindLibR.cmake: 3.6.1 -- Creating R.lib and R.def * [C:/Program Files/R/R-3.6.1/bin/x64/R.dll] Found PE+ image -- Found LibR: C:/Program Files/R/R-3.6.1 -- LIBR_EXECUTABLE: C:/Program Files/R/R-3.6.1/bin/x64/R.exe -- LIBR_INCLUDE_DIRS: C:/Program Files/R/R-3.6.1/include -- LIBR_CORE_LIBRARY: C:/Program Files/R/R-3.6.1/bin/x64/R.dll -- Found OpenMP_C: -openmp (found version "2.0") -- Found OpenMP_CXX: -openmp (found version "2.0") -- Found OpenMP: TRUE (found version "2.0") -- Performing Test MM_PREFETCH -- Performing Test MM_PREFETCH - Failed -- Performing Test MM_MALLOC -- Performing Test MM_MALLOC - Failed -- Configuring done -- Generating done -- Build files have been written to: C:/Users/James/AppData/Local/Temp/RtmpQZCrY4/R.INSTALL338c2ad56f8e/lightgbm/src/build Successfully created build files for 'Visual Studio 16 2019' Microsoft (R) Build Engine version 16.4.0+e901037fe for .NET Framework Copyright (C) Microsoft Corporation. All rights reserved. C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\MSBuild\Microsoft\VC\v160\Microsoft.CppBuild.targets(467,5): warning MSB8029: The Intermediate directory or Output directory cannot reside under the Temporary directory as it could lead to issues with incremental build. [C:\Users\James\AppData\Local\Temp\RtmpQZCrY4\R.INSTALL338c2ad56f8e\lightgbm\src\build\ZERO_CHECK.vcxproj] Checking Build System C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\MSBuild\Microsoft\VC\v160\Microsoft.CppBuild.targets(467,5): warning MSB8029: The Intermediate directory or Output directory cannot reside under the Temporary directory as it could lead to issues with incremental build. 
[C:\Users\James\AppData\Local\Temp\RtmpQZCrY4\R.INSTALL338c2ad56f8e\lightgbm\src\build\_lightgbm.vcxproj] Building Custom Rule C:/Users/James/AppData/Local/Temp/RtmpQZCrY4/R.INSTALL338c2ad56f8e/lightgbm/src/CMakeLists.txt application.cpp boosting.cpp gbdt.cpp gbdt_model_text.cpp gbdt_prediction.cpp prediction_early_stop.cpp bin.cpp config.cpp config_auto.cpp dataset.cpp dataset_loader.cpp file_io.cpp json11.cpp metadata.cpp parser.cpp tree.cpp dcg_calculator.cpp metric.cpp linker_topo.cpp linkers_mpi.cpp linkers_socket.cpp network.cpp objective_function.cpp data_parallel_tree_learner.cpp feature_parallel_tree_learner.cpp gpu_tree_learner.cpp serial_tree_learner.cpp tree_learner.cpp voting_parallel_tree_learner.cpp c_api.cpp lightgbm_R.cpp Creating library C:/Users/James/AppData/Local/Temp/RtmpQZCrY4/R.INSTALL338c2ad56f8e/lightgbm/src/Release/lib_lightgbm.lib and object C:/Users/James/AppData/Local/Temp/RtmpQZCrY4/R.INSTALL338c2ad56f8e/lightgbm/src/Release/lib_lightgbm.exp _lightgbm.vcxproj -> C:\Users\James\AppData\Local\Temp\RtmpQZCrY4\R.INSTALL338c2ad56f8e\lightgbm\src\Release\lib_lightgbm.dll Found library file: C:\Users\James\AppData\Local\Temp\RtmpQZCrY4\R.INSTALL338c2ad56f8e/lightgbm/src/Release/lib_lightgbm.dll to move to C:/Users/James/Documents/R/win-library/3.6/00LOCK-lightgbm/00new/lightgbm/libs/x64 Removing 'build/' directory ** R ** data ** demo ** inst ** byte-compile and prepare package for lazy loading ** help *** installing help indices converting help for package finding HTML links ...'lightgbm' done agaricus.test html agaricus.train html bank html dim html dimnames.lgb.Dataset html getinfo html lgb.Dataset html lgb.Dataset.construct html lgb.Dataset.create.valid html lgb.Dataset.save html lgb.Dataset.set.categorical html lgb.Dataset.set.reference html lgb.cv html lgb.dump html lgb.get.eval.result html lgb.importance html lgb.interprete html lgb.load html lgb.model.dt.tree html lgb.plot.importance html lgb.plot.interpretation html lgb.prepare html lgb.prepare2 html lgb.prepare_rules html lgb.prepare_rules2 html lgb.save html lgb.train html lgb.unloader html lgb_shared_params html lightgbm html predict.lgb.Booster html readRDS.lgb.Booster html saveRDS.lgb.Booster html setinfo html slice html ** building package indices ** testing if installed package can be loaded from temporary location ** testing if installed package can be loaded from final location ** testing if installed package keeps a record of temporary installation path * DONE (lightgbm) ```

Testing sample code with agaricus

I ran all of the following code exactly in order, in a single new R session.

library(lightgbm)
library(methods)

### In this example, we are aiming to predict whether a mushroom is edible
data(agaricus.train, package = "lightgbm")
data(agaricus.test, package = "lightgbm")
train <- agaricus.train 
test <- agaricus.test 

### The loaded data is stored in sparseMatrix, and label is a numeric vector in {0,1}
class(train$label)
class(train$data)
#--------------------Basic Training using lightgbm----------------
### This is the basic usage of lightgbm you can put matrix in data field
### Note: we are putting in sparse matrix here, lightgbm naturally handles sparse input
### Use sparse matrix when your feature is sparse (e.g. when you are using one-hot encoding vector)
print("Training lightgbm with sparseMatrix")
bst <- lightgbm(
    data = train$data
    , label = train$label
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , objective = "binary"
)

This succeeded.

logs for this step > [LightGBM] [Info] Number of positive: 3140, number of negative: 3373 [LightGBM] [Warning] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000853 seconds. You can set `force_row_wise=true` to remove the overhead. And if memory is not enough, you can set `force_col_wise=true`. [LightGBM] [Info] Total Bins 214 [LightGBM] [Info] Number of data points in the train set: 6513, number of used features: 107 [LightGBM] [Info] [binary:BoostFromScore]: pavg=0.482113 -> initscore=-0.071580 [LightGBM] [Info] Start training from score -0.071580 [1]: train's binary_logloss:0.198597 [2]: train's binary_logloss:0.111535
### Alternatively, you can put in dense matrix, i.e. basic R-matrix
print("Training lightgbm with Matrix")
bst <- lightgbm(
    data = as.matrix(train$data)
    , label = train$label
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , objective = "binary"
)

This also succeeded.

logs for this step > [LightGBM] [Info] Number of positive: 3140, number of negative: 3373 [LightGBM] [Warning] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000706 seconds. You can set `force_row_wise=true` to remove the overhead. And if memory is not enough, you can set `force_col_wise=true`. [LightGBM] [Info] Total Bins 214 [LightGBM] [Info] Number of data points in the train set: 6513, number of used features: 107 [LightGBM] [Info] [binary:BoostFromScore]: pavg=0.482113 -> initscore=-0.071580 [LightGBM] [Info] Start training from score -0.071580 [1]: train's binary_logloss:0.198597 [2]: train's binary_logloss:0.111535
### You can also put in lgb.Dataset object, which stores label, data and other metadata needed for advanced features
print("Training lightgbm with lgb.Dataset")
dtrain <- lgb.Dataset(
    data = train$data
    , label = train$label
)
bst <- lightgbm(
    data = dtrain
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , objective = "binary"
)

Succeeded.

logs for this step > [LightGBM] [Info] Number of positive: 3140, number of negative: 3373 [LightGBM] [Warning] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001091 seconds. You can set `force_row_wise=true` to remove the overhead. And if memory is not enough, you can set `force_col_wise=true`. [LightGBM] [Info] Total Bins 214 [LightGBM] [Info] Number of data points in the train set: 6513, number of used features: 107 [LightGBM] [Info] [binary:BoostFromScore]: pavg=0.482113 -> initscore=-0.071580 [LightGBM] [Info] Start training from score -0.071580 [1]: train's binary_logloss:0.198597 [2]: train's binary_logloss:0.111535
### Verbose = 0,1,2
print("Train lightgbm with verbose 0, no message")
bst <- lightgbm(
    data = dtrain
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , objective = "binary"
    , verbose = 0L
)
logs for this step > [LightGBM] [Warning] Auto-choosing row-wise multi-threading, the overhead of testing was 0.004189 seconds. You can set `force_row_wise=true` to remove the overhead. And if memory is not enough, you can set `force_col_wise=true`.
print("Train lightgbm with verbose 1, print evaluation metric")
bst <- lightgbm(
    data = dtrain
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , nthread = 2L
    , objective = "binary"
    , verbose = 1L
)

print("Train lightgbm with verbose 2, also print information about tree")
bst <- lightgbm(
    data = dtrain
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , nthread = 2L
    , objective = "binary"
    , verbose = 2L
)

Succeeded.

logs for this step > [1] "Train lightgbm with verbose 1, print evaluation metric" > [LightGBM] [Info] Number of positive: 3140, number of negative: 3373 [LightGBM] [Warning] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000716 seconds. You can set `force_row_wise=true` to remove the overhead. And if memory is not enough, you can set `force_col_wise=true`. [LightGBM] [Info] Total Bins 214 [LightGBM] [Info] Number of data points in the train set: 6513, number of used features: 107 [LightGBM] [Info] [binary:BoostFromScore]: pavg=0.482113 -> initscore=-0.071580 [LightGBM] [Info] Start training from score -0.071580 [1]: train's binary_logloss:0.198597 [2]: train's binary_logloss:0.111535 > [LightGBM] [Info] Number of positive: 3140, number of negative: 3373 [LightGBM] [Debug] Dataset::GetMultiBinFromSparseFeatures: sparse rate 0.930600 [LightGBM] [Debug] Dataset::GetMultiBinFromAllFeatures: sparse rate 0.433362 [LightGBM] [Debug] init for col-wise cost 0.008042 seconds, init for row-wise cost 0.009209 seconds [LightGBM] [Debug] col-wise cost 0.000202 seconds, row-wise cost 0.000098 seconds [LightGBM] [Warning] Auto-choosing row-wise multi-threading, the overhead of testing was 0.008341 seconds. You can set `force_row_wise=true` to remove the overhead. And if memory is not enough, you can set `force_col_wise=true`. [LightGBM] [Debug] Using Sparse Multi-Val Bin [LightGBM] [Info] Total Bins 214 [LightGBM] [Info] Number of data points in the train set: 6513, number of used features: 107 [LightGBM] [Info] [binary:BoostFromScore]: pavg=0.482113 -> initscore=-0.071580 [LightGBM] [Info] Start training from score -0.071580 [LightGBM] [Debug] Trained a tree with leaves = 4 and max_depth = 3 [1]: train's binary_logloss:0.198597 [LightGBM] [Debug] Trained a tree with leaves = 4 and max_depth = 3 [2]: train's binary_logloss:0.111535
#--------------------Basic prediction using lightgbm--------------
### You can do prediction using the following line
### You can put in Matrix, sparseMatrix, or lgb.Dataset
pred <- predict(bst, test$data)
err <- mean(as.numeric(pred > 0.5) != test$label)
print(paste("test-error=", err))

Succeeded.

logs for this step > [1] "test-error= 0.0217256362507759"
###--------------------Save and load models-------------------------
# Save model to binary local file
lgb.save(bst, "lightgbm.model")

### Load binary model to R
bst2 <- lgb.load("lightgbm.model")
pred2 <- predict(bst2, test$data)

### pred2 should be identical to pred
print(paste("sum(abs(pred2-pred))=", sum(abs(pred2 - pred))))

Succeeded.

logs for this step > [1] "sum(abs(pred2-pred))= 0"
#--------------------Advanced features ---------------------------
### To use advanced features, we need to put data in lgb.Dataset
dtrain <- lgb.Dataset(data = train$data, label = train$label, free_raw_data = FALSE)
dtest <- lgb.Dataset.create.valid(dtrain, data = test$data, label = test$label)

#--------------------Using validation set-------------------------
### valids is a list of lgb.Dataset, each of them is tagged with name
valids <- list(train = dtrain, test = dtest)

### To train with valids, use lgb.train, which contains more advanced features
### valids allows us to monitor the evaluation result on all data in the list
print("Train lightgbm using lgb.train with valids")
bst <- lgb.train(
    data = dtrain
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , valids = valids
    , nthread = 2L
    , objective = "binary"
)

### We can change evaluation metrics, or use multiple evaluation metrics
print("Train lightgbm using lgb.train with valids, watch logloss and error")
bst <- lgb.train(
    data = dtrain
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , valids = valids
    , eval = c("binary_error", "binary_logloss")
    , nthread = 2L
    , objective = "binary"
)

Succeeded.

logs for this step > [LightGBM] [Info] Number of positive: 3140, number of negative: 3373 [LightGBM] [Warning] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000675 seconds. You can set `force_row_wise=true` to remove the overhead. And if memory is not enough, you can set `force_col_wise=true`. [LightGBM] [Info] Total Bins 214 [LightGBM] [Info] Number of data points in the train set: 6513, number of used features: 107 [LightGBM] [Info] [binary:BoostFromScore]: pavg=0.482113 -> initscore=-0.071580 [LightGBM] [Info] Start training from score -0.071580 [1]: train's binary_logloss:0.198597 test's binary_logloss:0.204754 [2]: train's binary_logloss:0.111535 test's binary_logloss:0.113096 > [LightGBM] [Info] Number of positive: 3140, number of negative: 3373 [LightGBM] [Warning] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001074 seconds. You can set `force_row_wise=true` to remove the overhead. And if memory is not enough, you can set `force_col_wise=true`. [LightGBM] [Info] Total Bins 214 [LightGBM] [Info] Number of data points in the train set: 6513, number of used features: 107 [LightGBM] [Info] [binary:BoostFromScore]: pavg=0.482113 -> initscore=-0.071580 [LightGBM] [Info] Start training from score -0.071580 [1]: train's binary_error:0.0304007 train's binary_logloss:0.198597 test's binary_error:0.0335196 test's binary_logloss:0.204754 [2]: train's binary_error:0.0222632 train's binary_logloss:0.111535 test's binary_error:0.0217256 test's binary_logloss:0.113096
### lgb.Dataset can also be saved using lgb.Dataset.save
lgb.Dataset.save(dtrain, "dtrain.buffer")

### To load it in, simply call lgb.Dataset
dtrain2 <- lgb.Dataset("dtrain.buffer")
bst <- lgb.train(
    data = dtrain2
    , num_leaves = 4L
    , learning_rate = 1.0
    , nrounds = 2L
    , valids = valids
    , nthread = 2L
    , objective = "binary"
)

Succeeded.

logs for this step > [LightGBM] [Info] Number of positive: 3140, number of negative: 3373 [LightGBM] [Warning] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000724 seconds. You can set `force_row_wise=true` to remove the overhead. And if memory is not enough, you can set `force_col_wise=true`. [LightGBM] [Info] Total Bins 214 [LightGBM] [Info] Number of data points in the train set: 6513, number of used features: 107 [LightGBM] [Info] [binary:BoostFromScore]: pavg=0.482113 -> initscore=-0.071580 [LightGBM] [Info] Start training from score -0.071580 [1]: train's binary_logloss:0.198597 test's binary_logloss:0.204754 [2]: train's binary_logloss:0.111535 test's binary_logloss:0.113096
label <- getinfo(dtest, "label")
pred <- predict(bst, test$data)
err <- as.numeric(sum(as.integer(pred > 0.5) != label)) / length(label)
print(paste("test-error=", err))

Succeeded.

logs for this step > [1] "test-error= 0.0217256362507759"

Testing sample code with bank

I ran all of the following code exactly in order, in a single new R session.

library(lightgbm)
library(data.table)
data(bank, package = "lightgbm")
str(bank)

Succeeded.

logs for this step > Classes 'data.table' and 'data.frame': 4521 obs. of 17 variables: $ age : int 30 33 35 30 59 35 36 39 41 43 ... $ job : chr "unemployed" "services" "management" "management" ... $ marital : chr "married" "married" "single" "married" ... $ education: chr "primary" "secondary" "tertiary" "tertiary" ... $ default : chr "no" "no" "no" "no" ... $ balance : int 1787 4789 1350 1476 0 747 307 147 221 -88 ... $ housing : chr "no" "yes" "yes" "yes" ... $ loan : chr "no" "yes" "no" "yes" ... $ contact : chr "cellular" "cellular" "cellular" "unknown" ... $ day : int 19 11 16 3 5 23 14 6 14 17 ... $ month : chr "oct" "may" "apr" "jun" ... $ duration : int 79 220 185 199 226 141 341 151 57 313 ... $ campaign : int 1 1 1 4 1 2 1 2 2 1 ... $ pdays : int -1 339 330 -1 -1 176 330 -1 -1 147 ... $ previous : int 0 4 1 0 0 3 2 0 0 2 ... $ poutcome : chr "unknown" "failure" "failure" "unknown" ... $ y : chr "no" "no" "no" "no" ... - attr(*, ".internal.selfref")=
### We are dividing the dataset into two: one train, one validation
bank_train <- bank[1:4000, ]
bank_test <- bank[4001:4521, ]
head(bank_train)
head(bank_test)

bank_rules <- lgb.prepare_rules(data = bank_train)
bank_train <- bank_rules$data
bank_test <- lgb.prepare_rules(data = bank_test, rules = bank_rules$rules)$data
str(bank_test)

Succeeded.

logs for this step > age job marital education default balance housing loan contact day 1: 30 unemployed married primary no 1787 no no cellular 19 2: 33 services married secondary no 4789 yes yes cellular 11 3: 35 management single tertiary no 1350 yes no cellular 16 4: 30 management married tertiary no 1476 yes yes unknown 3 5: 59 blue-collar married secondary no 0 yes no unknown 5 6: 35 management single tertiary no 747 no no cellular 23 month duration campaign pdays previous poutcome y 1: oct 79 1 -1 0 unknown no 2: may 220 1 339 4 failure no 3: apr 185 1 330 1 failure no 4: jun 199 4 -1 0 unknown no 5: may 226 1 -1 0 unknown no 6: feb 141 2 176 3 failure no > age job marital education default balance housing loan contact day 1: 53 admin. divorced secondary no 26 yes no cellular 7 2: 36 technician married secondary no 191 no no cellular 31 3: 58 technician divorced secondary no -123 no no cellular 5 4: 26 student single secondary no -147 no no unknown 4 5: 34 technician single secondary no 179 no no cellular 19 6: 55 blue-collar married primary no 1086 yes no cellular 6 month duration campaign pdays previous poutcome y 1: may 56 1 359 1 failure no 2: aug 69 1 -1 0 unknown no 3: aug 131 2 -1 0 unknown no 4: jun 95 2 -1 0 unknown no 5: aug 294 3 -1 0 unknown no 6: may 146 1 272 2 failure no > Classes 'data.table' and 'data.frame': 521 obs. of 17 variables: $ age : int 53 36 58 26 34 55 55 34 41 38 ... $ job : num 1 10 10 9 10 2 2 3 3 4 ... $ marital : num 1 2 1 3 3 2 2 2 1 1 ... $ education: num 2 2 2 2 2 1 2 3 2 2 ... $ default : num 1 1 1 1 1 1 1 1 1 1 ... $ balance : int 26 191 -123 -147 179 1086 471 105 1588 70 ... $ housing : num 2 1 1 1 1 2 2 2 2 1 ... $ loan : num 1 1 1 1 1 1 1 1 2 1 ... $ contact : num 1 1 1 3 1 1 3 3 3 1 ... $ day : int 7 31 5 4 19 6 30 28 20 27 ... $ month : num 9 2 2 7 2 9 9 9 7 11 ... $ duration : int 56 69 131 95 294 146 58 249 10 255 ... $ campaign : int 1 1 2 2 3 1 2 2 8 3 ... $ pdays : int 359 -1 -1 -1 -1 272 -1 -1 -1 148 ... $ previous : int 1 0 0 0 0 2 0 0 0 1 ... $ poutcome : num 1 4 4 4 4 1 4 4 4 3 ... $ y : num 1 1 1 1 1 1 1 1 1 2 ... - attr(*, ".internal.selfref")=
### Remove 1 to label because it must be between 0 and 1
bank_train$y <- bank_train$y - 1
bank_test$y <- bank_test$y - 1

### Data input to LightGBM must be a matrix, without the label
my_data_train <- as.matrix(bank_train[, 1:16, with = FALSE])
my_data_test <- as.matrix(bank_test[, 1:16, with = FALSE])

### Creating the LightGBM dataset with categorical features
### The categorical features can be passed to lgb.train to not copy and paste a lot
dtrain <- lgb.Dataset(data = my_data_train,
                      label = bank_train$y)
dtest <- lgb.Dataset(data = my_data_test,
                     label = bank_test$y)

Succeeded. No logs produced.

# We can now train a model
model <- lgb.train(list(objective = "binary",
                        metric = "l2",
                        min_data = 1,
                        learning_rate = 0.1,
                        min_data = 0,
                        min_hessian = 1,
                        max_depth = 2,
                        categorical_feature = c(2, 3, 4, 5, 7, 8, 9, 11, 16)),
                   dtrain,
                   100,
                   valids = list(train = dtrain, valid = dtest))
logs for this step > [LightGBM] [Warning] Met negative value in categorical features, will convert it to NaN (this warning is repeated many times) [LightGBM] [Warning] min_data is set=1, min_data=1 will be ignored. Current value: min_data=1 [LightGBM] [Info] Number of positive: 458, number of negative: 3542 [LightGBM] [Warning] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000279 seconds. You can set `force_row_wise=true` to remove the overhead. And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1193 [LightGBM] [Info] Number of data points in the train set: 4000, number of used features: 16 [LightGBM] [Info] [binary:BoostFromScore]: pavg=0.114500 -> initscore=-2.045578 [LightGBM] [Info] Start training from score -2.045578 [1]: train's l2:0.0994419 valid's l2:0.104847 [2]: train's l2:0.0977966 valid's l2:0.103293 [3]: train's l2:0.0964739 valid's l2:0.101909 [4]: train's l2:0.0957967 valid's l2:0.101433 [5]: train's l2:0.0950898 valid's l2:0.100327 [6]: train's l2:0.0945136 valid's l2:0.100013 [7]: train's l2:0.0939542 valid's l2:0.0996662 [8]: train's l2:0.0934401 valid's l2:0.0993189 [9]: train's l2:0.0927487 valid's l2:0.0991435 [10]: train's l2:0.0922085 valid's l2:0.0986014 [11]: train's l2:0.0918153 valid's l2:0.098469 [12]: train's l2:0.0913819 valid's l2:0.0981613 [13]: train's l2:0.0909505 valid's l2:0.0979864 [14]: train's l2:0.0905426 valid's l2:0.0976931 [15]: train's l2:0.0899016 valid's l2:0.0972711 [16]: train's l2:0.0895316 valid's l2:0.0970271 [17]: train's l2:0.089146 valid's l2:0.0967229 [18]: train's l2:0.0886636 valid's l2:0.0960468 [19]: train's l2:0.0883008 valid's l2:0.0955211 [20]: train's l2:0.0878324 valid's l2:0.0952302 [21]: train's l2:0.0874861 valid's l2:0.0947676 [22]: train's l2:0.0872113 valid's l2:0.0945997 [23]: train's l2:0.0869627 valid's l2:0.0945217 [24]: train's l2:0.08655 valid's l2:0.0945738 [25]: train's l2:0.0862541 valid's l2:0.0941872 [26]: train's l2:0.085948 valid's l2:0.0941528 [27]: train's l2:0.0857461 valid's l2:0.0941 [28]: train's l2:0.0855036 valid's l2:0.0939886 [29]: train's l2:0.0852849 valid's l2:0.0939506 [30]: train's l2:0.0850142 valid's l2:0.0935492 [31]: train's l2:0.0848052 valid's l2:0.0933685 [32]: train's l2:0.0845214 valid's l2:0.0930343 [33]: train's l2:0.084331 valid's l2:0.0930592 [34]: train's l2:0.084071 valid's l2:0.0927365 [35]: train's l2:0.0838732 valid's l2:0.0924692 [36]: train's l2:0.0835967 valid's l2:0.0925701 [37]: train's l2:0.0834222 valid's l2:0.0924764 [38]: train's l2:0.0832672 valid's l2:0.0924603 [39]: train's l2:0.0831349 valid's l2:0.0923968 [40]: train's l2:0.0830079 valid's l2:0.0923657 [41]: train's l2:0.0827275 valid's l2:0.0923773 [42]: train's l2:0.0824902 valid's l2:0.0925719 [43]: train's l2:0.0823483 valid's l2:0.092485 [44]: train's l2:0.0822301 valid's l2:0.092534 [45]: train's l2:0.0820868 valid's l2:0.0924691 [46]: train's l2:0.0818943 valid's l2:0.0925296 [47]: train's l2:0.081716 valid's l2:0.0926378 [48]: train's l2:0.0815782 valid's l2:0.0925787 [49]: train's l2:0.0814816 valid's l2:0.0924392 [50]: train's l2:0.0813718 valid's l2:0.0922343 [51]: train's l2:0.0812871 valid's l2:0.0923093 [52]: train's l2:0.0811197 valid's l2:0.0920774 [53]: train's l2:0.0809801 valid's l2:0.0920486 [54]: train's l2:0.0808835 valid's l2:0.0920667 [55]: train's l2:0.0807389 valid's l2:0.0920585 [56]: train's l2:0.0806074 valid's l2:0.0919098 [57]: train's l2:0.0805441 valid's l2:0.0919298 [58]: train's l2:0.080407 valid's l2:0.0919488 [59]: train's l2:0.0803042 valid's l2:0.0919352 [60]: train's l2:0.0801484 valid's l2:0.0917262 [61]: train's l2:0.0799342 valid's l2:0.0917356 [62]: train's l2:0.0797982 valid's l2:0.0917714 [63]: train's l2:0.0797293 valid's l2:0.0916682 [64]: train's l2:0.0796735 valid's l2:0.0917123 [65]: train's l2:0.0794947 valid's l2:0.0917373 [66]: train's l2:0.0793941 valid's l2:0.0915955 [67]: train's l2:0.0793136 valid's l2:0.0915967 [68]: train's l2:0.0790636 valid's l2:0.091608 [69]: train's l2:0.0789973 valid's l2:0.0916736 [70]: 
train's l2:0.0788705 valid's l2:0.0916671 [71]: train's l2:0.0788168 valid's l2:0.0916386 [72]: train's l2:0.0787269 valid's l2:0.0916457 [73]: train's l2:0.0786086 valid's l2:0.0914741 [74]: train's l2:0.0784935 valid's l2:0.0916397 [75]: train's l2:0.0784566 valid's l2:0.0916249 [76]: train's l2:0.0783135 valid's l2:0.0916652 [77]: train's l2:0.0782109 valid's l2:0.0917218 [78]: train's l2:0.0781334 valid's l2:0.0914983 [79]: train's l2:0.0780579 valid's l2:0.0915497 [80]: train's l2:0.0779979 valid's l2:0.0914593 [81]: train's l2:0.0778809 valid's l2:0.0913358 [82]: train's l2:0.0777102 valid's l2:0.0914007 [83]: train's l2:0.0775018 valid's l2:0.0912009 [84]: train's l2:0.0773735 valid's l2:0.0911884 [85]: train's l2:0.0771901 valid's l2:0.0908108 [86]: train's l2:0.0770799 valid's l2:0.090889 [87]: train's l2:0.0769732 valid's l2:0.0909374 [88]: train's l2:0.0769468 valid's l2:0.0909356 [89]: train's l2:0.0768329 valid's l2:0.0909834 [90]: train's l2:0.0767511 valid's l2:0.0910189 [91]: train's l2:0.0766514 valid's l2:0.0908677 [92]: train's l2:0.0765169 valid's l2:0.0910271 [93]: train's l2:0.0764588 valid's l2:0.0910121 [94]: train's l2:0.0764327 valid's l2:0.091014 [95]: train's l2:0.0763829 valid's l2:0.0910848 [96]: train's l2:0.0762222 valid's l2:0.0909938 [97]: train's l2:0.0761011 valid's l2:0.0910207 [98]: train's l2:0.0760226 valid's l2:0.0911919 [99]: train's l2:0.0759651 valid's l2:0.0911661 [100]: train's l2:0.0758751 valid's l2:0.091211
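As a side note, the [binary:BoostFromScore] line in the log is just the log-odds of the positive rate reported a few lines earlier (458 positives out of 4000 rows). A quick check of the arithmetic in R (my addition, not part of the original output):

p <- 458 / 4000   # pavg = 0.1145, i.e. the 0.114500 in the log
log(p / (1 - p))  # -2.045578, matching the reported init score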
### Try to find split_feature: 11
### If you find it, it means it used a categorical feature in the first tree
lgb.dump(model, num_iteration = 1)
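### (Added sketch, not in the original demo) lgb.dump() returns the model as
### a JSON string, so the check can also be done programmatically instead of by eye
json_dump <- lgb.dump(model, num_iteration = 1)
grepl('"split_feature":11', json_dump, fixed = TRUE)  # TRUE if tree 1 used feature 11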

### Remove 1 to label because it must be between 0 and 1
bank$y <- bank$y - 1
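### (Added check) the binary objective expects labels in {0, 1}; after the
### subtraction this should show only 0 and 1
table(bank$y)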

### Data input to LightGBM must be a matrix, without the label
my_data <- as.matrix(bank[, 1:16, with = FALSE])
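### (Added check) as.matrix() on a data.frame that still contains character
### columns yields a character matrix, which LightGBM cannot use
storage.mode(my_data)  # want "integer" or "double", not "character"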

### Creating the LightGBM dataset with categorical features
### The categorical features must be indexed like in R (1-indexed, not 0-indexed)
lgb_data <- lgb.Dataset(data = my_data,
                        label = bank$y,  # the label must be supplied; the log above
                                         # counts positives/negatives, so it was used
                        categorical_feature = c(2, 3, 4, 5, 7, 8, 9, 11, 16))
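### (Added cross-check, assuming the standard UCI bank-marketing column order)
### Those indices correspond to the string-valued columns; match() recovers
### them by name
cat_cols <- c("job", "marital", "education", "default", "housing",
              "loan", "contact", "month", "poutcome")
match(cat_cols, colnames(my_data))  # 2 3 4 5 7 8 9 11 16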

### We can now train a model
model <- lgb.train(list(objective = "binary",
                        metric = "l2",
                        ### the original demo passed min_data twice (1 and 0), which
                        ### is what triggers the "min_data is set=1 ... will be
                        ### ignored" warning in the log; passing it once is enough
                        min_data = 1,
                        learning_rate = 0.1,
                        min_hessian = 1,
                        max_depth = 2),
                   lgb_data,
                   100,
                   valids = list(train = lgb_data))
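### (Added sanity check, not in the original demo) a successful call returns
### an lgb.Booster; a NULL here would reproduce the problem reported in this issue
class(model)                   # should include "lgb.Booster"
head(predict(model, my_data))  # should be numeric scores, not an error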

### Try to find split_feature: 2
### If you find it, it means it used a categorical feature in the first tree
lgb.dump(model, num_iteration = 1)

This failed, but with different issues than the one you reported. I think the problem is that `bank` at this point in your code is not a numeric matrix: it is still a data.frame with some character columns!

bank$y <- bank$y - 1
Error in bank$y - 1 : non-numeric argument to binary operator
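One way past that (a minimal sketch, my addition rather than code from the demo) is to recode the character columns as integer factor codes before building the matrix. It assumes bank is a data.table, as the with = FALSE subsetting above suggests, and that y holds "no"/"yes" as in the UCI bank-marketing data:

library(data.table)
### find the character columns and replace each with its integer factor codes
char_cols <- names(bank)[vapply(bank, is.character, logical(1))]
bank[, (char_cols) := lapply(.SD, function(x) as.integer(as.factor(x))), .SDcols = char_cols]
### y is now 1/2 ("no"/"yes"), so bank$y - 1 yields the required 0/1 label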

ghost commented 4 years ago

Hi again,

I've tried the code for both the agaricus and bank datasets following your comments; it worked and I did not get any error messages.

Thank you

jameslamb commented 4 years ago

great!

github-actions[bot] commented 1 year ago

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.