thomasp85 / lime

Local Interpretable Model-Agnostic Explanations (R port of original Python package)
https://lime.data-imaginist.com/

Error in cut.default(x[[i]], unique(explainer$bin_cuts[[i]]), labels = FALSE, : invalid number of intervals #186

Open marianeira opened 3 years ago

marianeira commented 3 years ago

It seems that lime explanations do not work for variables that contain only NAs and a single constant value, even though XGBoost can be fitted on such variables.

For instance, I have a variable that is highly correlated with the target; in fact, it has the highest gain in the variable importance table. Moreover, if we replace its missing values with an extreme value, its correlation with the target is 0.77.
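The sentinel replacement mentioned above can be sketched in base R with the `target` and `var5` columns from the toy data in this issue. The sentinel value `-999` and the names `sentinel` and `var5_filled` are assumptions chosen for illustration only; any value far outside the observed range would behave the same way here.

```r
# Toy columns copied from the example data in this issue.
target <- c(rep(0, 4), rep(1, 11), rep(2, 7))
var5   <- c(rep(NA, 4), rep(1, 18))

# Ignoring NAs, the column is constant, so its standard deviation is zero.
sd_observed <- sd(var5, na.rm = TRUE)

# Hypothetical sentinel: an extreme value that stands in for "missing".
sentinel    <- -999
var5_filled <- ifelse(is.na(var5), sentinel, var5)

# After the replacement the column has two distinct values and a
# well-defined correlation with the target.
cor_filled <- cor(var5_filled, target)
print(round(cor_filled, 2))
```

On this toy data the correlation comes out at roughly 0.77, which shows that the signal lives entirely in the missingness pattern.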

However, it does not work within the LIME explanation because its standard deviation is zero (LIME does not take missing values into account, unlike XGBoost). As a result, I can't use lime with this type of variable. Is there any solution other than removing such columns, which seem to work well in XGBoost?
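The error message itself can be reproduced in base R without lime or xgboost. The sketch below assumes that the bin cuts for a numeric column are equally spaced points over its non-missing range; that is an assumption about lime's internals, not something confirmed here, and `n_bins` is just an illustrative value. Only the final `cut()` call mirrors the call shown in the error message.

```r
# A column with only NAs and one constant value, as in the example data.
var5 <- c(rep(NA, 4), rep(1, 18))

# Assumed binning scheme: n_bins + 1 equally spaced cut points over the
# observed (non-NA) range. With a constant column they all coincide.
n_bins  <- 4
d_range <- range(var5, na.rm = TRUE)
cuts    <- seq(d_range[1], d_range[2], length.out = n_bins + 1)

# unique() collapses the cuts to a single number, which cut() then reads
# as a requested *number of intervals* instead of a vector of breaks.
breaks <- unique(cuts)

msg <- tryCatch(
  {
    cut(var5, breaks, labels = FALSE, include.lowest = TRUE)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Because a single break of `1` is interpreted as "one interval", `cut()` rejects it, which matches the message reported in this issue.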

Here is a simple example of the problem. Thanks in advance.

library(dplyr)    # %>% and select()
library(xgboost)
library(lime)

df <- data.frame(
  target = c(0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2),
  var1 = rnorm(22),
  var2 = rnorm(22)*10,
  var3 = c(rep(0,20),1,1),
  var4 = c(-1,-2,5,3,1,2,2,1,1,2,1,-1,5,1,1,20,2,1,0,2,2,2),
  var5 = c(NA,NA,NA,NA,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
)

Train Xgboost

X_train <- df %>% select(-target)

dtrain <- xgb.DMatrix(data.matrix(X_train), label = as.matrix(df$target))

boost <- xgb.train(
  data = dtrain,
  list(max_depth = 7, eta = 0.1, objective = "multi:softprob", eval_metric = "error", nthread = 1),
  num_class = 3,
  nrounds = 100
)
xgb.importance(feature_names = colnames(dtrain), model = boost)

local_obs <- X_train[c(1,2),]

Fit Lime, quantile bins = FALSE

explainer1 <- lime(x = X_train, model = boost, quantile_bins = FALSE)
#> Error in cut.default(x[[i]], unique(explainer$bin_cuts[[i]]), labels = FALSE, : invalid number of intervals
explanations1 <- lime::explain(local_obs, explainer1, n_labels = 2, n_features = 2)
plot_explanations(explanations1)

Fit Lime, quantile bins = TRUE

explainer2 <- lime(x = X_train, model = boost, quantile_bins = TRUE)
#> Error in cut.default(x[[i]], unique(explainer$bin_cuts[[i]]), labels = FALSE, : invalid number of intervals
#> In addition: Warning messages:
#> 1: var3 does not contain enough variance to use quantile binning. Using standard binning instead.
#> 2: var5 does not contain enough variance to use quantile binning. Using standard binning instead.
explanations2 <- lime::explain(local_obs, explainer2, n_labels = 2, n_features = 2)
plot_explanations(explanations2)
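To narrow down which columns are likely to trigger the error before building an explainer, a small base R pre-check can flag numeric columns that have fewer than two distinct non-missing values. The helper name `degenerate_columns` is hypothetical and not part of lime; this is only a diagnostic sketch, not a fix for the underlying behaviour.

```r
# Hypothetical helper: names of numeric columns with fewer than two
# distinct non-NA values, i.e. columns whose observed range collapses
# to a single point.
degenerate_columns <- function(x) {
  is_num <- vapply(x, is.numeric, logical(1))
  n_distinct <- vapply(
    x[is_num],
    function(col) length(unique(col[!is.na(col)])),
    integer(1)
  )
  names(n_distinct)[n_distinct < 2L]
}

# Same predictors as in the example above.
X_train <- data.frame(
  var1 = rnorm(22),
  var2 = rnorm(22) * 10,
  var3 = c(rep(0, 20), 1, 1),
  var4 = c(-1, -2, 5, 3, 1, 2, 2, 1, 1, 2, 1, -1, 5, 1, 1, 20, 2, 1, 0, 2, 2, 2),
  var5 = c(rep(NA, 4), rep(1, 18))
)

flagged <- degenerate_columns(X_train)
print(flagged)
```

On the example data only `var5` is flagged. `var3` still has two distinct values, so it only produces the quantile-binning warning and can fall back to standard binning.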