thomasp85 / lime

Local Interpretable Model-Agnostic Explanations (R port of original Python package)
https://lime.data-imaginist.com/

Permutations when large data frame #153

Closed muschellij2 closed 5 years ago

muschellij2 commented 5 years ago

In https://github.com/thomasp85/lime/blob/ca363ec02b511cd7c67bfd99533f3ec3e3538ce7/R/permute_cases.R#L11, if the data to explain has more than roughly 858,993 records, sample.int throws an error, since 858993 * 5000 ≈ 2^32 (with 5000 being the default number of permutations); the actual limit may be even lower than that. It might be worth documenting that data sets this large aren't supported.

library(lime)
library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2
n = 600000  # 600,000 rows: explain() will request 600000 * 5000 permuted samples
x = rnorm(n)
y = (x^2 + 2) > 4

df = data.frame(y = factor(y * 1), x = x)
train_df = df[1:1000, ]
model = train(y ~ x, data = train_df, method = "knn")
explainer = lime(x = df, model)
xx = explain(df, explainer, n_labels = 2, n_features = 5)
#> Warning in sample.int(length(x), size, replace, prob): NAs introduced by
#> coercion to integer range
#> Error in sample.int(length(x), size, replace, prob): invalid 'size' argument
library(lime)
library(caret)
n = 20000  # 20,000 rows: the same call succeeds
x = rnorm(n)
y = (x^2 + 2) > 4

df = data.frame(y = factor(y * 1), x = x)
train_df = df[1:1000, ]
model = train(y ~ x, data = train_df, method = "knn")
explainer = lime(x = df, model)
xx = explain(df, explainer, n_labels = 2, n_features = 5)

Created on 2019-04-15 by the reprex package (v0.2.1)
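
The same limit can be reproduced without lime or caret at all. The snippet below is only a sketch of base R's behaviour, not lime's permutation code (the sample(letters, ...) call is illustrative):

# Illustration only: sample() coerces its `size` argument to integer, so any
# request above .Machine$integer.max (2^31 - 1) fails.
size <- 600000 * 5000        # rows to explain times lime's default n_permutations
size > .Machine$integer.max  # TRUE: ~3e9 does not fit in a 32-bit integer
sample(letters, size, replace = TRUE)
# expected: warning "NAs introduced by coercion to integer range", then the
# error "invalid 'size' argument", exactly as in the reprex above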

thomasp85 commented 5 years ago

Yes... even if you convinced R to work with this amount of data, it would take forever, as you'd have to train north of a million models. If you insist on creating explanations for all your observations, then run it in chunks, possibly parallelising the work.
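
A minimal sketch of that chunking idea, assuming the df and explainer objects from the reprex above; the chunk size and the use of plain lapply() are illustrative choices, not part of lime:

# Sketch of the chunked approach suggested above (assumes `df` and `explainer`
# from the reprex; the chunk size is arbitrary). Each explain() call then only
# needs chunk_size * n_permutations samples, well below the integer limit.
library(lime)

chunk_size <- 1000
chunks <- split(df, ceiling(seq_len(nrow(df)) / chunk_size))

explanations <- lapply(chunks, function(chunk) {
  explain(chunk, explainer, n_labels = 2, n_features = 5)
})

# explain() returns a data.frame, so the per-chunk results can be combined
all_explanations <- do.call(rbind, explanations)

# For parallel execution, lapply() could be swapped for parallel::mclapply()
# or a similar backend, at the cost of higher memory use.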