Dear lime contributors,
thanks for your awesome work on this repository.
Alas, I ran into an error that took me several days to figure out, and it is reproducible:
explanation.lime <- lime::explain(
  x = local.obs,
  explainer = explainer.lime,
  n_features = 5
)
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
  arguments imply differing number of rows: 30000, 0
Fortunately, I managed to narrow down both the location in the source code and the conditions that trigger the error, though not completely, so I hope you can figure out the last mile.
The condition that triggers it is a column in the cases argument of permute_cases that has zero variance and is of type integer; in my case it is the column reviews.numHelpful.
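A quick way to check whether a data set contains such columns is sketched below. It uses only base R; the data frame toy and the helper zero_var_int_cols are made up for illustration, with reviews.numHelpful mimicking my constant integer column.

```r
# Hypothetical toy data: one factor, one constant integer column, one numeric column
toy <- data.frame(
  rating = factor(c("good", "bad", "good", "bad")),
  reviews.numHelpful = c(0L, 0L, 0L, 0L),
  price = c(1.5, 2.0, 3.25, 4.0)
)

# Illustrative helper (not part of lime): names of integer columns
# that contain exactly one distinct value
zero_var_int_cols <- function(df) {
  flagged <- vapply(
    df,
    function(col) is.integer(col) && length(unique(col)) == 1L,
    logical(1)
  )
  names(df)[flagged]
}

suspects <- zero_var_int_cols(toy)
print(suspects)
```

Note that is.integer() is FALSE for factors, so the factor column is not flagged even though it is stored as integer codes.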
This column leads to an empty output within the permute_cases.data.frame function, in the lines that compute "bin" in the ifelse statement.
I disentangled the type conversion to a data frame and found that this is the line that throws the above error:
perms <- as.data.frame(perms, stringsAsFactors = FALSE)
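The error message itself can be reproduced without lime. As a minimal sketch, assuming perms is a list in which one element has come back empty (the list perms_toy below is invented), the conversion fails in the same way:

```r
# Invented stand-in for perms: one regular column and one empty column
perms_toy <- list(a = 1:3, b = integer(0))

# tryCatch captures the error so that its message can be inspected
msg <- tryCatch(
  {
    as.data.frame(perms_toy, stringsAsFactors = FALSE)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

The captured message reports differing numbers of rows, just like the one above, which suggests that a single zero-length column is enough to trigger it.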
feature_distribution[[2]] gives:
This is wrong! This result should come from the only factor, i.e. the first column, and is thus rendered by feature_distribution[[2]]!
Consequently, the next line, diff(bin_cuts[[2]])[bin], always returns NULL, which leads to an empty return value, integer(0).
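As far as base R goes, the pieces of that expression behave as sketched below. The values of cuts and bin are invented and are not lime internals; what lime intends with the expression is a separate question. diff() turns bin boundaries into bin widths, indexing with bin picks one width per sampled case, and a NULL index or a NULL input produces an empty result.

```r
cuts <- c(0, 1, 3, 7)        # invented bin boundaries
widths <- diff(cuts)         # widths of the three bins: 1, 2, 4

bin <- c(1L, 3L, 3L)         # invented bin index per sampled case
picked <- widths[bin]        # one width per case: 1, 4, 4

empty_index <- widths[NULL]  # indexing with NULL gives a zero-length vector
null_input <- diff(NULL)     # diff() of NULL is NULL

print(picked)
print(length(empty_index))
print(is.null(null_input))
```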
So far, I could narrow the root cause down to this point, but I am clueless about what diff(bin_cuts[[2]])[bin] means and how this can be prevented.
Update
I found a potential reason for this apparent index problem. The feature distribution includes the target variable .outcome as its first list item, and thus all indices are off by one.
However, including the target variable seems inevitable, because the documentation in ?lime specifies:
x: The training data used for training the model that should be explained.
So the training data (including the target), not just the features (excluding the target), must be fed into lime::lime().
Now I wonder: is this a problem in lime::lime() or in permute_cases()? Can you fix this? Tricky...
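To make the suspected off-by-one concrete, here is a toy reproduction that does not touch lime at all. The list feature_distribution_toy and the data frame cases_toy are invented; the only assumption carried over from my observation is that the distribution list has .outcome as an extra first element. Looking entries up by position then hits the wrong element, whereas looking them up by name does not:

```r
# Invented stand-ins: cases_toy holds features only,
# while the distribution list has .outcome as its first element
cases_toy <- data.frame(
  rating = factor(c("good", "bad")),
  reviews.numHelpful = c(0L, 0L)
)
feature_distribution_toy <- list(
  .outcome = "distribution of .outcome",
  rating = "distribution of rating",
  reviews.numHelpful = "distribution of reviews.numHelpful"
)

i <- 2L  # position of reviews.numHelpful among the features
by_position <- feature_distribution_toy[[i]]
by_name <- feature_distribution_toy[[names(cases_toy)[i]]]

print(by_position)  # entry for rating, i.e. the wrong column
print(by_name)      # entry for reviews.numHelpful
```

Whether the right fix is name-based lookup or dropping the target before the distributions are built is something I cannot judge from the outside.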