thomasp85 / lime

Local Interpretable Model-Agnostic Explanations (R port of original Python package)
https://lime.data-imaginist.com/

Errors in LIME #112

Closed ms1948 closed 6 years ago

ms1948 commented 6 years ago

library(MASS)
library(lime)
library(gdata)
library(tidyverse)
library(caret)

rm(list = ls())
a = read.xls("HAL.xls", sheet = 1)
names(a)
b = data.frame(a[1:26], row.names = 1)

plot.new()
i = 0
i = i + 1
IDName = a[i, 1]
index <- createDataPartition(b$BG, p = 0.7, list = FALSE)
train_data <- b[index, ]
test_data <- b[-index, ]

model_mlp <- caret::train(BG ~ ., data = train_data, method = "mlp",
                          trControl = trainControl(method = "repeatedcv",
                                                   number = 10, repeats = 5,
                                                   verboseIter = FALSE))

spiega <- lime(train_data, model_mlp, bin_continuous = TRUE, n_bins = 5,
               n_permutations = 10)

pred <- data.frame(sample_id = 1:nrow(test_data),
                   predict(model_mlp, test_data, type = "prob"),
                   actual = test_data$BG)
pred$prediction <- colnames(pred)[2:3][apply(pred[, 2:3], 1, which.max)]
pred$correct <- ifelse(pred$actual == pred$prediction, "correct", "wrong")
pred_cor <- filter(pred, correct == "correct")
pred_wrong <- filter(pred, correct == "wrong")

test_data_cor <- test_data %>%
  mutate(sample_id = 1:nrow(test_data)) %>%
  filter(sample_id %in% pred_cor$sample_id) %>%
  sample_n(size = 1) %>%
  remove_rownames() %>%
  tibble::column_to_rownames(var = "sample_id") %>%
  select(-BG)

test_data_wrong <- test_data %>%
  mutate(sample_id = 1:nrow(test_data)) %>%
  filter(sample_id %in% pred_wrong$sample_id) %>%
  sample_n(size = 0) %>%
  remove_rownames() %>%
  tibble::column_to_rownames(var = "sample_id") %>%
  select(-BG)

explanation_cor <- lime::explain(test_data_cor, spiega, n_labels = 3, n_features = 5)
explanation_wrong <- lime::explain(test_data_wrong, spiega, n_labels = 3, n_features = 5)

File HAL.xls

ID | ANB | ANSMe | ArGoMe | CoA | CoGn | CoGo | GoGn | INTERINC | L1FH | L1MP | Nme | NSAr | NSBa | OB | OJ | PPMP | PPSN | Sar | SN | SNA | SNB | SNPM | U1PP | Wits | BG
S001 | -1.2 | 54.7 | 128.2 | 68.3 | 92.4 | 40.7 | 63.7 | 164.8 | 78.0 | 73.9 | 100.6 | 123.4 | 138.6 | 0.2 | -0.1 | 29.8 | 14.0 | 24.6 | 63.1 | 68.9 | 70.1 | 43.8 | 91.6 | -4.5 | Bad
S002 | -0.8 | 50.3 | 130.0 | 67.3 | 92.0 | 41.3 | 63.5 | 167.0 | 80.8 | 73.9 | 92.8 | 122.0 | 125.3 | 2.3 | -2.1 | 27.6 | 8.9 | 24.6 | 53.8 | 81.2 | 82.0 | 36.5 | 91.5 | -6.0 | Bad
S003 | -3.2 | 56.7 | 134.2 | 71.1 | 104.2 | 41.8 | 74.8 | 137.1 | 70.1 | 80.4 | 101.8 | 119.6 | 128.4 | 0.0 | -5.6 | 33.1 | 7.8 | 24.1 | 58.1 | 79.9 | 83.1 | 40.9 | 109.4 | -12.7 | Good
S004 | 0.2 | 54.7 | 140.5 | 68.3 | 90.4 | 38.3 | 64.5 | 130.8 | 59.8 | 85.0 | 93.4 | 122.9 | 133.4 | 1.7 | -0.7 | 33.7 | 7.0 | 25.4 | 58.2 | 79.5 | 79.4 | 40.7 | 110.5 | -1.6 | Bad
... ...
S137 | -5.7 | 59.5 | 137.7 | 67.1 | 105.2 | 47.5 | 72.7 | 138.7 | 74.0 | 75.2 | 107.6 | 119.3 | 125.0 | 1.0 | -5.0 | 27.4 | 16.0 | 22.6 | 60.6 | 76.2 | 81.9 | 43.4 | 118.7 | -13.8 | Bad
S138 | -0.1 | 54.0 | 125.1 | 74.3 | 99.2 | 45.4 | 71.3 | 149.0 | 74.3 | 80.2 | 98.5 | 113.8 | 123.4 | 1.4 | -1.3 | 28.9 | 6.2 | 26.2 | 64.1 | 80.2 | 80.3 | 35.1 | 101.9 | -4.2 | Bad
S139 | -0.1 | 58.1 | 125.0 | 85.0 | 113.7 | 54.9 | 78.3 | 129.6 | 75.6 | 85.9 | 104.9 | 120.3 | 130.2 | 2.6 | 0.0 | 21.2 | 6.8 | 28.0 | 66.7 | 86.1 | 86.3 | 28.0 | 123.4 | -5.7 | Bad
S140 | 1.2 | 64.9 | 144.4 | 79.3 | 104.2 | 42.3 | 69.9 | 135.3 | 63.2 | 82.3 | 109.7 | 113.9 | 120.2 | 0.5 | 0.1 | 35.8 | 8.2 | 29.1 | 68.3 | 79.5 | 78.4 | 44.1 | 106.5 | -6.3 | Bad
S141 | -2.8 | 62.8 | 118.9 | 77.6 | 110.1 | 55.3 | 76.8 | 151.4 | 72.4 | 86.5 | 109.5 | 120.2 | 122.1 | 6.6 | -2.8 | 23.7 | 2.1 | 32.8 | 65.5 | 81.1 | 83.9 | 25.8 | 98.3 | -7.4 | Good
S142 | 1.8 | 61.7 | 123.3 | 77.7 | 103.9 | 47.6 | 72.9 | 128.5 | 60.6 | 91.2 | 107.0 | 120.6 | 131.1 | 0.3 | 2.6 | 27.2 | 9.5 | 27.4 | 65.6 | 78.7 | 76.9 | 36.6 | 113.1 | -3.4 | Bad
S144 | 0.4 | 51.1 | 127.8 | 78.2 | 98.3 | 44.4 | 69.1 | 130.7 | 69.5 | 92.0 | 94.1 | 121.2 | 126.5 | 0.4 | 0.7 | 20.0 | 8.2 | 28.7 | 64.2 | 81.7 | 81.4 | 28.2 | 117.2 | -2.0 | Good
S145 | 0.2 | 58.1 | 127.1 | 75.0 | 101.0 | 47.6 | 71.8 | 128.8 | 63.3 | 89.8 | 101.8 | 120.0 | 125.9 | 0.1 | -1.2 | 27.5 | 6.7 | 25.7 | 63.4 | 82.0 | 81.8 | 34.2 | 113.9 | -3.5 | Bad

ms1948 commented 6 years ago

Running the R code above, I receive the following error message:

Error in glmnet(x[, c(features, j), drop = FALSE], y, weights = weights, : x should be a matrix with 2 or more columns

related to the instruction: explanation_cor <- lime::explain(test_data_cor, spiega, n_labels = 3, n_features = 5)
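The glmnet requirement named in the message can be reproduced in isolation. A minimal sketch with synthetic data (not the issue's HAL.xls dataset):

```r
# Standalone reproduction of the glmnet precondition behind the error:
# glmnet() refuses a predictor matrix with fewer than two columns.
library(glmnet)

set.seed(1)
x <- matrix(rnorm(20), ncol = 1)  # single-column predictor matrix
y <- rnorm(20)

res <- try(glmnet(x, y), silent = TRUE)
inherits(res, "try-error")
# TRUE -- the condition message is "x should be a matrix with 2 or more
# columns", the same check that lime's internal feature-selection call
# trips over when it has too few usable predictors.
```

This suggests the failure happens inside lime's forward feature selection, not in the user-facing call itself.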

kimyenn commented 6 years ago

@ms1948 Did you ever solve this problem? I am seeing the same error

ms1948 commented 6 years ago

I am sorry, not yet. The person responsible for developing the function (Thomas Lin Pedersen, thomasp85@gmail.com) closed my issue without any answer. Why don't you try contacting him? Have a nice day.

P.S.: If you got the solution let me know


ms1948 commented 6 years ago

Hi, have you been in contact with the developer to solve the error in the lime::explain command?


thomasp85 commented 6 years ago

@ms1948 it appears you closed it yourself, so I haven't looked into it...

That being said, I don't have a lot of time for lime at the moment. I'll try to carve out some time next month to fix issues...

thomasp85 commented 6 years ago

@ms1948 and @kimyenn , can either of you provide a reprex to help me identify this problem?

ms1948 commented 6 years ago

Thomas,

this is the error message: "Error in glmnet(x[, c(features, j), drop = FALSE], y, weights = weights, : x should be a matrix with 2 or more columns", related to the R command explanation_cor <- lime::explain(test_data_cor, spiega, n_labels = 3, n_features = 5).

If you run my example (with the data file) supplied to you on 29 July, you can see the error message.

Regards


thomasp85 commented 6 years ago

I would like not to have to parse a plaintext version of an Excel file - can you either provide a download link or replicate the error with a built-in dataset?

ms1948 commented 6 years ago

Attached are the R commands and the Excel file.


library(MASS)
library(lime)
library(gdata)
library(tidyverse)
library(caret)

rm(list = ls())
a = read.xls("HAL.xls", sheet = 1)
names(a)
b = data.frame(a[1:24], row.names = 1)
class(b)
str(b)

plot.new()
i = 0
i = i + 1
IDName = a[i, 1]
index <- createDataPartition(b$BGA, p = 0.7, list = FALSE)
train_data <- b[index, ]
test_data <- b[-index, ]

model_mlp <- caret::train(BGA ~ ., data = train_data, method = "mlp",
                          trControl = trainControl(method = "repeatedcv",
                                                   number = 10, repeats = 5,
                                                   verboseIter = FALSE))

spiega <- lime(train_data, model_mlp, bin_continuous = TRUE, n_bins = 5,
               n_permutations = 10)

pred <- data.frame(sample_id = 1:nrow(test_data),
                   predict(model_mlp, test_data, type = "prob"),
                   actual = test_data$BGA)
pred$prediction <- colnames(pred)[2:3][apply(pred[, 2:3], 1, which.max)]
pred$correct <- ifelse(pred$actual == pred$prediction, "correct", "wrong")
pred_cor <- filter(pred, correct == "correct")
pred_wrong <- filter(pred, correct == "wrong")

test_data_cor <- test_data %>%
  mutate(sample_id = 1:nrow(test_data)) %>%
  filter(sample_id %in% pred_cor$sample_id) %>%
  sample_n(size = 1) %>%
  remove_rownames() %>%
  tibble::column_to_rownames(var = "sample_id") %>%
  select(-BGA)

test_data_wrong <- test_data %>%
  mutate(sample_id = 1:nrow(test_data)) %>%
  filter(sample_id %in% pred_wrong$sample_id) %>%
  sample_n(size = 0) %>%
  remove_rownames() %>%
  tibble::column_to_rownames(var = "sample_id") %>%
  select(-BGA)

explanation_cor <- lime::explain(test_data_cor, spiega, n_labels = 3, n_features = 5)
explanation_wrong <- lime::explain(test_data_wrong, spiega, n_labels = 3, n_features = 5)

thomasp85 commented 6 years ago

I still don't have the Excel file - unless you provide it I will not be able to reproduce your issue.

ms1948 commented 6 years ago

I really don't understand your question; in my e-mail dated October 30 I attached two files, the R commands and the Excel file. If there are problems downloading the files, please give me another e-mail address so I can supply the commands and the data.

Thanks


thomasp85 commented 6 years ago

Email attachments are removed when replying to GitHub mails. You can send it to me using the email address given in the DESCRIPTION file.

ms1948 commented 6 years ago

Files mailed to:

Thomas Lin Pedersen thomasp85@gmail.com


thomasp85 commented 6 years ago

Ok, so it seems that the culprit is a mix of your model being bad, and lime not anticipating that type of badness... Basically it seems like your model is only able to produce a single output, which becomes clear if you look at predict(model_mlp, test_data, type = "prob")... This means that lime cannot fit any model to the response as it is a constant...

I'll try to catch this in the future, but lime will never work with this model (and you shouldn't trust it anyway as it will always predict "Bad" with 84% probability)
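For readers hitting the same glmnet error, the check Thomas describes can be sketched as follows (a diagnostic sketch, assuming the `model_mlp` and `test_data` objects from the script earlier in the thread):

```r
# Sketch of the diagnosis above: inspect the model's predicted class
# probabilities. If every row gets the same probabilities, the response
# that lime tries to fit locally is a constant, and the internal glmnet
# feature-selection step fails.
probs <- predict(model_mlp, test_data, type = "prob")
head(probs)

# Count distinct values per class column; a count of 1 in every column
# means the model's output is constant across all samples.
sapply(probs, function(p) length(unique(p)))
```

If the counts come back as 1, the fix is retraining or rethinking the model itself rather than adjusting the arguments to lime::explain(), as Thomas notes.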