marcotcr / lime

Lime: Explaining the predictions of any machine learning classifier
BSD 2-Clause "Simplified" License

Reliability of LIME continuous-variable discretization with a poor explanation score #666

Open SSMK-wq opened 2 years ago

SSMK-wq commented 2 years ago

I am using a random forest classifier for binary classification on 977 records, with a class proportion of 77:23.

I am using the LIME explainer to explain the model's predictions.

However, my LIME explanation score is only 20-40 for 80% of my observations.
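For context on what that score measures: as far as I can tell from lime's lime_base.py, it is the R² of the local linear surrogate (a Ridge regressor by default), evaluated on the perturbed neighborhood samples with the proximity-kernel weights as sample weights, i.e. the same quantity as scikit-learn's r2_score with sample_weight. The sketch below reimplements only that weighted R² in plain Python so the number is easier to interpret. weighted_r2 is a hypothetical helper, not part of lime, and the example values are made up.

```python
def weighted_r2(y_true, y_pred, weights):
    """Weighted coefficient of determination (R^2).

    Same formula as sklearn.metrics.r2_score with sample_weight:
    1 - weighted residual sum of squares / weighted total sum of squares,
    where the total sum of squares is taken around the weighted mean.
    Assumes y_true is not constant (otherwise the denominator is zero).
    """
    total_w = sum(weights)
    mean_w = sum(w * y for w, y in zip(weights, y_true)) / total_w
    ss_res = sum(w * (y - p) ** 2 for w, y, p in zip(weights, y_true, y_pred))
    ss_tot = sum(w * (y - mean_w) ** 2 for w, y in zip(weights, y_true))
    return 1.0 - ss_res / ss_tot


# Made-up numbers: black-box class probabilities on four perturbed samples,
# the surrogate's predictions for them, and their kernel weights
# (uniform here only to keep the arithmetic easy to follow).
black_box = [0.9, 0.2, 0.7, 0.4]
surrogate = [0.6, 0.5, 0.6, 0.5]
kernel_w = [1.0, 1.0, 1.0, 1.0]

print(round(weighted_r2(black_box, surrogate, kernel_w), 2))  # prints 0.31
```

A value of 1.0 means the surrogate reproduces the black-box outputs exactly on the weighted neighborhood; a value around 0.2-0.4 means most of the weighted variance there is left unexplained by the linear surrogate.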

That said, I like that LIME discretizes continuous variables into bins for its explanations. For example, Age is divided into 3 bins: <30, >30 and <=78, and >78.

So the positive and negative classes end up with different bins: the positive class has only bins 1 and 2, and the negative class has only bin 3.
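For reference, here is a minimal sketch of how quartile-style bin edges are produced, assuming lime's default discretizer='quartile' behaves as I read it in lime/discretize.py: percentiles at 25/50/75 over each feature column of the training data passed to LimeTabularExplainer, with duplicate edges removed. If that reading is right, the edges are shared by every instance and by both classes; what differs between the classes is which bins their instances fall into. quartile_edges is a hypothetical helper that uses the standard library instead of numpy, and the ages are made up.

```python
import statistics


def quartile_edges(column):
    """Bin edges at the 25th, 50th and 75th percentiles of one feature.

    The edges are computed from the whole training column; class labels
    play no part. Duplicate edges are dropped, so a skewed column can
    yield fewer than four bins. method="inclusive" gives the same values
    as numpy's default (linear) np.percentile.
    """
    qs = statistics.quantiles(column, n=4, method="inclusive")
    return sorted(set(qs))


# Made-up training ages for illustration.
ages = [22, 25, 30, 45, 60, 80, 90, 35]
print(quartile_edges(ages))  # prints [28.75, 40.0, 65.0]
```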

So, instead of relying on the LIME feature coefficients (which may be unreliable given the poor explanation score), I plan to count, within each class, how many times each bin appears and plot those counts as bars. That way I take advantage of LIME's discretization (for each class) but use my own method to show a feature's importance.
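The counting approach described above can be sketched in plain Python. This is a minimal sketch under stated assumptions: the bin edges (30 and 78, from the Age example) are passed in by hand, bin_index and bin_counts_per_class are hypothetical helper names, and the ages and labels are made up. Whether labels should hold the true classes or the model's predicted classes is a choice left to the analysis. With a fitted explainer, lime's own discretizer object (assumed here to be reachable as explainer.discretizer with a discretize method returning 0-based bin indices; check this against your installed version) could supply the bin numbers instead of bin_index.

```python
from bisect import bisect_left
from collections import Counter, defaultdict


def bin_index(value, edges):
    """Return a 1-based bin number for value, given sorted bin edges.

    bisect_left sends a value equal to an edge into the lower bin, the same
    convention as np.searchsorted(edges, value) with its default side='left'.
    """
    return bisect_left(edges, value) + 1


def bin_counts_per_class(values, labels, edges):
    """Count how often each bin occurs among the rows of each class."""
    counts = defaultdict(Counter)
    for value, label in zip(values, labels):
        counts[label][bin_index(value, edges)] += 1
    return counts


# Hypothetical data: edges from the Age example; ages and labels are made up.
age_edges = [30, 78]
ages = [25, 45, 30, 80, 90, 60, 22]
labels = [1, 1, 1, 0, 0, 1, 1]

counts = bin_counts_per_class(ages, labels, age_edges)
for label in sorted(counts):
    print(label, dict(sorted(counts[label].items())))
# prints:
# 0 {3: 2}
# 1 {1: 3, 2: 2}
```

Each class's Counter can then be fed directly to a bar plot, one bar per bin.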

But do you think a poor LIME explanation score indicates poorly computed bins for continuous variables? Can I rely on the LIME-computed bins for continuous variables even though my explanation score is only 20-40? I really like that LIME computed different bin ranges for each class. This is such an interesting insight.