marcotcr / lime

Lime: Explaining the predictions of any machine learning classifier
BSD 2-Clause "Simplified" License

Handling one hot encoded features and normalisation #559

Closed Lion-Mod closed 3 years ago

Lion-Mod commented 3 years ago

I want to use LIME with a gradient boosting classifier. I've got it running and working; however, suppose a fruit feature gets one-hot encoded into fruit_banana and fruit_lime. When explaining an example, the explainer will display e.g. fruit_banana = 1 and fruit_lime = 0 in the same plot. Ideally I don't want this.

I'm curious whether one-hot encoded features should be represented like this at all. Looking at shap, it appears that this isn't how it's done there.

Somewhat similarly, the continuous features show normalised values, which isn't ideal for interpreting.

I understand that I could omit the continuous output, simply put the feature name next to it, and display the bar in another plot, but I'd like to know of a way to handle this properly if possible.

Summarising:

  1. Should OHE features be represented in this fashion?
  2. If not, how do I reverse the OHE with LIME?
  3. How do I denormalise the output values so the plot is more interpretable?

Any help would be greatly appreciated! 👍

God-Hades commented 2 years ago

Hey, I had similar concerns. How did you end up fixing it? Thanks!