Preserve order of text for explain_instance

marcotcr / lime

Lime: Explaining the predictions of any machine learning classifier

BSD 2-Clause "Simplified" License


Given the sentence:

The cat is a bad cat.

The current explain_instance returns:

[('bad', 0.023544989987054128), ('The', -2.223453279269586e-06), ('cat', -2.0328098267135788e-06), ('a', -1.29583902574453e-06), ('cat', -1.2776487837649124e-06), ('is', -1.1776258015649435e-06)]

Because the list is sorted by absolute weight rather than by position in the sentence, it is impossible to determine which "cat" is which.

Ideally the method would return in the input order:

[('The', -2.223453279269586e-06), ('cat', -2.0328098267135788e-06), ('is', -1.1776258015649435e-06), ('a', -1.29583902574453e-06), ('bad', 0.023544989987054128), ('cat', -1.2776487837649124e-06)]

I patched a fork for our needs, but I think this might be a useful option for others.
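The requested reordering can be sketched in a few lines, assuming each weight is available together with the position of its token. That is an assumption: `as_list()` in the example above returns only words, so position-indexed pairs would have to come from elsewhere (possibly `exp.as_map()` on an explainer built with `bow=False`, which is worth verifying against the lime source). The helper `order_by_position` and the pairing of weights to positions are hypothetical and chosen to match the desired output.

```python
from typing import List, Sequence, Tuple


def order_by_position(
    tokens: Sequence[str],
    indexed_weights: Sequence[Tuple[int, float]],
) -> List[Tuple[str, float]]:
    """Return (token, weight) pairs in input order.

    `indexed_weights` holds (token_index, weight) pairs in any order.
    """
    by_index = sorted(indexed_weights, key=lambda pair: pair[0])
    return [(tokens[i], weight) for i, weight in by_index]


tokens = ["The", "cat", "is", "a", "bad", "cat"]

# Weights from the example above, keyed by token position (hypothetical
# pairing), listed in the weight-sorted order the issue describes.
indexed_weights = [
    (4, 0.023544989987054128),
    (0, -2.223453279269586e-06),
    (1, -2.0328098267135788e-06),
    (3, -1.29583902574453e-06),
    (5, -1.2776487837649124e-06),
    (2, -1.1776258015649435e-06),
]

ordered = order_by_position(tokens, indexed_weights)
```

Keying on the token index rather than the word keeps the two occurrences of "cat" distinct, which is the piece of information the weight-sorted list loses.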

Hello Paul, I have the same problem with multilabel classification.

These are my classes: ['Airports', 'Artists', 'Astronauts', 'Astronomical_objects', 'Building', 'City', 'Comics_characters', 'Companies', 'Foods', 'Monuments_and_memorials', 'Politicians', 'Sports_teams', 'Sportspeople', 'Transport', 'Universities_and_colleges', 'Written_communication']

When I call:

explainer = LimeTextExplainer(class_names=class_names)
exp = explainer.explain_instance(X_test.iloc[1], grid.predict_proba, num_features=10)
fig = exp.as_pyplot_figure()

the explainer shows Artists, but my estimator predicts City for my input X_test.iloc[1].
I have run into this mismatch many times.
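One possible cause, offered as an assumption rather than a diagnosis: `explain_instance` explains label index 1 by default (`labels=(1,)`), and index 1 in the class list above is 'Artists', whatever the model actually predicts. A minimal sketch of asking for the top predicted class instead is below; the helper name `explain_predicted_class` is hypothetical, and `explainer` is assumed to behave like `LimeTextExplainer`.

```python
def explain_predicted_class(explainer, text, predict_proba, num_features=10):
    """Explain the class the model ranks highest instead of label index 1.

    `explainer` is expected to behave like lime's LimeTextExplainer.
    """
    # top_labels=1 asks for an explanation of the single most probable class.
    exp = explainer.explain_instance(
        text, predict_proba, top_labels=1, num_features=num_features
    )
    # With top_labels set, available_labels() lists the explained class indices.
    label = exp.available_labels()[0]
    return exp, label
```

With the names from the comment above, this would be called as `exp, label = explain_predicted_class(explainer, X_test.iloc[1], grid.predict_proba)`, followed by `exp.as_pyplot_figure(label=label)` so the figure is drawn for the predicted class rather than for index 1.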


Preserve order of text for explain_instance #667