Open aysenurbilgin opened 5 years ago
Checked in notebook:
genre_labels[skp.predict([article.raw_text])[0]]
'Verslag'
-
sorted(zip(genre_labels, skp.predict_proba([article.raw_text])[0]), key=lambda x: -x[1])
[('Verslag', 0.73057395),
('Afbeelding', 0.046490025),
('Mededeling', 0.043383565),
('Brief', 0.04182354),
('Portret', 0.026247267),
('Nieuwsbericht', 0.023750637),
('Overzicht', 0.020018548),
('Fictie', 0.018087009),
('Opiniestuk', 0.009474231),
('Interview', 0.008421486),
('Recensie', 0.007908742),
('Essay', 0.0057582003),
('Column', 0.005404688),
('Reportage/feature', 0.0050709583),
('Achtergrond', 0.004480537),
('Service', 0.0031066334)]
https://gist.github.com/Tommos0/47a65b627bfe2bcf95228e0c0ef538a9#file-issue29-ipynb (final plot doesn't show, but it's the same as on the platform).
What's happening is that the prediction probabilities LIME shows don't match a plain predict_proba call on the same article (not even approximately).
See this example on the demo environment with UGS test ACE, "Explanation for Article 60 using BGS XGB Frog+TFIDF": BGS XGB Frog+TFIDF predicts Verslag, while the actual genre is Nieuwsbericht. Under prediction probabilities, however, Essay is shown at 0.47.
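To make the mismatch concrete, here is a minimal sketch that compares the probabilities from the notebook's predict_proba output with the value shown on the demo environment. The helper `find_mismatches` and the tolerance are hypothetical, introduced only for illustration. The numbers are copied from this thread. How the platform obtains its displayed probabilities is not shown here, so the sketch only detects the discrepancy and does not explain it.

```python
import math


def find_mismatches(direct, reported, tol=1e-3):
    """Return {label: (direct_p, reported_p)} for every reported label whose
    probability differs from the direct predict_proba value by more than tol.

    Hypothetical helper for illustration; not part of LIME or the platform.
    """
    mismatches = {}
    for label, reported_p in reported.items():
        direct_p = direct.get(label)
        if direct_p is None or not math.isclose(direct_p, reported_p, abs_tol=tol):
            mismatches[label] = (direct_p, reported_p)
    return mismatches


# Subset of the notebook's skp.predict_proba output for the article.
direct = {
    "Verslag": 0.73057395,
    "Nieuwsbericht": 0.023750637,
    "Essay": 0.0057582003,
}

# Value shown under "prediction probabilities" on the demo environment.
reported = {"Essay": 0.47}

mismatches = find_mismatches(direct, reported)
print(mismatches)
```

If the full set of probabilities displayed by the platform is available, passing it as `reported` would show whether only some labels are off (which might hint at a label-ordering problem) or all of them; that interpretation is an assumption, not something confirmed in this thread.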