newsgac / platform

Platform for machine learning experiments developed in the project NEWSGAC
https://research-software.nl/projects/newsgac
Apache License 2.0

Prediction probabilities not matching with the genre predicted on LIME view #29

Open aysenurbilgin opened 5 years ago

aysenurbilgin commented 5 years ago

See this example on the demo environment with UGS test ACE: "Explanation for Article 60 using BGS XGB Frog+TFIDF". BGS XGB Frog+TFIDF predicts Verslag; the actual genre is Nieuwsbericht. Yet under prediction probabilities, Essay is shown at 0.47.

Tommos0 commented 5 years ago

Checked in a notebook:

genre_labels[skp.predict([article.raw_text])[0]]
'Verslag'
-
sorted(zip(genre_labels, skp.predict_proba([article.raw_text])[0]), key=lambda x: -x[1])
[('Verslag', 0.73057395),
 ('Afbeelding', 0.046490025),
 ('Mededeling', 0.043383565),
 ('Brief', 0.04182354),
 ('Portret', 0.026247267),
 ('Nieuwsbericht', 0.023750637),
 ('Overzicht', 0.020018548),
 ('Fictie', 0.018087009),
 ('Opiniestuk', 0.009474231),
 ('Interview', 0.008421486),
 ('Recensie', 0.007908742),
 ('Essay', 0.0057582003),
 ('Column', 0.005404688),
 ('Reportage/feature', 0.0050709583),
 ('Achtergrond', 0.004480537),
 ('Service', 0.0031066334)]
Tommos0 commented 5 years ago

https://gist.github.com/Tommos0/47a65b627bfe2bcf95228e0c0ef538a9#file-issue29-ipynb (the final plot doesn't render in the gist, but it's the same as the one on the platform).

What's happening is that the prediction probabilities LIME shows don't match the output of a plain predict_proba call at all.
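To make the mismatch easier to pin down, here is a minimal sketch of a check that compares two probability vectors for the same article: one from the direct predict_proba call and one as displayed in the LIME view. The helper name `compare_probabilities` and the numbers in the usage part are hypothetical and only illustrate the shape of the problem; they are not values read from the platform, and this does not claim to identify the cause.

```python
def compare_probabilities(labels, direct, displayed, tol=1e-6):
    """Compare two probability vectors that should describe the same prediction.

    labels    -- class names, in the same order as both vectors
    direct    -- probabilities from a plain predict_proba call
    displayed -- probabilities as shown in the explanation view
    """
    if not (len(labels) == len(direct) == len(displayed)):
        raise ValueError("labels and both probability vectors must have equal length")

    # Label with the highest probability in each vector.
    top_direct = max(zip(labels, direct), key=lambda pair: pair[1])[0]
    top_displayed = max(zip(labels, displayed), key=lambda pair: pair[1])[0]

    # Largest per-class disagreement between the two vectors.
    max_abs_diff = max(abs(a - b) for a, b in zip(direct, displayed))

    return {
        "top_direct": top_direct,
        "top_displayed": top_displayed,
        "same_top": top_direct == top_displayed,
        "max_abs_diff": max_abs_diff,
        "match": max_abs_diff <= tol,
    }


# Illustrative, made-up numbers (three classes only, for brevity).
labels = ["Verslag", "Nieuwsbericht", "Essay"]
direct = [0.90, 0.06, 0.04]      # hypothetical predict_proba output
displayed = [0.30, 0.23, 0.47]   # hypothetical values from the explanation view

result = compare_probabilities(labels, direct, displayed)
print(result)
```

If `same_top` is false or `max_abs_diff` is large, the two code paths are not seeing the same thing. Label order and the exact text handed to the classifier would be the first things to compare.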