sergioburdisso / pyss3

A Python package implementing a new interpretable machine learning model for text classification (with visualization tools for Explainable AI :octocat:)
https://pyss3.readthedocs.io
MIT License

Use evaluation and explanation as a standalone package? #14

Closed ydennisy closed 3 years ago

ydennisy commented 3 years ago

Hey, cool project!

Just wondering if you think it is feasible to use the explanations with other models?

sergioburdisso commented 3 years ago

Hey @ydennisy! Thanks :D

I think it could be possible to use the explanations with other models: as long as the Live Test tool receives a JSON holding the value each word/element has, it will work regardless of the model being used. In principle, updating the server module's code (`server.py`) so that other models can generate the JSON for the classification result would do the trick.

However, since most supervised machine learning models work as black boxes, representing input documents as feature vectors, we would need to use some method/technique on top of them (such as LIME) to infer/estimate how valuable each raw input element was to the model being used. This was straightforward for the SS3 model since, by design, the model explicitly learns a confidence value for each input element and each class, and classification is then performed directly from those values. This confidence value tells how relevant each input element is, as if we were asking "How much value does this element have for this class?". For example, after training, SS3 would learn something like:

value("apple", technology) = 0.7
value("apple", business) = 0.5
value("apple", food) = 0.9

and

value("the", technology) = value("the", business) = value("the", food) = 0

That is, "apple" has a value of 0.7 for technology, 0.5 for business, and 0.9 for food, whereas "the" has no value for any category. Note that even Multinomial Naive Bayes (a white-box model) wouldn't work "out of the box" either, since it weights each element by its probability (log P(w|c)), and hence (stop)words like "the", "a", "with", etc. would have the highest values (i.e. the highest probabilities).
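To make the idea concrete, here is a minimal sketch of how another model's per-word scores could be packaged as a JSON payload for a Live-Test-style visualization. All names and the JSON shape here are hypothetical, illustrative assumptions, not PySS3's actual `server.py` format:

```python
import json

# Hypothetical per-(word, category) confidence values, analogous to the
# value(word, category) examples above (values are illustrative).
value = {
    ("apple", "technology"): 0.7,
    ("apple", "business"): 0.5,
    ("apple", "food"): 0.9,
    ("the", "technology"): 0.0,
    ("the", "business"): 0.0,
    ("the", "food"): 0.0,
}

def explain(document, categories):
    """Build a JSON-serializable explanation: one value per word per class.

    For a black-box model, these values would instead come from a
    post-hoc technique such as LIME rather than a lookup table.
    """
    words = document.lower().split()
    return {
        "words": words,
        "values": [[value.get((w, c), 0.0) for c in categories]
                   for w in words],
    }

payload = json.dumps(explain("the apple", ["technology", "business", "food"]))
```

Any model able to emit a structure like `payload` (whatever the real schema turns out to be) could then, in principle, feed the same visualization front end.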

Regarding Evaluation, I think it is also feasible, at least for models with three hyperparameters or fewer; with more than three, we would need to think about how to adapt the Evaluation 3D Plot UI to let users select only the three (or fewer) hyperparameters they are interested in.
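As a rough sketch of the data behind such a plot: a model with three hyperparameters yields one point per grid cell, with the three hyperparameter values as axes and the evaluation score as the fourth (color) dimension. The hyperparameter names and the `evaluate` stand-in below are purely illustrative assumptions, not PySS3's actual API:

```python
from itertools import product

# Hypothetical hyperparameter grids for some other model (illustrative).
grid = {
    "alpha": [0.1, 0.5, 1.0],
    "max_depth": [3, 5],
    "min_samples": [1, 10],
}

def evaluate(alpha, max_depth, min_samples):
    """Stand-in for a real train/evaluate step; returns a dummy score."""
    return round(1.0 - 0.1 * alpha - 0.01 * max_depth + 0.001 * min_samples, 4)

# One (x, y, z, score) tuple per grid cell: exactly the shape a
# 3D evaluation plot needs.
points = [
    (a, d, m, evaluate(a, d, m))
    for a, d, m in product(grid["alpha"], grid["max_depth"], grid["min_samples"])
]
```

With more than three hyperparameters, the UI would only need the user to pick which three become the plot axes; the rest would be held fixed or aggregated.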

sergioburdisso commented 3 years ago

(I'm closing this issue, but feel free to reopen it whenever you want; I'll also reopen it if, sometime in the future, I'm able to actually implement this. Anyway, thanks for raising the question 💪🤓👍 Take care!)