shap / shap

A game theoretic approach to explain the output of any machine learning model.
https://shap.readthedocs.io
MIT License

TreeSHAP for Apache Solr boosted trees json model #127

Open alessandrobenedetti opened 6 years ago

alessandrobenedetti commented 6 years ago

Hi all, I am new to SHAP, so sorry to disturb! I am exploring a way to load an XGBoost model dump into memory so that I can use it with TreeSHAP. This is my first approach to explaining the Solr JSON format for boosted tree models: convert the Solr format to an XGBoost dump, load the dump, and then use plain TreeSHAP. The idea is to keep it simple and reuse what is already there.

If that does not work, I would be happy to contribute the Solr boosted tree format to the list of models supported by TreeSHAP. That would require some questions, explanations, and discussion with the community to agree on the best approach. In that case, how do you suggest I engage? Directly here?

This is an example of how Apache Solr expects a boosted trees model to be encoded in JSON:

https://lucene.apache.org/solr/7_3_0//solr-ltr/org/apache/solr/ltr/model/MultipleAdditiveTreesModel.html

slundberg commented 6 years ago

Hey! I personally think directly supporting the Solr Boosted Tree JSON format would be easier than trying to match the XGBoost dump format. A recent PR (#116) added support for loading the JSON format of LightGBM models, so this would be very similar.

The C extension that is part of shap takes a set of parallel arrays describing each tree. These follow the same format as the sklearn tree models. So the main task would be to convert the Solr JSON representation into the parallel array representation for each tree (which would happen in the Tree constructor in explainers/tree.py).
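
As a rough illustration of that conversion, here is a minimal sketch that flattens one nested Solr tree into sklearn-style parallel lists. It rests on several assumptions: the Solr node layout (`feature`, `threshold`, `left`, `right`, `value`, plus a per-tree `weight`) follows the example in the linked Javadoc, a sample goes left when its feature value is `<=` the threshold, and leaves are marked with `-1` children and a `-2` feature as sklearn does. The function names and dictionary keys are hypothetical and are not taken from shap's Tree constructor.

```python
def solr_tree_to_arrays(tree, feature_index):
    """Flatten one Solr tree (nested dicts) into parallel lists.

    `tree` is one entry of params["trees"]; `feature_index` maps feature
    names to column positions. Both names are illustrative assumptions.
    """
    weight = float(tree.get("weight", 1.0))  # Solr stores numbers as strings
    arrays = {
        "children_left": [],
        "children_right": [],
        "feature": [],
        "threshold": [],
        "value": [],
    }

    def visit(node):
        # Reserve this node's slot first so that parents precede children.
        idx = len(arrays["feature"])
        arrays["children_left"].append(-1)   # -1 marks "no child" (leaf)
        arrays["children_right"].append(-1)
        arrays["feature"].append(-2)         # -2 marks "no split feature"
        arrays["threshold"].append(-2.0)
        arrays["value"].append(0.0)
        if "value" in node:
            # Leaf: fold the tree weight into the leaf output.
            arrays["value"][idx] = weight * float(node["value"])
        else:
            arrays["feature"][idx] = feature_index[node["feature"]]
            arrays["threshold"][idx] = float(node["threshold"])
            arrays["children_left"][idx] = visit(node["left"])
            arrays["children_right"][idx] = visit(node["right"])
        return idx

    visit(tree["root"])
    return arrays


def predict_tree(arrays, x):
    """Walk the flattened tree for one sample (assumes `<=` goes left)."""
    i = 0
    while arrays["children_left"][i] != -1:
        go_left = x[arrays["feature"][i]] <= arrays["threshold"][i]
        i = arrays["children_left"][i] if go_left else arrays["children_right"][i]
    return arrays["value"][i]


# Sample model in the shape shown in the linked Solr Javadoc.
model = {
    "features": [{"name": "userTextTitleMatch"}, {"name": "originalScore"}],
    "params": {
        "trees": [
            {
                "weight": "1",
                "root": {
                    "feature": "userTextTitleMatch",
                    "threshold": "0.5",
                    "left": {"value": "-100"},
                    "right": {
                        "feature": "originalScore",
                        "threshold": "10.0",
                        "left": {"value": "50"},
                        "right": {"value": "75"},
                    },
                },
            },
            {"weight": "2", "root": {"value": "-10"}},
        ]
    },
}

feature_index = {f["name"]: i for i, f in enumerate(model["features"])}
trees = [solr_tree_to_arrays(t, feature_index) for t in model["params"]["trees"]]
```

Summing `predict_tree` over `trees` should reproduce the score Solr computes for the same feature vector, which is a useful sanity check before handing the arrays to the explainer. The lists can be wrapped in NumPy arrays if the C extension expects them.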

If you want to give it a shot, I am happy to answer questions along the way.