Open alessandrobenedetti opened 6 years ago
Hey! I personally think directly supporting the Solr Boosted Tree JSON format would be easier than trying to match the XGBoost dump format. A recent PR (#116) added support for loading the JSON format of LightGBM models, so this would be very similar.
The C extension that is part of shap takes a set of parallel arrays describing each tree. These follow the same format as the sklearn tree models. So the main task would be to convert the Solr JSON representation into the parallel array representation for each tree (which would happen in the Tree constructor in explainers/tree.py).
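To make the conversion concrete, here is a rough sketch of flattening one Solr tree into sklearn-style parallel arrays. It is not the actual code in explainers/tree.py. The function names (`solr_tree_to_arrays`, `solr_model_to_trees`, `predict`) are hypothetical, and the sample model is only modeled on the example in the Solr documentation. The sketch assumes the Solr JSON nests nodes under `root`/`left`/`right` with string-valued `feature`, `threshold`, `value`, and `weight` fields, that Solr goes left when the feature value is `<=` the threshold (as sklearn does), and that folding the tree weight into the leaf values is acceptable. It uses sklearn's conventions of `-1` for a missing child and `-2` for an undefined feature or threshold, and it leaves out per-node sample counts because the JSON does not appear to carry them.

```python
def solr_tree_to_arrays(root, feature_names, weight=1.0):
    """Flatten one nested Solr tree into sklearn-style parallel lists."""
    children_left, children_right = [], []
    feature, threshold, value = [], [], []

    def visit(node):
        idx = len(feature)
        # Reserve this node's slot first, so parents precede their children.
        children_left.append(-1)   # -1 marks "no child" (leaf), as in sklearn
        children_right.append(-1)
        feature.append(-2)         # -2 marks "undefined" on leaves, as in sklearn
        threshold.append(-2.0)
        value.append(0.0)
        if "value" in node:
            # Leaf: fold the tree weight into the leaf output (an assumption).
            value[idx] = weight * float(node["value"])
        else:
            # Internal node: map the feature name to its column index.
            feature[idx] = feature_names.index(node["feature"])
            threshold[idx] = float(node["threshold"])
            children_left[idx] = visit(node["left"])
            children_right[idx] = visit(node["right"])
        return idx

    visit(root)
    return {
        "children_left": children_left,
        "children_right": children_right,
        "feature": feature,
        "threshold": threshold,
        "value": value,
    }


def solr_model_to_trees(model):
    """Convert every tree of a parsed Solr model (a dict) into parallel lists."""
    names = [f["name"] for f in model["features"]]
    return [
        solr_tree_to_arrays(t["root"], names, float(t["weight"]))
        for t in model["params"]["trees"]
    ]


def predict(trees, x):
    """Sum the leaf outputs of all trees for one feature vector x."""
    total = 0.0
    for t in trees:
        i = 0
        while t["children_left"][i] != -1:
            if x[t["feature"][i]] <= t["threshold"][i]:
                i = t["children_left"][i]
            else:
                i = t["children_right"][i]
        total += t["value"][i]
    return total


# Illustrative model, shaped like the example in the Solr documentation.
model = {
    "features": [{"name": "userTextTitleMatch"}, {"name": "originalScore"}],
    "params": {"trees": [
        {"weight": "1", "root": {
            "feature": "userTextTitleMatch", "threshold": "0.5",
            "left": {"value": "-100"},
            "right": {
                "feature": "originalScore", "threshold": "10.0",
                "left": {"value": "50"}, "right": {"value": "75"}}}},
        {"weight": "2", "root": {"value": "-10"}},
    ]},
}
trees = solr_model_to_trees(model)
```

The lists can be wrapped in NumPy arrays before being handed to the C extension. The small `predict` helper is only there to check that the flattened trees score inputs the same way the nested JSON would.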
If you want to give it a shot, I am happy to answer questions along the way.
Hi all, I am new to SHAP, so sorry to disturb! I am exploring a way to parse an XGBoost model dump into memory so that I can use it with TreeSHAP. This is my first approach to explaining the Solr JSON format for the boosted tree model: convert the Solr format to an XGBoost dump, then load the dump and use plain TreeSHAP. The idea is to keep it simple and use what is already there.
If that does not work, I would be happy to contribute the Solr Boosted Tree format to the list of models supported by TreeSHAP. That would require some questions, explanations, and discussion with the community to agree on the best approach. In that case, how do you suggest I engage? Directly here?
This is an example of how Apache Solr expects a boosted trees model to be encoded in JSON:
https://lucene.apache.org/solr/7_3_0//solr-ltr/org/apache/solr/ltr/model/MultipleAdditiveTreesModel.html