iancovert / fastshap

An amortized approach for calculating local Shapley value explanations
MIT License

Multiclass classification fast shap #6

Closed valevalerio closed 12 months ago

valevalerio commented 1 year ago

Hello, thanks for sharing the runnable code from your paper. I'm looking forward to including FastSHAP among my favorite explainers!

I would like to know if there is a straightforward adaptation of FastSHAP to multiclass classification (i.e. more than 2 classes). I tried directly changing the number of input and output neurons of the surrogate model. In the tutorial, the surrogate model has 2 * num_features inputs and 2 outputs. I changed it to take n_classes * num_features inputs. Similarly, in the explainer model, I changed the output layer to have n_classes * num_features outputs rather than just 2 * num_features.

Maybe I am oversimplifying, but although this seems like a smooth change, the execution fails. Am I overlooking something?

iancovert commented 1 year ago

Hi Valerio, thanks for checking out the repo!

FastSHAP is easy to adapt to models with >2 classes. If you have $k$ classes then the original model and surrogate will both have $k$ outputs. On the input side, if you have $d$ features then the original model will have $d$ inputs, and the surrogate will have $2d$ (because we pass a binary indicator of which features are missing). Finally, the explainer will have $d$ inputs and $dk$ outputs, with one Shapley value per feature for each class.

The difference with what you've described seems to be with the surrogate: the surrogate should have $2d$ inputs and $k$ outputs.
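
To make the sizes above concrete, here is a minimal, framework-agnostic sketch in plain Python. All names (`LayerSizes`, `fastshap_sizes`, `surrogate_input`, `reshape_explainer_output`) are hypothetical helpers for illustration and are not part of the fastshap package. It also assumes that missing features are replaced by 0 before the mask is appended, and that the explainer's flat output is laid out feature by feature; check both against your own setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerSizes:
    """Input and output widths of one network."""
    n_inputs: int
    n_outputs: int


def fastshap_sizes(d: int, k: int) -> dict:
    """Layer widths for d features and k classes, as described in the reply."""
    return {
        "original": LayerSizes(d, k),
        "surrogate": LayerSizes(2 * d, k),  # masked features + binary mask
        "explainer": LayerSizes(d, d * k),  # one value per feature per class
    }


def surrogate_input(x, mask):
    """Concatenate the masked features with the mask itself (length 2d).

    Assumption: a missing feature (mask == 0) is replaced by 0.
    """
    if len(x) != len(mask):
        raise ValueError("x and mask must have the same length")
    masked = [xi * mi for xi, mi in zip(x, mask)]
    return masked + list(mask)


def reshape_explainer_output(flat, d: int, k: int):
    """Split a flat vector of length d*k into d rows of k class-wise values.

    Assumption: feature-major layout, i.e. entry i*k + c belongs to
    feature i and class c.
    """
    if len(flat) != d * k:
        raise ValueError("expected a vector of length d * k")
    return [list(flat[i * k:(i + 1) * k]) for i in range(d)]
```

For example, with `d = 12` features and `k = 3` classes, `fastshap_sizes(12, 3)` gives a surrogate with 24 inputs and 3 outputs and an explainer with 12 inputs and 36 outputs. Note that the surrogate's input width depends only on `d`, never on the number of classes.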

Let me know if this makes sense, hopefully the modified code works now!

valevalerio commented 12 months ago

Thank you very much for the promptness of your reply. The indicators of missing features in the surrogate model were indeed the catch. It worked like a C H A R M. Thanks!