ing-bank / probatus

Validation (like Recursive Feature Elimination for SHAP) of (multiclass) classifiers & regressors and data used to develop them.
https://ing-bank.github.io/probatus
MIT License
132 stars 40 forks source link

Masking based on columns not rows #176

Closed gverbock closed 2 years ago

gverbock commented 2 years ago

When looking at the utils/shap_helpers.py I see that a mask is defined. However, if I am right, the mask is defined on the number col columns while it should be the number of row.

image

I would propose the followoing change

X.shape[1]

Would become

X.shape[0]

Also:

There is a typo in model_interpret.py on row 99.

image

Should be neither

timvink commented 2 years ago

There is already a default sample size defined:

https://github.com/ing-bank/probatus/blob/e164c6b42bba369d499f53eb1e4b9ceb51c7a209/probatus/utils/shap_helpers.py#L36

If the number of columns is lower than the sample_size , it will take 20% of the number of cols (that's how I read it).

For the typo.. open a PR ! :)

gverbock commented 2 years ago

@timvink That's true but I believe it should take 20% of the number of rows and not columns.

gverbock commented 2 years ago

@Matgrb @timvink I saw I messed-up the pipeline (sorry about that). What is the best option? Revert the previous merge and re-open this issue or create a new issue?

Matgrb commented 2 years ago

No worries Gilles! If you would like to fix the pipeline and test the fix, you can make a PR that implements it and see if tests succeed :)