hal-314 opened 3 years ago
Thanks @hal-314 for the good suggestion!
Captum provides `NoiseTunnel` as a generic mechanism to achieve the desired behavior. The `NoiseTunnel.attribute()` method takes `n_samples` as a parameter.
Hope this helps!
Thanks for the tip @bilalsal! I knew about `NoiseTunnel` but I didn't realize that it could be used to achieve this!

From a user's perspective, using `NoiseTunnel` with `FeaturePermutation` or any other perturbation algorithm to achieve `n_samples` behavior may not be obvious when `ShapleyValueSampling` has `n_samples`. It would be nice to put a notice in the docs. I had discarded `NoiseTunnel` as it seemed to be focused on adding noise to samples.

I'll leave this issue open as I think the current situation isn't user friendly. Feel free to close this issue.
Hi @hal-314 , just wanted to get more context on this issue. For FeatureAblation and Occlusion, the perturbations don't involve any randomness and should be the same when repeating attribution for the same input and baselines. As long as the model forward pass is deterministic, we should expect multiple samples to result in the same attribution results. What was the use-case that you had in mind for n_samples in these methods?
For FeaturePermutation, I think this definitely makes sense and is something we could look into. As @bilalsal mentioned, NoiseTunnel could potentially serve as a workaround. A couple of things to keep in mind with this approach: NoiseTunnel expands the input internally and would essentially provide FeaturePermutation with the original batch repeated n_samples times. This could constrain n_samples by memory limitations, since the full expanded batch must fit in memory until issue #497 is addressed. Also, if FeaturePermutation is used with a forward function that returns a scalar value per batch (e.g. loss), the result would now correspond to applying a single permutation to the expanded batch, rather than averaging over multiple independent permutations on the original batch. In many cases, these two approaches will yield similar results, but there are cases where they could differ.
@vivekmig Rereading FeatureAblation and Occlusion again, I think you are right and an n_samples argument doesn't make sense there. I mainly used FeaturePermutation. As FeatureAblation and Occlusion were part of the same family (perturbation algorithms), I jumped to conclusions and assumed that they'd benefit from n_samples. Sorry for the misunderstanding.
> For FeaturePermutation, I think this definitely makes sense and is something we could look into.

That's nice!
EDIT: tag the right person, sorry vishwakftw!
@hal-314, you seem to have tagged the wrong person. :-)
@hal-314, has this issue already been solved? If so, can we please close it? Thank you :)
🚀 Feature
~~Add `n_samples` argument for all perturbation based methods like it exists in `ShapleyValueSampling`.~~ Add `n_samples` argument to `FeaturePermutation` like it exists in `ShapleyValueSampling`.

Motivation
~~Perturbation based algorithms~~ `FeaturePermutation` computes feature attribution by permuting input features. So, depending on how the input is permuted, the feature attribution varies. To make the estimation more robust, the permutation should be repeated several times. For example, scikit-learn's permutation importance function repeats the permutation 5 times by default.
Finally, it'd be consistent with ShapleyValueSampling.
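To illustrate why repeating helps, here is a small dependency-free sketch (all helper names here are made up for the demo) that mirrors what scikit-learn does with its `n_repeats` argument: compute the importance score from several independent permutations and average them.

```python
# Dependency-free illustration (hypothetical helper names) of repeating a
# feature permutation n_samples times and averaging the scores.
import random

def permutation_importance(predict, X, y, feature, rng):
    """Error after permuting one feature's column, minus baseline error."""
    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)
    baseline = mse(X)
    col = [r[feature] for r in X]
    rng.shuffle(col)  # one random permutation of this feature's column
    permuted = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(X, col)]
    return mse(permuted) - baseline

rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
y = [r[0] for r in X]      # target depends only on feature 0
predict = lambda r: r[0]   # a "model" that uses only feature 0

# A single permutation gives a noisy score; averaging over n_samples
# independent permutations gives a more stable estimate.
n_samples = 5
scores = [permutation_importance(predict, X, y, 0, rng) for _ in range(n_samples)]
avg_importance = sum(scores) / n_samples
print(avg_importance > 0)  # feature 0 is informative, so importance is positive
```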
Pitch
~~Implement `n_samples` argument for other perturbation algorithms in addition to `ShapleyValueSampling`.~~ Implement `n_samples` argument for `FeaturePermutation`.

Alternatives
Do it manually by subclassing every algorithm (`FeaturePermutation`, ~~`FeatureAblation` and `Occlusion`~~) and overriding the `attribute` method. Then, the new attribution method would call the base implementation several times and average the results.

Additional context
EDIT: changed the `n_samples` feature request to only `FeaturePermutation`, as it doesn't make sense for the `FeatureAblation` and `Occlusion` algorithms.