hal-314 opened 3 years ago
Thanks @hal-314 for the good suggestion!
Captum provides `NoiseTunnel` as a generic mechanism to achieve the desired behavior. The `NoiseTunnel.attribute()` method takes `n_samples` as a parameter.
Hope this helps!
Thanks for the tip @bilalsal! I knew about `NoiseTunnel` but I didn't realize that it could be used to achieve this!

From a user's perspective, using `NoiseTunnel` with `FeaturePermutation` or any other perturbation algorithm to achieve `n_samples` behavior may not be obvious when `ShapleyValueSampling` has `n_samples`. It would be nice to put a notice in the docs. I had discarded `NoiseTunnel` as it seemed to be focused on adding noise to samples.

I'll leave this issue open as I think the current situation isn't user friendly. Feel free to close this issue.
Hi @hal-314 , just wanted to get more context on this issue. For FeatureAblation and Occlusion, the perturbations don't involve any randomness and should be the same when repeating attribution for the same input and baselines. As long as the model forward pass is deterministic, we should expect multiple samples to result in the same attribution results. What was the use-case that you had in mind for n_samples in these methods?
For FeaturePermutation, I think this definitely makes sense and is something we could look into. As @bilalsal mentioned, NoiseTunnel could potentially serve as a workaround. A couple of things to keep in mind with this approach: NoiseTunnel expands the input internally and would essentially provide FeaturePermutation with the original batch repeated n_samples times. This could constrain n_samples by memory limitations, since the full expanded batch must fit in memory until issue #497 is addressed. Also, if FeaturePermutation is used with a forward function that returns a scalar value per batch (e.g. loss), the result would now correspond to applying a single permutation to the expanded batch, rather than averaging over multiple independent permutations on the original batch. In many cases, these two approaches will yield similar results, but there are cases where they could differ.
@vivekmig Rereading FeatureAblation and Occlusion again, I think you are right and an n_samples argument doesn't make sense there. I mainly used FeaturePermutation. As FeatureAblation and Occlusion were part of the same family (perturbation algorithms), I jumped to conclusions and assumed that they'd benefit from n_samples. Sorry for the misunderstanding.
> For FeaturePermutation, I think this definitely makes sense and is something we could look into.

That's nice!
EDIT: tag the right person, sorry vishwakftw!
@hal-314, you seem to have tagged the wrong person. :-)
@hal-314, has this issue already been solved? If so, can we please close it? Thank you :)
🚀 Feature
~~Add `n_samples` argument for all perturbation based methods like it exists in `ShapleyValueSampling`.~~ Add `n_samples` argument to `FeaturePermutation` like it exists in `ShapleyValueSampling`.

Motivation
~~Perturbation based algorithms~~ `FeaturePermutation` computes feature attribution by permuting input features. So, depending on how the input is permuted, the feature attribution varies. To make the estimation more robust, the permutation should be repeated several times. For example, scikit-learn's permutation importance function repeats the permutation 5 times by default.
Finally, it'd be consistent with ShapleyValueSampling.
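To illustrate why repeating helps, here is a small dependency-free sketch (all helper names here are made up for the demo) that mirrors what scikit-learn does with its `n_repeats` argument: compute the importance score from several independent permutations and average them.

```python
# Dependency-free illustration (hypothetical helper names) of repeating a
# feature permutation n_samples times and averaging the scores.
import random

def permutation_importance(predict, X, y, feature, rng):
    """Error after permuting one feature's column, minus baseline error."""
    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)
    baseline = mse(X)
    col = [r[feature] for r in X]
    rng.shuffle(col)  # one random permutation of this feature's column
    permuted = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(X, col)]
    return mse(permuted) - baseline

rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
y = [r[0] for r in X]      # target depends only on feature 0
predict = lambda r: r[0]   # a "model" that uses only feature 0

# A single permutation gives a noisy score; averaging over n_samples
# independent permutations gives a more stable estimate.
n_samples = 5
scores = [permutation_importance(predict, X, y, 0, rng) for _ in range(n_samples)]
avg_importance = sum(scores) / n_samples
print(avg_importance > 0)  # feature 0 is informative, so importance is positive
```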
Pitch
~~Implement `n_samples` argument for other perturbation algorithms in addition to `ShapleyValueSampling`.~~ Implement `n_samples` argument for `FeaturePermutation`.

Alternatives
Do it manually by subclassing every algorithm (`FeaturePermutation`, ~~`FeatureAblation` and `Occlusion`~~) and overriding the `attribute` method. Then, the new attribution method would call the base implementation several times and average the results.

Additional context
EDIT: changed the `n_samples` feature request to only `FeaturePermutation`, as it doesn't make sense for the `FeatureAblation` and `Occlusion` algorithms.