idiap / fullgrad-saliency

Full-gradient saliency maps

FullGrad for Vision Transformers #14

Open · imanuelroz opened this issue 2 years ago

imanuelroz commented 2 years ago

Hi, I wanted to know whether there is a version of FullGrad that can be applied to Vision Transformers such as ViT or the Swin Transformer, or whether some small changes to the code would make this possible. Thank you in advance.

suraj-srinivas commented 2 years ago

Hi! Sorry for the late reply. Technically, FullGrad is proposed for convolutional or fully connected neural networks, so the completeness conditions may not be satisfied for transformers.
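
(For context, the completeness property referenced here, from the FullGrad paper, states that the network output decomposes exactly into an input-gradient term plus the sum of bias-gradient terms. A sketch of the equation, following the paper's notation where the f^b(x) are bias-gradients:)

```latex
% FullGrad completeness: exact for ReLU-style, piecewise-linear networks
f(\mathbf{x}) = \nabla_{\mathbf{x}} f(\mathbf{x})^{\top}\mathbf{x}
              + \sum_{l \in L} \sum_{c \in c_l} f^{b}(\mathbf{x})_{c}
```

Transformers use GELU activations, softmax attention, and LayerNorm, none of which are piecewise linear, which is presumably why this exact equality can fail for them.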

However, you are free to use Simple / Smooth FullGrad in this case, as these variants have no completeness guarantee attached. I haven't tested this on transformers myself, but you would need to change this line: https://github.com/idiap/fullgrad-saliency/blob/2121d212494d8dc401e27ec8198551efe68dd58f/saliency/tensor_extractor.py#L25 so that it includes self-attention layers and perhaps excludes the fully connected ones.
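
(For anyone attempting this, here is a minimal, untested sketch of what that change might look like. The original filter at that line appears to keep Conv2d / Linear / BatchNorm2d modules; the module tuple, hook call, and helper name below are assumptions for illustration, not code from the repo, and the right set of modules to hook depends on the ViT implementation you use, e.g. torchvision or timm:)

```python
import torch.nn as nn

# Hypothetical adaptation of the isinstance filter in tensor_extractor.py:
# instead of Conv2d / Linear / BatchNorm2d, hook the bias-carrying modules
# of a transformer block. Untested sketch, for illustration only.
HOOKED_LAYERS = (nn.LayerNorm, nn.MultiheadAttention)

def register_transformer_hooks(model, grad_hook):
    """Register backward hooks on transformer layers and collect biases."""
    handles, biases = [], []
    for m in model.modules():
        if isinstance(m, HOOKED_LAYERS):
            handles.append(m.register_full_backward_hook(grad_hook))
            # LayerNorm exposes .bias directly; MultiheadAttention keeps its
            # projection biases in sub-parameters (in_proj_bias, out_proj),
            # so those would need separate handling for a full port.
            b = getattr(m, "bias", None)
            if b is not None:
                biases.append(b)
    return handles, biases

# Example usage with torchvision's ViT:
#   from torchvision.models import vit_b_16
#   model = vit_b_16()
#   def grad_hook(module, grad_in, grad_out):
#       print(type(module).__name__, grad_out[0].shape)
#   handles, biases = register_transformer_hooks(model, grad_hook)
```

Hooking at the module level mirrors how the repo's extractor treats CNN layers; whether these are the right modules to treat as "bias-carrying" in a transformer is exactly the open question in this thread.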

If you do happen to use it, I'd be happy to learn about your experience or the issues you faced!