Closed: josh-freeman closed this issue 1 year ago
Hi @josh-freeman, thanks for your interest!
Great question. Sometimes you can use hooks defined on the architecture, without having to add code to save the attention maps and gradients; see this documentation. This requires the architecture to have some layer that outputs the attention weights, which is not the case for all architectures.
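As a minimal sketch of that hook-based approach (my own illustration, not code from this repository): a forward hook saves the target layer's output, and a tensor hook saves its gradient, with no changes to the model's source. The toy model and target layer below are placeholders; for attention maps the hooked layer would have to be one that actually returns the attention weights.

```python
import torch
import torch.nn as nn

activations, gradients = {}, {}

def save_output(name):
    def hook(module, inputs, output):
        # save the layer's output during the forward pass
        activations[name] = output
        # store the gradient of this output when backward() runs
        output.register_hook(lambda grad: gradients.update({name: grad}))
    return hook

# placeholder model and target layer for the sketch
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 1))
target_layer = model[0]
handle = target_layer.register_forward_hook(save_output("target"))

x = torch.randn(4, 8)
model(x).sum().backward()

print(activations["target"].shape)  # saved activation, (4, 8)
print(gradients["target"].shape)    # saved gradient, (4, 8)
handle.remove()
```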
Is it possible to make it so that attn_probs and the other attributes necessary for explainability are present on the residual attention blocks of a model coming from here or here? Cheerio
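One possible way to get attn_probs onto the blocks (a hedged sketch of my own, not an answer from the maintainers) would be to wrap each residual attention block's attention call so it keeps the weights. This assumes a CLIP-style block that exposes an attn (nn.MultiheadAttention), an optional attn_mask, and an attention(x) method; other implementations may need a different patch.

```python
def patch_block(block):
    # wrap the block's attention call so the attention weights are kept
    # on the block as `attn_probs` (hypothetical helper, not library code)
    def attention_with_probs(x):
        mask = getattr(block, "attn_mask", None)
        if mask is not None:
            mask = mask.to(dtype=x.dtype, device=x.device)
        # need_weights=True also returns the attention weights
        # (averaged over heads by default)
        out, weights = block.attn(x, x, x, need_weights=True, attn_mask=mask)
        block.attn_probs = weights
        return out
    block.attention = attention_with_probs  # shadow the original method
    return block
```

If the model follows OpenAI's CLIP layout, something like model.visual.transformer.resblocks could be iterated over and patched before running the explainability code; gradients for attn_probs would still need a tensor hook as in the earlier sketch.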