Closed afcarvallo closed 2 years ago
Hi @afcarvallo, could you elaborate, please. Not sure I understand what you mean.
Hi @jalammar, thanks for answering. I mean, is there an option to choose a specific head when exploring the attentions, since transformer-based models can have one or more heads (most of the time 12)? Or do the attentions shown by Ecco reflect the average of all heads?
Do you mean layers? GPT2, for example, would have 12 layers, but only one "language model head". Ecco captures the neuron activations of all the layers.
Ah ok, I'm referring to heads. For example, BERT can have multiple attention heads per layer.
Ah, attention heads! Now I understand, sorry. Ecco doesn't really do much with attention yet. Hugging Face transformers will return the attention values for you in the model output if you request them.
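For reference, here's a minimal sketch of how you can get per-layer, per-head attention weights from transformers. It uses a tiny randomly initialized BERT just to keep it self-contained; in practice you'd load real weights with `BertModel.from_pretrained("bert-base-uncased", output_attentions=True)`:

```python
import torch
from transformers import BertConfig, BertModel

# Small randomly-initialized BERT purely to illustrate the output shapes;
# swap in from_pretrained(...) for a real model.
config = BertConfig(hidden_size=64, num_hidden_layers=2,
                    num_attention_heads=4, intermediate_size=128,
                    vocab_size=100)
model = BertModel(config)
model.eval()

input_ids = torch.tensor([[1, 5, 9, 2]])  # dummy 4-token sequence
with torch.no_grad():
    outputs = model(input_ids, output_attentions=True)

# outputs.attentions is a tuple with one tensor per layer,
# each shaped (batch, num_heads, seq_len, seq_len)
attn = outputs.attentions
print(len(attn))        # number of layers (2 here, 12 for bert-base)
print(attn[0].shape)    # torch.Size([1, 4, 4, 4])

# attention weights for a single head, e.g. layer 0, head 3:
layer0_head3 = attn[0][0, 3]
```

Each row of a head's attention matrix is a softmax distribution over the input tokens, so you can slice out any individual layer/head combination and inspect it directly.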
Hi @jalammar, I tested some examples with Ecco, and I wanted to know whether it is possible to select a head and view its attention values, for each head and for each layer?