jalammar / ecco

Explain, analyze, and visualize NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2, BERT, RoBERTa, T5, and T0).
https://ecco.readthedocs.io
BSD 3-Clause "New" or "Revised" License

attention head #37

Closed afcarvallo closed 2 years ago

afcarvallo commented 3 years ago

Hi @jalammar, I tested some examples with Ecco, and I wanted to know whether it is possible to switch between heads so I can view the activations for each head and each layer.

jalammar commented 3 years ago

Hi @afcarvallo, could you elaborate, please? Not sure I understand what you mean.

afcarvallo commented 3 years ago

Hi @jalammar, thanks for answering. I mean, is there an option to choose a specific head when exploring the attentions, since transformer-based models can have one or more heads (most of the time 12)? Or do the attentions shown by Ecco take the average of all heads?

jalammar commented 3 years ago

Do you mean layers? GPT2, for example, would have 12 layers, but only one "language model head". Ecco captures the neuron activations of all the layers.

afcarvallo commented 3 years ago

Ahh ok, I'm referring to heads. For example, BERT can have multiple attention heads.

jalammar commented 3 years ago

Ah, attention heads! Now I understand, sorry. Ecco doesn't really do much with attention yet. Hugging Face transformers will return the attention values for you in the model output if you request them.
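For reference, a minimal sketch of what this looks like with Hugging Face transformers directly (not through Ecco): passing `output_attentions=True` makes the model return one attention tensor per layer, shaped `(batch, num_heads, seq_len, seq_len)`, so you can index any individual head. The model name and input text here are just illustrative.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Sketch: retrieve per-head attention weights from GPT-2 (12 layers, 12 heads).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

inputs = tokenizer("Ecco visualizes language models", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions is a tuple with one tensor per layer,
# each shaped (batch, num_heads, seq_len, seq_len).
print(len(outputs.attentions))          # number of layers
print(outputs.attentions[0].shape)      # per-layer attention tensor

# Attention matrix for a single head, e.g. head 3 of layer 0:
head_3_layer_0 = outputs.attentions[0][0, 3]
print(head_3_layer_0.shape)             # (seq_len, seq_len)
```

Each row of a head's matrix sums to 1 (softmax over the tokens attended to), so averaging over the head dimension would give the kind of aggregated view mentioned above.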