jalammar / ecco

Explain, analyze, and visualize NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2, BERT, RoBERTa, T5, and T0).
https://ecco.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Add support for enc-dec LMs #46

Closed: JoaoLages closed this 3 years ago

JoaoLages commented 3 years ago

First of all, great package and what a clever way to get gradient explainability for transformer LMs! We (@SSamDav and I) would really like to contribute :)

The goal of this PR is for the LM class to be able to handle enc-dec LMs, Masked LMs, and Causal LMs. To do that, we propose the following main changes to the code:

We made sure to adapt the tests so that they pass. All the notebooks also run as intended, and they now work for enc-dec models such as T5!! 🎉 Note that code that isn't used anywhere was not adapted and is probably buggy now, at least with the batch-size changes.
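For context, here is a minimal sketch of what an Ecco workflow with an encoder-decoder model could look like once this is merged (hypothetical usage based on Ecco's existing API; the exact arguments and attribution methods may differ from what the PR finally exposes):

```python
import ecco

# Load a seq2seq checkpoint through Ecco's LM wrapper
# (assumes the LM class now detects encoder-decoder models from the model config).
lm = ecco.from_pretrained('t5-small')

# Generate a few tokens while Ecco collects gradient-based attributions.
output = lm.generate("translate English to German: The house is wonderful.",
                     generate=10, do_sample=False)

# Visualize input saliency for the generated tokens inside the notebook.
output.saliency()
```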

jalammar commented 3 years ago

Super excited for this!! Will start reviewing today!

JoaoLages commented 3 years ago

Thank you for your review @jalammar. Let's make this merge happen 💪 I even think Hugging Face would want all of this that you started! @SSamDav and I will try to address the rest of the comments by the end of the week.

> The one additional question I have in addition to review comments is about using t5-small in Ecco_Output_Token_Scores.ipynb. The decoder has 6 layers, but to see the actual output probabilities, we have to set layer=6. Curious where the 7th layer comes from if we've not included the embedding layer.

Will investigate this πŸ‘
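For reference, the call in question looks roughly like this (a sketch based on the Ecco_Output_Token_Scores notebook; argument names may vary between Ecco versions):

```python
import ecco

lm = ecco.from_pretrained('t5-small')
output = lm.generate("translate English to German: Hello", generate=5, do_sample=False)

# t5-small's decoder has 6 blocks, yet the scores matching the model's actual
# output distribution only appear at layer=6, i.e. a seventh entry in the stack.
output.layer_predictions(position=1, layer=6, topk=10)
```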

JoaoLages commented 3 years ago

> Okay I went over the PR and left some comments. This is awesome work!
>
> The one additional question I have in addition to review comments is about using t5-small in Ecco_Output_Token_Scores.ipynb. The decoder has 6 layers, but to see the actual output probabilities, we have to set layer=6. Curious where the 7th layer comes from if we've not included the embedding layer.

Seems like I was using an older version of transformers that had a different API (I've now made sure to rerun everything with the version from requirements.txt).

Looking at the Ecco_Evolution_of_Selected_Token notebook, we can see that the first layer is the embedding layer, so HuggingFace changed this behaviour to match causal/MLM models, which makes sense. I'll add the necessary changes.
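To make the "extra" layer concrete, here is a quick check with HuggingFace transformers (illustrative only, not part of the PR): with output_hidden_states=True, the decoder hidden states include the embedding output at index 0 plus one entry per decoder block, so t5-small reports 7 entries.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: Hello", return_tensors="pt")
# Start decoding from the decoder start token (the pad token for T5).
decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])

out = model(**inputs, decoder_input_ids=decoder_input_ids, output_hidden_states=True)

# Prints 7 for t5-small: index 0 is the embedding output, indices 1-6 are the
# outputs of the 6 decoder blocks.
print(len(out.decoder_hidden_states))
```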