JoaoLages closed this pull request 3 years ago.
Super excited for this!! Will start reviewing today!
Thank you for your review, @jalammar. Let's make this merge happen! I even think Hugging Face would want all of this that you started! @SSamDav and I will try to address the rest of the comments by the end of the week.
> The one additional question I have, beyond the review comments, is about using t5-small in Ecco_Output_Token_Scores.ipynb. The decoder has 6 layers, but to see the actual output probabilities we have to set layer=6. Curious where the 7th layer comes from if we've not included the embedding layer.
Will investigate this!
Okay, I went over the PR and left some comments. This is awesome work!
The one additional question I have, beyond the review comments, is about using t5-small in Ecco_Output_Token_Scores.ipynb. The decoder has 6 layers, but to see the actual output probabilities we have to set layer=6. Curious where the 7th layer comes from if we've not included the embedding layer.
It seems I was using a previous version of transformers that had a different API (I have now made sure to rerun everything with the version pinned in requirements.txt). Looking at the Ecco_Evolution_of_Selected_Token notebook, we can see that the first layer is the embedding layer, so HuggingFace changed this behaviour to match causal/MLM models, which makes sense. I'll add the necessary changes.
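For reference, here is a minimal sketch (assuming the transformers version pinned in requirements.txt, and written independently of the notebook code) of where the extra index comes from: with `output_hidden_states=True`, `decoder_hidden_states` contains the embedding output at position 0 followed by one entry per decoder block, so t5-small returns 7 entries even though it has 6 decoder layers.

```python
# Minimal sketch: why layer=6 is the final decoder block for t5-small.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

encoder_inputs = tokenizer("translate English to German: Hello world", return_tensors="pt")
decoder_inputs = tokenizer("Hallo Welt", return_tensors="pt")

with torch.no_grad():
    outputs = model(
        input_ids=encoder_inputs.input_ids,
        decoder_input_ids=decoder_inputs.input_ids,
        output_hidden_states=True,
    )

# t5-small has 6 decoder blocks, but 7 hidden states come back:
# index 0 is the embedding output, indices 1..6 are the decoder block outputs.
print(model.config.num_decoder_layers)      # 6
print(len(outputs.decoder_hidden_states))   # 7
```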
First of all, great package, and what a clever way to get gradient explainability for transformer LMs! We (@SSamDav and I) would really like to contribute :)
The goal of this PR is for the LM class to be able to handle enc-dec LMs, masked LMs, and causal LMs. In order to do that, we propose the following main changes to the code:
- Determine how to handle the model from the loaded model itself, using `type(lm.model)` for example.
We made sure to adapt the tests so that they pass. Also, all the notebooks run as intended, and they now work for enc-dec models such as T5! Note that code that isn't used anywhere was not adapted and is probably buggy now, at least with the batch-size changes.
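To illustrate the idea, here is a hypothetical sketch (not the actual PR code; the helper `detect_model_kind` is made up for illustration) of telling the three model kinds apart from the loaded model and its config:

```python
# Hypothetical sketch of classifying a loaded model as enc-dec, causal, or masked LM.
from transformers import AutoModelForSeq2SeqLM

def detect_model_kind(model):
    # Encoder-decoder models (T5, BART, ...) flag it in their config.
    if getattr(model.config, "is_encoder_decoder", False):
        return "enc-dec"
    # Causal LMs typically expose a *LMHeadModel / *ForCausalLM class.
    name = type(model).__name__
    if name.endswith("LMHeadModel") or name.endswith("ForCausalLM"):
        return "causal"
    # Everything else (BERT-style models) is treated as a masked LM here.
    return "mlm"

lm_model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
print(detect_model_kind(lm_model))  # "enc-dec" for t5-small
```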