lucidrains / performer-pytorch

An implementation of Performer, a linear-attention-based transformer, in PyTorch
MIT License

context-specific embeddings from language model? #60

Open rainwala opened 3 years ago

rainwala commented 3 years ago

Hi, thank you very much for all of your amazing work implementing bleeding-edge attention models in PyTorch. This is a question, not an issue. Does the PerformerLM language model learn context-specific embeddings for the tokens? I would like to use this model for a fine-tuning task: pre-train the language model on a huge dataset, then take part of the trained language model, add some more layers, and fine-tune on a downstream task. I think this could be powerful, but only if PerformerLM produces context-specific embeddings. I know this kind of thing is possible with BERT-like architectures, but I'm interested in much longer sequences, which is why I would like to try it with PerformerLM.
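For what it's worth, the distinction the question hinges on can be shown directly. The lines below are a minimal NumPy sketch (not performer-pytorch's actual code; all names here are made up for illustration) of a single self-attention layer: the embedding table itself is context-free, but the layer's output for the same token id changes with its neighbors, which is what "context-specific embeddings" means. In PerformerLM the analogous quantity would be the hidden states of the transformer stack rather than the raw token-embedding table.

```python
import numpy as np

# Hypothetical toy model, NOT performer-pytorch's API: one self-attention
# layer over a static embedding table, to show that attention outputs are
# context-specific even though the input embeddings are not.

rng = np.random.default_rng(0)
vocab, dim = 10, 8
embed = rng.normal(size=(vocab, dim))           # static (context-free) embedding table
Wq, Wk, Wv = (rng.normal(size=(dim, dim)) for _ in range(3))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(token_ids):
    """One self-attention layer over a sequence of token ids."""
    x = embed[token_ids]                        # (seq, dim) context-free lookups
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = softmax(q @ k.T / np.sqrt(dim))    # attention weights mix the context
    return scores @ v                           # contextualized representations

ctx_a = attend([3, 1, 4])   # token 3 followed by tokens 1, 4
ctx_b = attend([3, 7, 2])   # same token 3, different neighbors

# Token 3's input embedding is identical in both sequences, but its
# attention output differs because the keys/values of its neighbors differ.
print(np.allclose(ctx_a[0], ctx_b[0]))  # False: the representation is contextual
```

So any transformer LM, Performer included, yields context-dependent hidden states after the first attention layer; the fine-tuning recipe described above would tap those hidden states, not the embedding table.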