lucidrains / performer-pytorch

An implementation of Performer, a linear-attention-based transformer, in PyTorch
MIT License

context-specific embeddings from language model? #60

Open rainwala opened 3 years ago

rainwala commented 3 years ago

Hi, thank you very much for all of your amazing work implementing bleeding-edge attention models in PyTorch. This is a question, not an issue. Does the PerformerLM language model learn context-specific embeddings for the tokens? I would like to use this model for a fine-tuning task: pre-train the language model on a huge dataset, then take part of the trained language model, add some more layers, and fine-tune on a downstream task. I think this could be powerful, but only if PerformerLM produces context-specific embeddings. I know this kind of thing is possible with BERT-like architectures, but I'm interested in much longer sequences, which is why I would like to try it with PerformerLM.
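For what it's worth, the distinction the question hinges on can be shown directly. The lines below are a minimal NumPy sketch (not performer-pytorch's actual code; all names here are made up for illustration) of a single self-attention layer: the embedding table itself is context-free, but the layer's output for the same token id changes with its neighbors, which is what "context-specific embeddings" means. In PerformerLM the analogous quantity would be the hidden states of the transformer stack rather than the raw token-embedding table.

```python
import numpy as np

# Hypothetical toy model, NOT performer-pytorch's API: one self-attention
# layer over a static embedding table, to show that attention outputs are
# context-specific even though the input embeddings are not.

rng = np.random.default_rng(0)
vocab, dim = 10, 8
embed = rng.normal(size=(vocab, dim))           # static (context-free) embedding table
Wq, Wk, Wv = (rng.normal(size=(dim, dim)) for _ in range(3))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(token_ids):
    """One self-attention layer over a sequence of token ids."""
    x = embed[token_ids]                        # (seq, dim) context-free lookups
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = softmax(q @ k.T / np.sqrt(dim))    # attention weights mix the context
    return scores @ v                           # contextualized representations

ctx_a = attend([3, 1, 4])   # token 3 followed by tokens 1, 4
ctx_b = attend([3, 7, 2])   # same token 3, different neighbors

# Token 3's input embedding is identical in both sequences, but its
# attention output differs because the keys/values of its neighbors differ.
print(np.allclose(ctx_a[0], ctx_b[0]))  # False: the representation is contextual
```

So any transformer LM, Performer included, yields context-dependent hidden states after the first attention layer; the fine-tuning recipe described above would tap those hidden states, not the embedding table.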