The toy example shows the copy task, while the relatively smaller example is a compression problem. Could you describe how to test Performer attention in a seq2seq model? Concretely, how can I pass a pair of text sequences, say an English sentence and its French translation, and what does the workflow english_sentence -> tokenized -> embedded -> encoder -> decoder -> prediction look like? Alternatively, how do we pass an embedding directly to performer_enc?
Also, given that the PerformerLM class has the line self.token_emb = nn.Embedding(num_tokens, dim), I assume there must be a way to pass text data to this class for seq2seq.
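To make the question concrete, here is roughly the pipeline I have in mind, sketched with vanilla nn.TransformerEncoder/Decoder as placeholders (all vocab sizes and dimensions below are made up for illustration). My understanding is that the Performer attention blocks would slot in where the stand-in encoder/decoder layers are, with the surrounding tokenize -> embed -> encode -> decode -> project steps staying the same:

```python
import torch
import torch.nn as nn

# Hypothetical toy sizes; a real setup would use a proper tokenizer
# (shared or per-language) to produce the integer ids fed in below.
SRC_VOCAB, TGT_VOCAB, DIM, PAD = 1000, 1000, 64, 0

class Seq2SeqSketch(nn.Module):
    """english_sentence -> tokenize -> embed -> encoder -> decoder -> prediction.
    nn.TransformerEncoder/Decoder are stand-ins for the Performer blocks."""
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, DIM, padding_idx=PAD)  # English ids -> vectors
        self.tgt_emb = nn.Embedding(TGT_VOCAB, DIM, padding_idx=PAD)  # French ids -> vectors
        enc_layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        self.to_logits = nn.Linear(DIM, TGT_VOCAB)  # project back to vocabulary

    def forward(self, src_ids, tgt_ids):
        # src_ids: (batch, src_len) tokenized English
        # tgt_ids: (batch, tgt_len) tokenized French, shifted right for teacher forcing
        memory = self.encoder(self.src_emb(src_ids))
        tl = tgt_ids.size(1)
        causal = torch.triu(torch.full((tl, tl), float('-inf')), diagonal=1)
        out = self.decoder(self.tgt_emb(tgt_ids), memory, tgt_mask=causal)
        return self.to_logits(out)  # (batch, tgt_len, TGT_VOCAB)

model = Seq2SeqSketch()
src = torch.randint(1, SRC_VOCAB, (2, 7))  # pretend-tokenized English batch
tgt = torch.randint(1, TGT_VOCAB, (2, 5))  # pretend-tokenized French batch
logits = model(src, tgt)
print(logits.shape)  # torch.Size([2, 5, 1000])
```

Since PerformerLM already owns its nn.Embedding, I assume it expects integer token ids like the model above, while passing a pre-computed (batch, seq, dim) embedding would instead go to the underlying Performer block; confirmation either way would help.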