mHsuann closed this issue 3 years ago.
Hi,
You can find a pretty thorough explanation in our docs: https://fast-transformers.github.io/attention/.
TL;DR: The query lengths mask defines the number of queries in each sequence in the batch, the key lengths mask defines the number of keys, and the attention mask defines which positions each query can attend to. For a transformer encoder, the query lengths and key lengths should be the same.
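As a rough sketch of how those three arguments are typically constructed for an encoder (assuming `AttentionLayer` and `ClusteredAttention` are importable from `fast_transformers.attention`, the mask helpers from `fast_transformers.masking`, and that the package's compiled extensions for clustering are installed):

```python
import torch
from fast_transformers.masking import FullMask, LengthMask
from fast_transformers.attention import AttentionLayer, ClusteredAttention

# Hypothetical batch: 2 sequences padded to length 10, model dim 64, 4 heads.
N, L, d_model, n_heads = 2, 10, 64, 4
x = torch.rand(N, L, d_model)

# attn_mask: where each query may attend (everywhere, for a plain encoder).
attn_mask = FullMask(L, device=x.device)

# query_lengths / key_lengths: the unpadded length of each sequence.
lengths = torch.tensor([10, 7])
length_mask = LengthMask(lengths, max_len=L, device=x.device)

# In an encoder, queries and keys come from the same sequence,
# so the same length mask is passed for both.
layer = AttentionLayer(ClusteredAttention(clusters=5), d_model, n_heads)
out = layer(x, x, x, attn_mask, length_mask, length_mask)  # (N, L, d_model)
```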
I am closing the issue, but feel free to reopen it if you have more questions.
Cheers, Angelos
Hi, I tried to use 'clustered attention' in my Transformer model; my model's code follows 'The Annotated Transformer'. While implementing it, I ran into a problem: the self-attention module in The Annotated Transformer takes only the query, key, and value tensors, plus an optional mask and dropout.
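Roughly, that function looks like this (sketched from the tutorial's published code):

```python
import math
import torch
import torch.nn.functional as F

def attention(query, key, value, mask=None, dropout=None):
    "Scaled dot-product attention as presented in 'The Annotated Transformer'."
    d_k = query.size(-1)
    scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, -1e9)
    p_attn = F.softmax(scores, dim=-1)
    if dropout is not None:
        p_attn = dropout(p_attn)
    return torch.matmul(p_attn, value), p_attn
```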
However, the clustered attention module in fast-transformers expects additional mask and length arguments.
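Its forward method looks roughly like this (signature paraphrased from the fast-transformers source; body omitted):

```python
import torch

class ClusteredAttention(torch.nn.Module):
    # Paraphrased from fast_transformers.attention.clustered_attention:
    # note the extra attn_mask, query_lengths and key_lengths arguments
    # compared to attention(query, key, value, mask) above.
    def forward(self, queries, keys, values, attn_mask,
                query_lengths, key_lengths):
        ...
```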
So I got a TypeError when I called it the same way as the original attention function.
I would like to ask: where does `query_lengths` come from? I only know that it is obtained through `lengths(self)`.
(Sorry for my limited English.) Thank you!