kalpitdixit closed this issue 5 years ago.
The head_embeddings are the values, and the keys are determined by the context_embeddings. In hindsight, I probably should have removed this separation in the final version for simplicity, since the improvement wasn't very large.

3. The context_embeddings have a window size of 10, and the head_embeddings have a window size of 2.

Hope that helps!
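A minimal sketch of the key/value split described above, in plain Python. It assumes the attention runs over the tokens of one span. Each token's score (the "key" side) is computed from its context embedding, and the softmax-weighted sum is taken over its head embedding (the "value" side). The name span_head_attention and the score_weights vector are hypothetical: a single dot product stands in for whatever network actually produces the scores in the model.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def span_head_attention(context_embs, head_embs, start, end, score_weights):
    """Attend over tokens start..end (inclusive) of one span.

    Scores ("keys") come from context_embs; the output is a weighted
    sum of head_embs ("values"). score_weights is a hypothetical
    learned vector standing in for the model's scoring network.
    """
    indices = list(range(start, end + 1))
    # One scalar score per token, derived only from its context embedding.
    scores = [
        sum(w * x for w, x in zip(score_weights, context_embs[t]))
        for t in indices
    ]
    weights = softmax(scores)
    # Weighted sum of the head embeddings within the span.
    dim = len(head_embs[start])
    out = [0.0] * dim
    for a, t in zip(weights, indices):
        for d in range(dim):
            out[d] += a * head_embs[t][d]
    return out, weights


context = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
head = [[2.0, 0.0], [0.0, 2.0], [4.0, 4.0]]
vec, attn = span_head_attention(context, head, 0, 2, [0.0, 0.0])
```

With all-zero score weights every token scores the same, so the result is simply the mean of the head embeddings. Because the two tables are separate, the context embeddings can be tuned for scoring and the head embeddings for what gets pooled.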
Thanks for the fast and complete answers!
For "3." above, I see why using a smaller window size for the head_embeddings than for the context_embeddings makes sense: the head_embeddings are used to represent a span, which is typically only a few tokens long, whereas the context_embeddings are used to represent entire sentences.
Nice idea.
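To make the window sizes concrete, here is a small sketch. The thread does not say exactly how the window is applied, so this assumes it is the symmetric context window used to collect neighboring tokens, as in word2vec/GloVe-style embedding training. The function context_pairs is hypothetical and only illustrates how much more local a window of 2 is than a window of 10.

```python
def context_pairs(tokens, window):
    """Return (center, neighbor) pairs within a symmetric window.

    Assumption: "window size" means the number of tokens on each side
    of the center token that count as its context.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


tokens = "the big dog chased the cat".split()
narrow = context_pairs(tokens, 2)   # head-style window
wide = context_pairs(tokens, 10)    # context-style window
```

With a window of 2, "dog" never sees "cat" (they are three tokens apart), while a window of 10 covers this whole sentence. Embeddings built from the narrow window therefore reflect only very local co-occurrence.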