Closed: ChanganVR closed this issue 3 years ago
It is easier to give an example. To perform relative attention, we want to relatively shift the attention score matrix as follows:
```
a00 a01 a02      a02 0   a10
a10 a11 a12  =>  a11 a12 0
a20 a21 a22      a20 a21 a22
```
What `_relative_shift` does is just a tidy way of achieving the transformation above: left-pad a zero column, reshape, and drop the first row:
```
a00 a01 a02      0 a00 a01 a02      0   a00 a01      a02 0   a10
a10 a11 a12  =>  0 a10 a11 a12  =>  a02 0   a10  =>  a11 a12 0
a20 a21 a22      0 a20 a21 a22      a11 a12 0        a20 a21 a22
                                    a20 a21 a22
```
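For anyone who wants to poke at the pad-reshape-slice trick above, here is a minimal NumPy sketch. It mirrors the idea behind the repo's PyTorch `_relative_shift` (which works on batched 4-D tensors), but the function below is an illustrative stand-in, not the repo's code:

```python
import numpy as np

def relative_shift(pos_score):
    """Left-pad a zero column, reshape, and drop the first row.

    pos_score: (L, L) array of scores a_ij.
    """
    L = pos_score.shape[0]
    # 1) pad a zero column on the left -> (L, L+1)
    padded = np.concatenate([np.zeros((L, 1)), pos_score], axis=1)
    # 2) reading the padded buffer row-major as (L+1, L) shifts every
    #    row one extra position relative to the row above it
    reshaped = padded.reshape(L + 1, L)
    # 3) drop the first (garbage) row -> (L, L)
    return reshaped[1:]

scores = np.arange(1, 10, dtype=float).reshape(3, 3)
print(relative_shift(scores))
# [[3. 0. 4.]
#  [5. 6. 0.]
#  [7. 8. 9.]]
```

With `scores[i][j]` playing the role of `a_ij` (so `a00 = 1`, ..., `a22 = 9`), the printed result matches the diagram above.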
@sooftware Thank you for your reply! I'm a bit more confused now. In the example you gave, why would a10 appear in the first row? That corresponds to how much the 1st element attends to the 0th element, right?
Also, do you mind explaining a bit more about why the attention is shifted this way? E.g., a00 becomes a02 and a01 becomes 0; what is the intuition behind this transformation? If we denote the new matrix as B, then b00 should represent the relative position information between the first element and the first element, right? Why would it be a02?
Hi, I found that the upper triangle of `pos_score` does not seem to be masked. Will this matter for the performance?
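For what it's worth, the values that wrap into the strict upper triangle during the reshape (e.g. the a10 in the first row of the diagram above) are exactly the positions a causal attention mask would discard. A minimal NumPy sketch of zeroing them after the shift (an illustration of that masking idea, not what this repo does):

```python
import numpy as np

def mask_upper_triangle(shifted):
    # keep only the lower triangle (including the diagonal);
    # these are the entries a causal mask would retain
    return np.tril(shifted)

shifted = np.array([[3., 0., 4.],
                    [5., 6., 0.],
                    [7., 8., 9.]])
print(mask_upper_triangle(shifted))
# [[3. 0. 0.]
#  [5. 6. 0.]
#  [7. 8. 9.]]
```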
Also, this relative positional encoding seems to only work with causal sequences. However, according to Appendix B of the original paper, i − j can only be an integer from 0 to M + L − 1, while the attention here includes both directions.
@ChanganVR I have exactly the same confusion here; the post @sooftware gave looks correct to me only when the upper-left triangle of QE_r is set to zero before the "skewing" process. Do you have any clue about it?
@windysonic I'm sorry that I don't remember any details about this issue since it's been so long.
Hi @sooftware, thank you for coding this repo. I have a question about the relative shift function: https://github.com/sooftware/conformer/blob/c76ff16d01b149ae518f3fe66a3dd89c9ecff2fc/conformer/attention.py#L105 I don't quite understand how this function works. Could you elaborate on this?
An example input and output of size 4 is shown below, which does not really make sense to me.
Input:
Output:
Thank you!
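Since the actual size-4 matrices did not make it into the issue, here is a stand-in run of the same left-pad / reshape / slice trick on a hypothetical 4x4 matrix of distinct values (plain NumPy for illustration; the repo's `_relative_shift` operates on batched PyTorch tensors):

```python
import numpy as np

x = np.arange(1, 17, dtype=float).reshape(4, 4)         # hypothetical size-4 input
padded = np.concatenate([np.zeros((4, 1)), x], axis=1)  # left-pad a zero column
shifted = padded.reshape(5, 4)[1:]                      # reshape, drop the first row
print(shifted)
# [[ 4.  0.  5.  6.]
#  [ 7.  8.  0.  9.]
#  [10. 11. 12.  0.]
#  [13. 14. 15. 16.]]
```

Each row is shifted one extra position to the left relative to the row above it, which is the whole point of the transformation.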