In the windowed attention operation, the relative position bias is applied with the following code: attn = attn + relative_position_bias.unsqueeze(0). Per Shaw et al. (2018) (https://arxiv.org/pdf/1803.02155.pdf), equation 5, the second term in the numerator involves a dot product with the query vector. Is that dot product with the query needed here, or is the purely additive bias intended to replace it?
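For reference, here is a minimal sketch of how I understand the two formulations to differ (shapes, tensor names, and the embedding table a_k are hypothetical, chosen only for illustration; this is not the repository's actual code):

```python
import torch

# Hypothetical shapes: B_ windows, num_heads heads, N tokens per window,
# head_dim channels per attention head.
B_, num_heads, N, head_dim = 2, 3, 49, 32
q = torch.randn(B_, num_heads, N, head_dim)
k = torch.randn(B_, num_heads, N, head_dim)

# Content-content term shared by both formulations.
attn = q @ k.transpose(-2, -1)  # (B_, num_heads, N, N)

# Additive bias as in the quoted line: a learned scalar per head and
# relative offset, added directly to the logits -- no query involved.
relative_position_bias = torch.randn(num_heads, N, N)
attn_additive = attn + relative_position_bias.unsqueeze(0)

# Shaw et al. (2018), eq. 5, second numerator term: the query is dotted
# with a relative-position embedding a_{ij}^K of size head_dim.
a_k = torch.randn(N, N, head_dim)  # hypothetical relative embedding table
attn_shaw = attn + torch.einsum('bhid,ijd->bhij', q, a_k)
```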