Fix relative positional multi-head attention layer

sooftware / conformer

[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)

Apache License 2.0

943 stars, 174 forks

Closed: upskyy closed this 9 months ago

upskyy commented 1 year ago

I based this fix on the multi-head attention in fairseq's conformer layer. [code] I also confirmed that the model trains with these changes.

- Scale the attention scores by math.sqrt(d_head) instead of math.sqrt(dim)
- Add relative positional encoding module
- Fix _relative_shift method
  - input : B X n_head X T X 2T-1
  - output : B X n_head X T X T
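
For readers following the changes above, here is a minimal pure-Python sketch of the two numerical pieces: scaling by the per-head dimension, and the shape change in the relative shift. It is not the code from this PR. The function names `attention_scale` and `relative_shift` are hypothetical, the sketch handles a single head (no B or n_head dimensions), and it assumes the column ordering commonly used in fairseq/ESPnet-style implementations, where column k of the 2T-1 axis holds the score for relative offset (T-1)-k.

```python
import math


def attention_scale(dim: int, n_head: int) -> float:
    """Scaling factor for dot-product scores, based on the per-head size."""
    d_head = dim // n_head  # assumes dim is divisible by n_head
    return 1.0 / math.sqrt(d_head)


def relative_shift(pos_scores):
    """Map one head's T x (2T-1) position scores to T x T.

    Assumed convention (not confirmed by the PR): column k holds the score
    for relative offset (T - 1) - k, so offsets run from +(T-1) to -(T-1).
    Output entry [i][j] is the score for offset i - j.
    """
    t = len(pos_scores)
    assert all(len(row) == 2 * t - 1 for row in pos_scores)
    return [
        [pos_scores[i][(t - 1) - (i - j)] for j in range(t)]
        for i in range(t)
    ]


# Toy example with T = 3: label every score by its relative offset.
T = 3
offsets = list(range(T - 1, -T, -1))     # [2, 1, 0, -1, -2]
scores = [offsets[:] for _ in range(T)]  # T x (2T-1)
shifted = relative_shift(scores)         # T x T, entry [i][j] == i - j
```

In a tensor implementation the same mapping is usually done with a zero-pad and reshape over the last two dimensions rather than explicit indexing. The list version is only meant to make the T X 2T-1 to T X T index mapping easy to check by hand.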

sooftware commented 9 months ago

Good job.