Closed Kikumu closed 1 year ago
https://github.com/tunz/transformer-pytorch/blob/e7266679f0b32fd99135ea617213f986ceede056/model/transformer.py#L87
I noticed that, according to the paper, the dot products of the queries and keys are divided by sqrt(d_k) before being passed to softmax. I don't see that in the code. Did I miss anything? Thank you!
https://github.com/tunz/transformer-pytorch/blob/e7266679f0b32fd99135ea617213f986ceede056/model/transformer.py#L84
The scaling is applied here. self.scale holds the scaling factor, which in the paper's formulation is 1/sqrt(d_k).
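For anyone else who looks for an explicit division and cannot find one: in the paper the attention scores are divided by sqrt(d_k), and multiplying the query by a precomputed factor before the dot product gives the same result. The following is a minimal plain-Python sketch of that equivalence, not the repository's code. It assumes the factor is `d_k ** -0.5`, and the function names and sample vectors are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weights_divide_scores(q, keys):
    """Paper-style: compute q.k first, then divide each score by sqrt(d_k)."""
    d_k = len(q)
    scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
    return softmax(scores)


def weights_prescale_query(q, keys):
    """Alternative: multiply q by a precomputed scale, then compute q.k."""
    d_k = len(q)
    scale = d_k ** -0.5  # assumed value of the scale factor: 1/sqrt(d_k)
    q_scaled = [x * scale for x in q]
    scores = [dot(q_scaled, k) for k in keys]
    return softmax(scores)


# Hypothetical sample data: one query and three keys with d_k = 4.
q = [1.0, 2.0, 3.0, 4.0]
keys = [
    [0.5, 0.1, -0.3, 0.2],
    [1.0, 0.0, 1.0, 0.0],
    [-0.2, 0.4, 0.6, 0.8],
]

a = weights_divide_scores(q, keys)
b = weights_prescale_query(q, keys)

# Both orderings should agree up to floating-point rounding.
print(all(math.isclose(x, y, rel_tol=1e-9) for x, y in zip(a, b)))
```

Because the dot product is linear, scaling the query once is equivalent to scaling every score, so the division does not need to appear on the line that computes the scores.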
Aaah got it - thanks!