tunz / transformer-pytorch

Transformer implementation in PyTorch.
https://tunz.kr/post/4
MIT License

Can't see division of matrix by sqrt(d_k) #8

Closed: Kikumu closed this issue 1 year ago

Kikumu commented 1 year ago

https://github.com/tunz/transformer-pytorch/blob/e7266679f0b32fd99135ea617213f986ceede056/model/transformer.py#L87

I noticed that, according to the paper, the dot products of the queries and keys are divided by √d_k before being passed to the softmax. I don't see that in the code; did I miss anything? Thank you!
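For reference, here is the scaled dot-product attention from the paper ("Attention Is All You Need", Vaswani et al., 2017), where d_k is the dimension of the queries and keys:

```math
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V
```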

tunz commented 1 year ago

https://github.com/tunz/transformer-pytorch/blob/e7266679f0b32fd99135ea617213f986ceede056/model/transformer.py#L84

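It's applied at the linked line, a few lines before the one you linked. self.scale is 1/√d_k, so multiplying by it has the same effect as dividing by √d_k.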
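Here is a minimal, framework-free sketch (plain Python lists, not the repo's PyTorch code) showing why the scaling can happen before the matrix product rather than after it. Because (cQ)Kᵀ = c(QKᵀ), multiplying the queries by 1/√d_k first gives the same softmax inputs as dividing QKᵀ by √d_k. The function names and sample data are hypothetical, chosen only for illustration.

```python
import math


def matmul_transpose(a, b):
    """Return a @ b^T for two lists of row vectors."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights_divide(q, k):
    """Paper form: softmax(Q K^T / sqrt(d_k)), dividing the scores afterwards."""
    d_k = len(q[0])
    scores = matmul_transpose(q, k)
    return [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]


def attention_weights_prescale(q, k):
    """Equivalent form: multiply Q by scale = 1/sqrt(d_k) first, then take Q K^T."""
    d_k = len(q[0])
    scale = d_k ** -0.5  # 1 / sqrt(d_k)
    q_scaled = [[x * scale for x in row] for row in q]
    scores = matmul_transpose(q_scaled, k)
    return [softmax(row) for row in scores]


def attention(q, k, v, weights_fn=attention_weights_prescale):
    """Weighted sum of the value rows, using the chosen weight computation."""
    w = weights_fn(q, k)
    d_v = len(v[0])
    return [
        [sum(wi * v_row[j] for wi, v_row in zip(w_row, v)) for j in range(d_v)]
        for w_row in w
    ]


if __name__ == "__main__":
    # Hypothetical toy inputs: 2 queries, 3 keys/values, d_k = 4, d_v = 2.
    q = [[1.0, 2.0, 0.5, -1.0], [0.0, 1.0, 1.0, 1.0]]
    k = [[0.5, -1.0, 2.0, 0.0], [1.0, 1.0, 1.0, 1.0], [-2.0, 0.5, 0.0, 1.5]]
    v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

    out_divide = attention(q, k, v, attention_weights_divide)
    out_prescale = attention(q, k, v, attention_weights_prescale)
    same = all(
        math.isclose(x, y, rel_tol=1e-12, abs_tol=1e-12)
        for r1, r2 in zip(out_divide, out_prescale)
        for x, y in zip(r1, r2)
    )
    print("outputs match:", same)
```

Scaling the queries rather than the score matrix touches a tensor of shape (queries × d_k) instead of (queries × keys), which is a common reason implementations choose that placement. Mathematically, both forms compute the formula from the paper.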

Kikumu commented 1 year ago

Aaah got it - thanks!