https://github.com/tensorflow/tensor2tensor/blob/bafdc1b67730430d38d6ab802cbd51f9d053ba2e/tensor2tensor/layers/common_attention.py#L453
In the original paper, the position_embedding interleaves the channels like this: [..., sin i, cos i, ...], which differs from the ordering used in this code.
See #177 and #1591 (and #1677).
They are just different orderings of the same set of channels; the two are theoretically equivalent in effect.
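To make the comparison concrete, here is a minimal NumPy sketch (not the tensor2tensor implementation itself) that builds both orderings: the interleaved [sin, cos, sin, cos, ...] layout from the paper and the concatenated [all sines, then all cosines] layout used in `get_timing_signal_1d`. The function name `positional_signal` and the defaults below are illustrative assumptions; the point is only that each row contains the same set of values, just permuted.

```python
import numpy as np

def positional_signal(length, channels, min_timescale=1.0, max_timescale=1.0e4):
    """Sinusoidal position signal; illustrative sketch, not the T2T code.

    Returns two orderings of the same channels:
      paper: [sin(x/t_0), cos(x/t_0), sin(x/t_1), cos(x/t_1), ...]  (interleaved)
      t2t:   [sin(x/t_0), sin(x/t_1), ..., cos(x/t_0), cos(x/t_1), ...]  (concatenated)
    """
    num_timescales = channels // 2
    log_timescale_increment = (
        np.log(max_timescale / min_timescale) / max(num_timescales - 1, 1))
    inv_timescales = min_timescale * np.exp(
        -np.arange(num_timescales) * log_timescale_increment)
    position = np.arange(length)[:, None]              # shape (length, 1)
    scaled_time = position * inv_timescales[None, :]   # shape (length, num_timescales)

    # Concatenated ordering, as in tensor2tensor's get_timing_signal_1d.
    t2t = np.concatenate([np.sin(scaled_time), np.cos(scaled_time)], axis=1)

    # Interleaved ordering, as written in "Attention Is All You Need".
    paper = np.empty((length, 2 * num_timescales))
    paper[:, 0::2] = np.sin(scaled_time)
    paper[:, 1::2] = np.cos(scaled_time)
    return paper, t2t

paper, t2t = positional_signal(length=8, channels=16)
# Each position gets the same set of channel values in both layouts,
# only in a different order, so sorting each row makes them identical.
assert np.allclose(np.sort(paper, axis=1), np.sort(t2t, axis=1))
```

Since the layers that consume the signal (embedding sums and learned projections) can absorb any fixed permutation of channels, the choice of ordering does not change what the model can represent.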