https://github.com/tensorflow/tensor2tensor/blob/bafdc1b67730430d38d6ab802cbd51f9d053ba2e/tensor2tensor/layers/common_attention.py#L453
In the original paper, the position_embedding interleaves the channels like this: [..., sin i, cos i, ...], which differs from the ordering used in this code.
See #177 and #1591 (and #1677).
They are just different orderings of the same set of channels; the two are theoretically equivalent in effect.
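To make the comparison concrete, here is a minimal NumPy sketch (not the tensor2tensor implementation itself) that builds both orderings: the interleaved [sin, cos, sin, cos, ...] layout from the paper and the concatenated [all sines, then all cosines] layout used in `get_timing_signal_1d`. The function name `positional_signal` and the defaults below are illustrative assumptions; the point is only that each row contains the same set of values, just permuted.

```python
import numpy as np

def positional_signal(length, channels, min_timescale=1.0, max_timescale=1.0e4):
    """Sinusoidal position signal; illustrative sketch, not the T2T code.

    Returns two orderings of the same channels:
      paper: [sin(x/t_0), cos(x/t_0), sin(x/t_1), cos(x/t_1), ...]  (interleaved)
      t2t:   [sin(x/t_0), sin(x/t_1), ..., cos(x/t_0), cos(x/t_1), ...]  (concatenated)
    """
    num_timescales = channels // 2
    log_timescale_increment = (
        np.log(max_timescale / min_timescale) / max(num_timescales - 1, 1))
    inv_timescales = min_timescale * np.exp(
        -np.arange(num_timescales) * log_timescale_increment)
    position = np.arange(length)[:, None]              # shape (length, 1)
    scaled_time = position * inv_timescales[None, :]   # shape (length, num_timescales)

    # Concatenated ordering, as in tensor2tensor's get_timing_signal_1d.
    t2t = np.concatenate([np.sin(scaled_time), np.cos(scaled_time)], axis=1)

    # Interleaved ordering, as written in "Attention Is All You Need".
    paper = np.empty((length, 2 * num_timescales))
    paper[:, 0::2] = np.sin(scaled_time)
    paper[:, 1::2] = np.cos(scaled_time)
    return paper, t2t

paper, t2t = positional_signal(length=8, channels=16)
# Each position gets the same set of channel values in both layouts,
# only in a different order, so sorting each row makes them identical.
assert np.allclose(np.sort(paper, axis=1), np.sort(t2t, axis=1))
```

Since the layers that consume the signal (embedding sums and learned projections) can absorb any fixed permutation of channels, the choice of ordering does not change what the model can represent.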