How can I align the indices of the coords and the embeddings so that the coords can be used as input to a positional encoding?
The two inputs share the same indices, so after conv(stride=2) and pooling(stride=2), shouldn't the output indices be the same as well?
It looks like the conv output and the pooling output contain the same set of indices, but in different orders.
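To make the question concrete, here is a minimal, library-agnostic sketch of the kind of alignment I am after. It assumes both outputs contain exactly the same set of unique coordinates and differ only in row order. The function name `align_permutation` and the sample coordinates and features are hypothetical placeholders, not part of any library API.

```python
def align_permutation(ref_coords, other_coords):
    """Return perm such that other_coords[perm[i]] == ref_coords[i].

    Assumes both inputs hold the same set of unique coordinates.
    """
    # Map each coordinate tuple in `other_coords` to its row index.
    row_of = {tuple(c): i for i, c in enumerate(other_coords)}
    if len(row_of) != len(other_coords):
        raise ValueError("duplicate coordinates in other_coords")
    if len(ref_coords) != len(other_coords):
        raise ValueError("coordinate sets differ in size")
    try:
        # Look up, for every reference row, where it lives in `other_coords`.
        return [row_of[tuple(c)] for c in ref_coords]
    except KeyError as exc:
        raise ValueError(f"coordinate {exc.args[0]} missing from other_coords") from exc


# Hypothetical (batch, x, y, z) coordinates after the stride-2 ops:
# same set of rows, different order.
conv_coords = [(0, 0, 0, 2), (0, 2, 0, 0), (0, 0, 2, 0)]
conv_feats = ["f_a", "f_b", "f_c"]  # placeholder for embedding rows
pool_coords = [(0, 2, 0, 0), (0, 0, 2, 0), (0, 0, 0, 2)]

# Reorder the conv output so its rows follow the pooling output's order.
perm = align_permutation(pool_coords, conv_coords)
aligned_coords = [conv_coords[j] for j in perm]
aligned_feats = [conv_feats[j] for j in perm]
```

With tensors, the same idea would presumably be a single index operation with `perm` on the feature matrix, but I am not sure whether the library already offers a built-in way to do this.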
Thanks