Closed nzw0301 closed 1 year ago
I believe this is the intended behavior. The global position embedding should be the same for every element of the sequence, and the local position embedding is different for each element of the sequence. In any case, we found that the specifics of the position embedding didn't make much of a difference in the results. Hope this helps!
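To make the distinction concrete, here is a minimal NumPy sketch of that scheme, not the repository's actual code. The table sizes, the `position_embeddings` helper, and the choice to index the global table by a single starting timestep are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
max_timestep, block_size, n_embd = 50, 6, 4  # assumed sizes, illustration only

# Hypothetical embedding tables; in a real model these would be learned parameters.
global_pos_emb = rng.normal(size=(max_timestep + 1, n_embd))  # one row per environment timestep
local_pos_emb = rng.normal(size=(block_size, n_embd))         # one row per position in the context

def position_embeddings(start_timestep, seq_len):
    """Combine one global embedding with per-position local embeddings."""
    g = global_pos_emb[start_timestep]  # (n_embd,): a single row, shared by the whole sequence
    l = local_pos_emb[:seq_len]         # (seq_len, n_embd): differs for each element
    return g[None, :] + l               # broadcast the shared row over the sequence

emb = position_embeddings(start_timestep=17, seq_len=6)
```

Subtracting the local part from `emb` leaves the same global row at every position, which is the "same for every element of the sequence" property described above.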
Thank you for your clarification!
I'm not familiar with position encoding, but if my understanding is correct, for each sample batch, `global_pos_emb` is used only for a single timestep in the atari code. Is this the intended behavior?
Essentially, the current code computes the global position encoding in the following way: