kzl / decision-transformer

Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.

position embeddings do not vary between various time steps #64

Closed: udaymallappa closed this issue 4 months ago

udaymallappa commented 1 year ago

The gym code appears to be in alignment with the paper, but the Atari code seems to have an inconsistency w.r.t. position embeddings. "timesteps" appears to store only the start index of the sampled block. For example, with a block_size of 30, the following line in model_atari.py implies that the global position embedding is the same at time steps t1, t2, ..., t30. This differs from the gym code, which stores the full sequence of indices via "timesteps.append(np.arange(s[-1].shape[1]).reshape(1, -1))".

position_embeddings = torch.gather(all_global_pos_emb, 1, torch.repeat_interleave(timesteps, self.config.n_embd, dim=-1)) + self.pos_emb[:, :token_embeddings.shape[1], :]
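For context, here is a minimal sketch of the two schemes side by side. The gather line mirrors the quoted model_atari.py code; the variable names, toy sizes, and the gym-style embedding layer are illustrative assumptions, not the repo's exact configuration.

```python
import torch
import torch.nn as nn

# Toy dimensions for illustration only (not the repo's actual config values).
batch_size, block_size, n_embd, max_timestep = 2, 30, 8, 100

# Atari-style (as in model_atari.py): the model receives only the *start* timestep
# of each sampled block, so every token in the block gets the same global offset.
# A separate learned pos_emb over positions 0..block_size-1 still distinguishes
# tokens within the block.
global_pos_emb = nn.Parameter(torch.zeros(1, max_timestep + 1, n_embd))
pos_emb = nn.Parameter(torch.zeros(1, block_size, n_embd))

token_embeddings = torch.randn(batch_size, block_size, n_embd)
timesteps = torch.randint(0, max_timestep, (batch_size, 1, 1))  # start index per block

all_global_pos_emb = global_pos_emb.expand(batch_size, -1, -1)   # (B, max_timestep+1, n_embd)
position_embeddings = torch.gather(
    all_global_pos_emb, 1,
    torch.repeat_interleave(timesteps, n_embd, dim=-1),           # (B, 1, n_embd) indices
) + pos_emb[:, :token_embeddings.shape[1], :]                     # broadcasts over the block

# Gym-style: every element of the sampled trajectory carries its own timestep index
# (np.arange(...) in the repo), so the time embedding varies token by token.
seq_len = 20
gym_timesteps = torch.arange(seq_len).reshape(1, -1)
time_embedding = nn.Embedding(max_timestep, n_embd)               # stand-in for the gym model's embed_timestep
gym_position_embeddings = time_embedding(gym_timesteps)           # (1, seq_len, n_embd)
```

So in the Atari version only the coarse global offset is shared across the block, while the learned pos_emb still encodes within-block ordering; in the gym version each timestep gets its own embedding.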

kzl commented 4 months ago

It is a little different but shouldn't matter much.