kzl / decision-transformer

Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.

Padding tokens represented differently in different parts of the code #41

Open asmadotgh opened 2 years ago

asmadotgh commented 2 years ago

Thank you for your great work and for making your code available!

A question regarding padding tokens: they seem to be handled slightly differently in different parts of the code. When loading the data to run the experiments, the padding values appear to be informed by the environment and token type (e.g., -10 for actions in MuJoCo, 2 for dones, and 0 for the other token types), whereas on the model side, during action prediction, all padding tokens are zeros. We were unsure of the reason behind this difference, but we inferred that since the attention mask marks the positions of the padding tokens, it ultimately overrides these slight differences.
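
To make the question concrete, here is a minimal sketch of the two conventions as we understand them. This is not the actual repo code; the shapes, dimensions, and variable names are simplified assumptions:

```python
import numpy as np

# Toy illustration of the two padding conventions (simplified; not the repo's exact code).

max_len, state_dim, act_dim = 20, 11, 3
tlen = 7  # length of the sampled sub-trajectory

# --- data-loading side (cf. get_batch): left-pad each modality with a type-specific value ---
s = np.concatenate([np.zeros((1, max_len - tlen, state_dim)),
                    np.random.randn(1, tlen, state_dim)], axis=1)        # states padded with 0
a = np.concatenate([np.ones((1, max_len - tlen, act_dim)) * -10.0,
                    np.random.randn(1, tlen, act_dim)], axis=1)          # actions padded with -10
d = np.concatenate([np.ones((1, max_len - tlen)) * 2,
                    np.zeros((1, tlen))], axis=1)                        # dones padded with 2
mask = np.concatenate([np.zeros((1, max_len - tlen)),
                       np.ones((1, tlen))], axis=1)                      # 0 marks padded positions

# --- model side (cf. get_action): everything is simply zero-padded before the forward pass ---
a_eval = np.concatenate([np.zeros((1, max_len - tlen, act_dim)),
                         np.random.randn(1, tlen, act_dim)], axis=1)
```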

We noticed the same pattern in other work, such as GDT, which was built on top of this repository.

Could you please let us know more about your implementation of padding tokens and why they are represented differently? And do their actual values matter, given that their positions are reflected in the attention mask?
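
For reference, our current understanding of why the values might not matter is sketched below. This is a toy example, assuming a GPT-2-style additive attention mask (as in the HuggingFace backbone) and a loss computed only over non-padded positions; the tensor names are hypothetical:

```python
import torch

# Toy sketch of why the padded values themselves should not influence predictions at
# real timesteps, assuming a GPT-2-style additive attention mask and a loss computed
# only over non-padded positions.

B, T, act_dim = 1, 4, 3
mask = torch.tensor([[0, 0, 1, 1]])            # first two timesteps are padding

# Attention: keys at padded positions receive a large negative bias, so after the
# softmax their attention weight is ~0 regardless of what values were used as padding.
scores = torch.randn(B, T, T)                  # raw q·k attention scores
scores = scores + (1 - mask)[:, None, :] * -1e9
attn_weights = scores.softmax(dim=-1)          # columns for padded keys are ~0

# Loss: only positions with mask == 1 contribute, so predictions at padded slots are ignored.
action_preds = torch.randn(B, T, act_dim)
action_targets = torch.randn(B, T, act_dim)
loss = ((action_preds - action_targets) ** 2)[mask.bool()].mean()
```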