kzl / decision-transformer

Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.
MIT License

State and Return preds input #25

Closed: backpropper closed this issue 2 years ago

backpropper commented 2 years ago

The comments on the following line and the one after it say that the return and state predictions are computed from both the state and the action. However, the expression itself only seems to use the action information (index 2). Am I missing something, or is the comment ambiguous? I know it won't affect learning, since we only use the action predictions. https://github.com/kzl/decision-transformer/blob/f04280e3668a992c41b38bdfb6b6181d61b4dc52/gym/decision_transformer/models/decision_transformer.py#L97

kzl commented 2 years ago

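See https://github.com/kzl/decision-transformer/issues/5: index 2 selects the transformer output at the action token, and that output uses all the information up to and including the latest action token, so the return and state are already part of it.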
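
The point about the action token can be checked with a toy example. The sketch below is not the repository's model: it is a minimal single-head causal attention in plain Python, assuming identity query/key/value projections, and the names `causal_self_attention`, `rtg`, `state`, and `action` are made up for illustration. It only shows that, under a causal mask, the output at the action position (index 2) changes when the state token changes, while earlier positions do not see the action.

```python
import math


def causal_self_attention(tokens):
    """Toy single-head causal attention with identity Q/K/V projections."""
    d = len(tokens[0])
    outputs = []
    for i, q in enumerate(tokens):
        # Causal mask: position i may only attend to positions 0..i.
        scores = [
            sum(a * b for a, b in zip(q, tokens[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the visible positions.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the visible tokens (values are the tokens themselves).
        out = [
            sum(w * tokens[j][k] for j, w in enumerate(weights))
            for k in range(d)
        ]
        outputs.append(out)
    return outputs


# One timestep laid out as (return-to-go, state, action): indices 0, 1, 2.
rtg, state, action = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
base = causal_self_attention([rtg, state, action])

# Changing the state token changes the output at the action position (index 2).
changed_state = causal_self_attention([rtg, [0.0, 3.0], action])
action_output_depends_on_state = base[2] != changed_state[2]

# Changing the action token leaves the outputs at indices 0 and 1 untouched.
changed_action = causal_self_attention([rtg, state, [2.0, -1.0]])
earlier_outputs_unchanged = base[:2] == changed_action[:2]

print(action_output_depends_on_state, earlier_outputs_unchanged)
```

So a prediction head applied only to the index-2 output can still depend on the return and state, which is consistent with the comment in the code.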