backpropper closed this issue 2 years ago.
The comments on the following line and the line after it say that the return and state predictions are made using both the state and the action as inputs. However, the code only seems to use the action token (index 2). Am I missing something, or is there some ambiguity? I know this won't affect learning, since only the action predictions are used. https://github.com/kzl/decision-transformer/blob/f04280e3668a992c41b38bdfb6b6181d61b4dc52/gym/decision_transformer/models/decision_transformer.py#L97
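For context, here is a minimal NumPy sketch of what "index 2" refers to. It assumes the model interleaves return, state, and action embeddings as (R_1, s_1, a_1, R_2, s_2, a_2, ...) and later reshapes the output back so that `x[:, k, t]` is the token of type k (0 = return, 1 = state, 2 = action) at timestep t. The arrays below are hypothetical placeholders filled with 0/1/2 so each token type stays recognizable; the transformer itself is omitted.

```python
import numpy as np

batch_size, seq_length, hidden = 1, 4, 2

# Hypothetical embeddings: each token type is filled with its type index
# (0 = return, 1 = state, 2 = action) so it can be identified after reshaping.
returns = np.full((batch_size, seq_length, hidden), 0.0)
states = np.full((batch_size, seq_length, hidden), 1.0)
actions = np.full((batch_size, seq_length, hidden), 2.0)

# Interleave into one sequence of length 3T: (R_1, s_1, a_1, R_2, s_2, a_2, ...)
stacked = np.stack((returns, states, actions), axis=1)            # (B, 3, T, H)
stacked = stacked.transpose(0, 2, 1, 3).reshape(batch_size, 3 * seq_length, hidden)

# (The causal transformer would run on `stacked` here.)

# Undo the interleaving: x[:, k, t] is the output at the type-k token of step t.
x = stacked.reshape(batch_size, seq_length, 3, hidden).transpose(0, 2, 1, 3)  # (B, 3, T, H)

print(stacked[0, :, 0])  # token types in sequence order: 0, 1, 2, 0, 1, 2, ...
print(x[:, 2, :, 0])     # every entry is an action-token output
```

So `x[:, 2]` selects the outputs at the action-token positions, and `x[:, 1]` those at the state-token positions.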
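See https://github.com/kzl/decision-transformer/issues/5: because the transformer is causal, the output at the action token (index 2) attends to all tokens up to and including the latest action token, so it already incorporates the current state as well as the action.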
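To illustrate the answer, here is a small sketch assuming a standard lower-triangular causal mask over the interleaved (R_t, s_t, a_t) sequence. The helpers `interleaved_labels` and `visible_to` are hypothetical and exist only for this illustration. Stacking more causal layers does not widen a position's context beyond the positions at or before it, so a single mask is enough to show what each output can depend on.

```python
import numpy as np

def interleaved_labels(seq_length):
    """Hypothetical helper: token labels in (R_t, s_t, a_t) order."""
    labels = []
    for t in range(seq_length):
        labels += [f"R{t}", f"s{t}", f"a{t}"]
    return labels

def visible_to(labels, position):
    """Tokens that `position` can attend to under a causal (lower-triangular) mask."""
    n = len(labels)
    causal_mask = np.tril(np.ones((n, n), dtype=bool))  # row i is True for columns j <= i
    return [labels[j] for j in range(n) if causal_mask[position, j]]

labels = interleaved_labels(3)
t = 1
state_pos = 3 * t + 1   # corresponds to x[:, 1, t] after reshaping
action_pos = 3 * t + 2  # corresponds to x[:, 2, t] after reshaping

print(visible_to(labels, state_pos))   # ['R0', 's0', 'a0', 'R1', 's1']
print(visible_to(labels, action_pos))  # ['R0', 's0', 'a0', 'R1', 's1', 'a1']
```

The action-token output sees both `s1` and `a1`, which is why predicting the next return and state from `x[:, 2]` conditions on the state and the action. The state-token output sees `s1` but not `a1`, which is what action prediction from `x[:, 1]` needs.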