kzl / decision-transformer

Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.
MIT License

Why the padding is different for state, action, reward? #50

Open CeyaoZhang opened 1 year ago

CeyaoZhang commented 1 year ago

https://github.com/kzl/decision-transformer/blob/c9e6ac0b75895cef3e7c06cd309fd398ec9ceef5/gym/experiment.py#L147-L154

It's easy to understand why the state is padded with np.zeros(...), but why are the actions padded with np.ones(...) * -10 and the done flags with np.ones(...) * 2?
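For readers without the linked file open, here is a minimal sketch of the padding pattern the question describes, not a copy of the repository code. The helper `left_pad`, the array sizes, and the random inputs are hypothetical, and padding on the left of the time axis is an assumption. It only illustrates the observation behind the question: 0 is a value a normalized state could legitimately take, whereas -10 and 2 lie outside the usual ranges for a bounded action and a 0/1 done flag.

```python
import numpy as np

def left_pad(seq, max_len, fill):
    """Left-pad an array of shape (1, tlen, ...) along axis 1 up to max_len."""
    tlen = seq.shape[1]
    pad_shape = (seq.shape[0], max_len - tlen) + seq.shape[2:]
    pad = np.full(pad_shape, fill, dtype=seq.dtype)
    return np.concatenate([pad, seq], axis=1)

# Hypothetical sizes, chosen only for illustration.
max_len, tlen, state_dim, act_dim = 5, 3, 4, 2

states = np.random.randn(1, tlen, state_dim)                 # real timesteps
actions = np.random.uniform(-1, 1, (1, tlen, act_dim))       # assumed bounded in [-1, 1]
dones = np.zeros((1, tlen))                                  # assumed 0/1 flags

s = left_pad(states, max_len, 0.0)     # states padded with zeros
a = left_pad(actions, max_len, -10.0)  # actions padded with -10
d = left_pad(dones, max_len, 2.0)      # done flags padded with 2

# Assumed companion mask: 0 marks padded positions, 1 marks real timesteps.
mask = np.concatenate([np.zeros((1, max_len - tlen)), np.ones((1, tlen))], axis=1)

print(a[0, :, 0])
print(d[0])
print(mask[0])
```

With these sizes, the first two positions along the time axis of `s`, `a`, and `d` hold the fill values and the last three hold the original data.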