Open CeyaoZhang opened 1 year ago
https://github.com/kzl/decision-transformer/blob/c9e6ac0b75895cef3e7c06cd309fd398ec9ceef5/gym/experiment.py#L147-L154
Padding the state with `np.zeros(...)` is easy to understand, but why is the action padded with `np.ones(...) * -10` and the done flag with `np.ones(...) * 2`?
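For reference, here is a minimal sketch of the padding being asked about. It is not copied from the repository: the helper name `left_pad`, the 2-D array shapes, and the attention-style `mask` are assumptions made for illustration. Only the three fill values (0 for states, -10 for actions, 2 for done flags) come from the linked lines.

```python
import numpy as np

def left_pad(states, actions, dones, max_len):
    """Left-pad one trajectory segment to max_len (hypothetical helper).

    states:  (tlen, state_dim) array
    actions: (tlen, act_dim) array
    dones:   (tlen,) array
    """
    tlen = states.shape[0]
    pad = max_len - tlen
    state_dim = states.shape[1]
    act_dim = actions.shape[1]

    # Fill values taken from the question: 0 for states, -10 for actions, 2 for dones.
    s = np.concatenate([np.zeros((pad, state_dim)), states], axis=0)
    a = np.concatenate([np.ones((pad, act_dim)) * -10.0, actions], axis=0)
    d = np.concatenate([np.ones(pad) * 2, dones], axis=0)

    # Assumed mask: 0 marks padded timesteps, 1 marks real ones.
    mask = np.concatenate([np.zeros(pad), np.ones(tlen)], axis=0)
    return s, a, d, mask

# A 2-step segment padded to length 4.
s, a, d, mask = left_pad(
    states=np.ones((2, 3)),
    actions=np.full((2, 1), 0.5),
    dones=np.zeros(2),
    max_len=4,
)
```

If actions are normalized to [-1, 1] and done flags are 0 or 1, then -10 and 2 cannot be confused with real values, which may be part of the reason, but that is a guess and not something the linked code states.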