kzl / decision-transformer

Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.
MIT License

Questions about dataset preprocessing #55

Open typoverflow opened 1 year ago

typoverflow commented 1 year ago

Hi, I have some questions about the data preprocessing for the medium-replay datasets. In the provided implementation, https://github.com/kzl/decision-transformer/blob/e2d82e68f330c00f763507b3b01d774740bee53f/gym/data/download_d4rl_datasets.py#L35...L40

whenever `final_timestep` or `done_bool` is true, the data collected so far is added as a trajectory. However, D4RL's docs state:

Timeouts in this (medium-replay) dataset are not always marked when the agent reaches the max trajectory length, but rather when 1000 timesteps have been sampled for a particular training iteration.

Thus, some trajectories neither reach a terminal state nor hit the maximum episode length, but are instead truncated because the per-iteration sampling budget ran out. Such trajectories are typically short, and if we compute returns on them, the return-to-go deviates from its true value, since no value estimate is provided for the last timestep. Will this be an issue for DT?
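To make the concern concrete, here is a minimal sketch, assuming the dataset is a dict of parallel per-step arrays with `terminals` and `timeouts` flags (as D4RL's `get_dataset()` returns). `split_into_trajectories` and `return_to_go` are hypothetical helpers, not code from this repository, and the reward values are made up. The first segment is closed by a timeout that marks the end of a sampling budget rather than the end of the episode, so its return-to-go is computed as if nothing followed; the hypothetical `bootstrap_value` parameter stands in for the missing value estimate at the last timestep.

```python
import numpy as np


def split_into_trajectories(terminals, timeouts):
    """Return (start, end) index pairs (end exclusive), one per trajectory.

    A segment is closed whenever terminals[i] or timeouts[i] is set, i.e. the
    done_bool / final_timestep rule discussed above. Steps after the last
    flagged index are never emitted.
    """
    bounds = []
    start = 0
    for i, (done, timeout) in enumerate(zip(terminals, timeouts)):
        if bool(done) or bool(timeout):
            bounds.append((start, i + 1))
            start = i + 1
    return bounds


def return_to_go(rewards, gamma=1.0, bootstrap_value=0.0):
    """R_t = sum_{k=t}^{T-1} gamma^(k-t) * r_k + gamma^(T-t) * bootstrap_value.

    bootstrap_value is a hypothetical estimate of the return after the last
    stored step; 0.0 treats the segment as if the episode ended there.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    rtg = np.zeros_like(rewards)
    running = float(bootstrap_value)
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


# Toy data (made up): steps 0-2 belong to an episode that is cut off by the
# sampling budget (timeout at step 2); steps 3-4 are a new episode that
# genuinely terminates at step 4.
rewards = np.array([1.0, 1.0, 1.0, 0.5, 0.5])
terminals = np.array([False, False, False, False, True])
timeouts = np.array([False, False, True, False, False])

segments = split_into_trajectories(terminals, timeouts)
print(segments)  # -> [(0, 3), (3, 5)]

first = rewards[segments[0][0]:segments[0][1]]
print(return_to_go(first))  # -> [3. 2. 1.]

# Suppose (hypothetically) the cut-off episode would have earned 2.0 more
# reward had it continued; bootstrapping with that value recovers it.
print(return_to_go(first, bootstrap_value=2.0))  # -> [5. 4. 3.]
```

Without a bootstrap term, the truncated segment's returns-to-go ([3, 2, 1]) understate what the same states would have earned had the episode continued ([5, 4, 3] under the assumed extra reward of 2.0), which is the deviation described above.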

Please correct me if I have misunderstood anything =)