tinkoff-ai / CORL

High-quality single-file implementations of SOTA Offline and Offline-to-Online RL algorithms: AWAC, BC, CQL, DT, EDAC, IQL, SAC-N, TD3+BC, LB-SAC, SPOT, Cal-QL, ReBRAC
https://arxiv.org/abs/2210.07105
Apache License 2.0
1.05k stars 125 forks source link

Questions about D4RL dataset loading for DT #43

Closed shivakanthsujit closed 1 year ago

shivakanthsujit commented 1 year ago

First of all, thank you for the repository. It is really helpful for learning about and developing offline RL algorithms.

While going through the DT code I had two questions.

In the function where the d4rl trajectories are loaded, I think the episode step counting will be off by one for the first episode. The episode_step += 1 should appear before the if condition. With the current implementation, if the first episode was of length 5, then episode_step would only be 4 when entering the if condition.

Also, this function would throw away the last episode in the dataset if the done/timeout flag for the last sample in the dataset is False. Is this expected behavior? And there are datasets where the last sample has both done and timeout to be False, for example in the hopper-random-v2 dataset of the D4RL benchmark. In this dataset, the last episode of 4 samples is discarded and isn't added to traj.

https://github.com/tinkoff-ai/CORL/blob/b62fa28e44a6cd1bd258ec91c09592fb7fe1200d/algorithms/dt.py#L137-L155

Howuhh commented 1 year ago

Hi @shivakanthsujit! Indeed, I think that counting is not correct currently! You can do a PR with correction if you want or we will fix it ourselves. I think adding an assert on episode_step == some array len from episode_data would be also a good idea (or even add len of array as episode_step instead of counting like now)!

Also, this function would throw away the last episode in the dataset if the done/timeout flag for the last sample in the dataset is False. Is this expected behavior?

Yes, this is expected behavior. If trajectory does not have True done or timeout, then this trajectory is not ended in any way during data generation. We decided it was better to skip such trajectories, given how many bugs have been discovered over the years in the D4RL datasets generation 🥲.

shivakanthsujit commented 1 year ago

Yes I can open a PR for this. Yeah I have an assert condition on the length in my version of the code which is when I realized this issue.