anh-nn01 opened this issue 1 month ago
Yes, the datasets that we released are sourced from the replay buffers of single-task agents, meaning that we train each single-task agent for 3M steps and save the entire replay buffer. Aside from the initial seeding phase where random actions are taken, actions are therefore generated by agents at various stages of training which makes the dataset quite diverse. Training an offline RL algorithm strictly on random data is unlikely to yield much success since most tasks cannot be solved with a random policy (e.g. pick and place would not be solved from data of a gripper moving around randomly in 3D space).
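A minimal sketch of the collection scheme described above: every transition is saved, so the buffer mixes random seeding-phase actions with actions from the agent at many stages of training. The function and argument names here are hypothetical, just to illustrate the idea:

```python
def collect_replay_buffer(policy_act, rand_act, total_steps=1000, seed_steps=100):
    """Save every transition over training, so the resulting buffer mixes
    random seeding actions with actions from an improving policy."""
    buffer = []
    for step in range(total_steps):
        if step < seed_steps:
            action = rand_act()        # seeding phase: random actions
        else:
            action = policy_act(step)  # agent at its current training stage
        buffer.append((step, action))  # transitions from all stages are kept
    return buffer

# toy stand-ins for the environment and agent, just to show the data's shape
buf = collect_replay_buffer(
    policy_act=lambda s: f"policy@{s}",
    rand_act=lambda: "random",
)
```

Because the whole buffer is kept, early entries come from a near-random policy and late entries from a near-converged one, which is what makes the released datasets diverse.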
> If possible, can you also share/publicize the code used for offline data collection?
We rely on torchrl for our replay buffer implementation. The `Storage` object has a `save` function which saves the replay buffer to disk. Documentation: https://pytorch.org/rl/stable/reference/generated/torchrl.data.replay_buffers.Storage.html#torchrl.data.replay_buffers.Storage.save
I am implementing data collection for an offline dataset to train the multitask model on a different set of tasks outside of mt80.
According to the paper:

> The datasets are sourced from the replay buffers of 240 single-task agents.

Does this mean that the transitions in the offline data use actions from `tdmpc2.act()`, where `tdmpc2` is the trained world model for each task? Do you observe any convergence issues and/or reduced sample efficiency if we use random actions from `env.rand_act()` instead of actions from the agent policy `tdmpc2.act()`?

If possible, can you also share/publicize the code used for offline data collection? It would be extremely helpful for anyone who wants to start their research and use TDMPC2 as a benchmark.
Thank you very much!