kzl / decision-transformer

Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.
MIT License

Question: is it possible to use the same Decision Transformer for new training trajectories generation? #26

Closed: danielgafni closed this issue 2 years ago

danielgafni commented 2 years ago

Maybe I'm missing something, but why do we stop training after going over the initial trajectory dataset? Could the same model then be run again to generate new (better) trajectories and be trained on them iteratively? Thanks for your time!
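The loop being asked about could be sketched roughly as below. Everything here (`rollout`, `train_step`, the scalar `quality` standing in for model weights) is a hypothetical stand-in to illustrate the data flow, not an API from this codebase:

```python
import random

def rollout(quality):
    """Stand-in for running the model in the environment.
    Returns (trajectory, total_return); better models tend to score higher."""
    total_return = random.gauss(quality, 1.0)
    trajectory = [("obs", "act", total_return)]  # placeholder transitions
    return trajectory, total_return

def train_step(buffer, quality):
    """Stand-in for a gradient update on the buffered trajectories:
    nudge the model toward the best return seen so far."""
    best = max(ret for _, ret in buffer)
    return quality + 0.5 * (best - quality)

random.seed(0)
buffer = [rollout(0.0) for _ in range(10)]  # initial "offline" dataset
quality = 0.0
best_history = []
for _ in range(20):
    quality = train_step(buffer, quality)              # train on current data
    buffer.append(rollout(quality))                    # generate a new trajectory
    buffer = sorted(buffer, key=lambda t: t[1])[-10:]  # keep the best K
    best_history.append(max(ret for _, ret in buffer))
```

Keeping only the highest-return trajectories is one (greedy) buffer strategy; the real question of how to do this stably with a Decision Transformer is exactly what the replies below discuss.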

Howuhh commented 2 years ago

@danielgafni I thought of that too (well, it immediately comes to mind). I'm currently trying to test exactly this on a simple environment, similar to how it's done in Upside-Down RL (i.e. reward-conditioned RL). So if you're interested, we can chat about it hehe

kzl commented 2 years ago

The original paper studies offline RL, so this is not done there. Offline pretraining -> online finetuning has been studied in various papers, and is generally useful but nontrivial; it's definitely an active area of research!