nicklashansen / tdmpc2

Code for "TD-MPC2: Scalable, Robust World Models for Continuous Control"
https://www.tdmpc2.com
MIT License

Question about the role of value prediction loss #15

Closed by return-sleep 4 months ago

return-sleep commented 5 months ago

Thank you for your wonderful work. I am very curious: what would happen if, when training the world model, the value prediction loss were dropped and only the dynamics loss and reward loss were used? Would the world model still learn an effective latent space, given that it lacks the self-supervised reconstruction signal that Dreamer has?
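
For concreteness, here is a minimal sketch of the ablation being asked about. It is not the repo's actual code: all module names (`encoder`, `dynamics`, `reward_head`, `value_head`) are hypothetical placeholders, and simplified MSE losses stand in for TD-MPC2's discrete-regression objectives.

```python
import torch
import torch.nn.functional as F

def world_model_loss(encoder, dynamics, reward_head, value_head,
                     obs, actions, rewards, td_targets,
                     horizon, rho=0.5, use_value_loss=True):
    """Multi-step latent rollout loss; set use_value_loss=False to run
    the ablation described above (dynamics + reward losses only)."""
    z = encoder(obs[0])  # encode the first observation into latent space
    total = 0.0
    for t in range(horizon):
        # predict reward (and optionally value) from the current latent
        loss_t = F.mse_loss(reward_head(z, actions[t]), rewards[t])
        if use_value_loss:  # the term the question proposes dropping
            loss_t = loss_t + F.mse_loss(value_head(z, actions[t]), td_targets[t])
        # latent dynamics prediction, regressed onto a stop-gradient target
        z = dynamics(z, actions[t])
        with torch.no_grad():
            z_target = encoder(obs[t + 1])
        loss_t = loss_t + F.mse_loss(z, z_target)
        total = total + rho ** t * loss_t  # down-weight later rollout steps
    return total
```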

nicklashansen commented 5 months ago

This is a great question that I don't have a very conclusive answer to at the moment. In my experience, whether one can get away with using only the dynamics loss, or only the reward/value prediction losses, seems to depend on the particular task. The paper Simplified Temporal Consistency Reinforcement Learning finds that detaching reward/value prediction from the dynamics model can indeed improve sample-efficiency in some cases, but the gains are not very consistent. I would definitely be interested in exploring this more.
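
As a rough illustration of what "detaching" means here (a sketch in the spirit of that paper, not its actual code; `reward_head` and `value_head` are hypothetical placeholders):

```python
import torch.nn.functional as F

def detached_head_losses(z, action, reward, td_target, reward_head, value_head):
    """Reward/value heads trained on a detached latent: the heads still learn,
    but their gradients no longer shape the encoder/dynamics representation."""
    z_sg = z.detach()  # block gradient flow into the world model
    reward_loss = F.mse_loss(reward_head(z_sg, action), reward)
    value_loss = F.mse_loss(value_head(z_sg, action), td_target)
    return reward_loss + value_loss
```

Under this variant, only the dynamics/consistency loss shapes the latent space, so it probes the same question as dropping the value loss outright, while still keeping usable reward/value predictions for planning.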

nicklashansen commented 4 months ago

Closing this issue but feel free to reopen if you have any follow-up questions!