nicklashansen / tdmpc2

Code for "TD-MPC2: Scalable, Robust World Models for Continuous Control"
https://www.tdmpc2.com
MIT License

[Question] Solving POMDP in TDMPC2 #39

Open · gmmyung opened 1 month ago

gmmyung commented 1 month ago

First of all, thank you for open-sourcing this algorithm. I am trying to train a quadruped robot locomotion policy with multimodal input, including egocentric depth vision, on complex terrain. However, I have noticed that TD-MPC does not apply to POMDPs. Feeding an observation history as input is possible, but the planning step might also benefit from past latent dynamics. Have you conducted any experiments along these lines?

nicklashansen commented 2 weeks ago

> TD-MPC does not apply to POMDPs

TD-MPC is an off-policy algorithm (in the same family as DQN, DDPG, SAC, etc.) and can be applied to the same settings as other algorithms in this family. When the environment is a POMDP (as in our visual RL experiments), a Markovian state can be approximated using frame stacking or a memory mechanism (e.g., an RNN); we use frame stacking in our experiments. I have previously tried RNNs, Transformers, and S4 models but did not find any benefit from them, presumably because our visual RL tasks do not require much history.
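
For concreteness, here is a minimal sketch of frame stacking as an observation wrapper. This is illustrative only, not the wrapper used in this repo; it assumes a Gymnasium-style environment with a `Box` observation space whose leading axis is the channel axis, and `num_frames=3` is an arbitrary choice:

```python
from collections import deque

import gymnasium as gym
import numpy as np


class FrameStack(gym.Wrapper):
    """Stack the last `num_frames` observations along the leading (channel)
    axis so a memoryless encoder sees short-term history directly.
    Minimal sketch; not the wrapper used in this repository."""

    def __init__(self, env, num_frames=3):
        super().__init__(env)
        self.num_frames = num_frames
        self.frames = deque(maxlen=num_frames)
        # Bounds are concatenated in the same order as the stacked frames.
        low = np.concatenate([env.observation_space.low] * num_frames, axis=0)
        high = np.concatenate([env.observation_space.high] * num_frames, axis=0)
        self.observation_space = gym.spaces.Box(
            low=low, high=high, dtype=env.observation_space.dtype
        )

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        for _ in range(self.num_frames):  # pad the buffer with the first frame
            self.frames.append(obs)
        return self._stacked(), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.frames.append(obs)
        return self._stacked(), reward, terminated, truncated, info

    def _stacked(self):
        return np.concatenate(list(self.frames), axis=0)
```

Stacking the last few frames lets a feedforward encoder recover short-term history (e.g., velocities) without recurrence, which leaves the latent dynamics model and the planner unchanged.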
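And a minimal sketch of the memory-mechanism alternative mentioned above, where a GRU compresses the observation history into a latent state. All names and sizes here are illustrative, not code from this repo:

```python
import torch
import torch.nn as nn


class RecurrentEncoder(nn.Module):
    """GRU that summarizes the observation history into a latent state;
    an alternative to frame stacking (hypothetical names and sizes)."""

    def __init__(self, obs_dim: int, latent_dim: int = 128):
        super().__init__()
        self.gru = nn.GRU(obs_dim, latent_dim, batch_first=True)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim). Returns per-step latents and the
        # final hidden state, which can be carried across environment steps.
        latents, hidden = self.gru(obs_seq, hidden)
        return latents, hidden


# Usage: feed each new observation together with the carried hidden state.
enc = RecurrentEncoder(obs_dim=48)
obs = torch.randn(1, 1, 48)  # one environment, one timestep
latent, h = enc(obs)         # pass h back in at the next step
```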