nicklashansen / tdmpc2

Code for "TD-MPC2: Scalable, Robust World Models for Continuous Control"
https://www.tdmpc2.com
MIT License

[Question] Solving POMDP in TDMPC2 #39

Open · gmmyung opened 1 month ago

gmmyung commented 1 month ago

First of all, thank you for open-sourcing this algorithm. I am trying to train a quadruped robot locomotion policy with multimodal input, including egocentric depth vision, on complex terrain. However, I have noticed that TD-MPC does not apply to POMDPs. Feeding an observation history as input is possible, but the planning step might also benefit from past latent dynamics. Have you conducted any experiments along these lines?

nicklashansen commented 2 weeks ago

> TD-MPC does not apply to POMDPs

TD-MPC is an off-policy algorithm (in the same family as DQN, DDPG, SAC, etc.) and can be applied to the same settings as other algorithms in this family. When the environment is a POMDP (as in our visual RL experiments), a Markovian state can be approximated using frame stacking or a memory mechanism (e.g., an RNN); we use frame stacking in our experiments. I have previously tried RNNs, Transformers, and S4 models but did not find any benefit from them, presumably because our visual RL tasks do not require much history.
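
For concreteness, here is a minimal sketch of frame stacking as an observation wrapper. This is illustrative only, not the wrapper used in this repo; it assumes a Gymnasium-style environment with a `Box` observation space whose leading axis is the channel axis, and `num_frames=3` is an arbitrary choice:

```python
from collections import deque

import gymnasium as gym
import numpy as np


class FrameStack(gym.Wrapper):
    """Stack the last `num_frames` observations along the leading (channel)
    axis so a memoryless encoder sees short-term history directly.
    Minimal sketch; not the wrapper used in this repository."""

    def __init__(self, env, num_frames=3):
        super().__init__(env)
        self.num_frames = num_frames
        self.frames = deque(maxlen=num_frames)
        # Bounds are concatenated in the same order as the stacked frames.
        low = np.concatenate([env.observation_space.low] * num_frames, axis=0)
        high = np.concatenate([env.observation_space.high] * num_frames, axis=0)
        self.observation_space = gym.spaces.Box(
            low=low, high=high, dtype=env.observation_space.dtype
        )

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        for _ in range(self.num_frames):  # pad the buffer with the first frame
            self.frames.append(obs)
        return self._stacked(), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.frames.append(obs)
        return self._stacked(), reward, terminated, truncated, info

    def _stacked(self):
        return np.concatenate(list(self.frames), axis=0)
```

Stacking the last few frames lets a feedforward encoder recover short-term history (e.g., velocities) without recurrence, which leaves the latent dynamics model and the planner unchanged.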
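And a minimal sketch of the memory-mechanism alternative mentioned above, where a GRU compresses the observation history into a latent state. All names and sizes here are illustrative, not code from this repo:

```python
import torch
import torch.nn as nn


class RecurrentEncoder(nn.Module):
    """GRU that summarizes the observation history into a latent state;
    an alternative to frame stacking (hypothetical names and sizes)."""

    def __init__(self, obs_dim: int, latent_dim: int = 128):
        super().__init__()
        self.gru = nn.GRU(obs_dim, latent_dim, batch_first=True)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim). Returns per-step latents and the
        # final hidden state, which can be carried across environment steps.
        latents, hidden = self.gru(obs_seq, hidden)
        return latents, hidden


# Usage: feed each new observation together with the carried hidden state.
enc = RecurrentEncoder(obs_dim=48)
obs = torch.randn(1, 1, 48)  # one environment, one timestep
latent, h = enc(obs)         # pass h back in at the next step
```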