Closed tbskrpmnns closed 2 years ago
Saw your comments on EfficientZero. I suppose you can just fill the replay buffer manually and train the agent on that.
Thanks again for your help! If I understand correctly, I can then use MuZero / EfficientZero or any other algorithm suitable for offline RL with a replay buffer that I simply fill manually for the offline RL use case?
Not limited to MuZero — you can manually fill up the replay buffer for any algorithm. More advanced support for model-based RL algorithms is still on the TODO list for this project.
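To illustrate the idea, here is a minimal sketch of filling a replay buffer from a fixed offline dataset instead of collecting transitions online. The `ReplayBuffer` class and the transition layout are hypothetical for illustration, not this repository's actual API:

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical minimal replay buffer (not this repo's API)."""

    def __init__(self, capacity):
        # Oldest transitions are evicted once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement from stored transitions.
        return random.sample(self.buffer, batch_size)


# An offline dataset of (state, action, reward, next_state, done) tuples,
# e.g. loaded from logged interactions rather than a live environment.
offline_transitions = [
    ((0.0,), 1, 1.0, (0.1,), False),
    ((0.1,), 0, 0.0, (0.2,), True),
]

buffer = ReplayBuffer(capacity=10_000)
for t in offline_transitions:
    buffer.add(t)  # fill manually for the offline RL use case

batch = buffer.sample(2)  # train the agent on these batches as usual
```

The training loop then samples batches from this pre-filled buffer exactly as it would in the online setting; the only difference is that no new transitions are appended during training.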
awesome, thank you very much!
Hey,
I was really impressed by DeepMind's latest progress on their offline RL version of the MuZero algorithm (arXiv:2104.06294). Since it provides state-of-the-art results on several offline RL benchmarks and is also data-efficient thanks to the model-based RL approach, I was wondering whether you have any plans to implement MuZero Unplugged?