
werner-duvaud / muzero-general

MuZero

https://github.com/werner-duvaud/muzero-general/wiki/MuZero-Documentation

MIT License


MuZero Unplugged #185

Open tbskrpmnns opened 2 years ago

tbskrpmnns commented 2 years ago

Hey,

Is there any plan to extend the codebase to support MuZero Unplugged, so that it works in an offline RL setting?

0xJchen commented 2 years ago

Maybe simply enable reanalyze?

tbskrpmnns commented 2 years ago

If it's that simple, that would be awesome. The reason I asked is that the pseudocode from the paper "Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model" (arXiv:1911.08265) differs from that of the paper in which DeepMind introduced MuZero Unplugged, "Online and Offline Reinforcement Learning by Planning with a Learned Model" (arXiv:2104.06294v1). For example, I couldn't find self.reanalyse_fraction.
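As I understand the Unplugged paper, the reanalyse fraction sets what share of the training data comes from re-running search on stored trajectories with the latest network, as opposed to freshly generated acting data; with a fraction of 1.0 and no new environment interaction, training is fully offline. The sketch below only illustrates that mixing when building a batch. Sample, build_batch and the reanalyse callable are hypothetical names, not part of muzero-general or the paper's pseudocode, and a real reanalyse step would run MCTS with the current network to refresh the policy and value targets.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Sample:
    observation: Tuple[float, ...]
    policy_target: Tuple[float, ...]
    value_target: float


def build_batch(
    fresh: List[Sample],
    stored: List[Sample],
    reanalyse: Callable[[Sample], Sample],  # hypothetical: recomputes targets with the latest model
    batch_size: int,
    reanalyse_fraction: float,
    rng: random.Random,
) -> List[Sample]:
    """Mix fresh acting samples with reanalysed stored samples.

    reanalyse_fraction = 1.0 uses only reanalysed stored data (offline case,
    `fresh` may then be empty); reanalyse_fraction = 0.0 uses only fresh data.
    """
    if not 0.0 <= reanalyse_fraction <= 1.0:
        raise ValueError("reanalyse_fraction must be in [0, 1]")
    n_reanalysed = round(batch_size * reanalyse_fraction)
    n_fresh = batch_size - n_reanalysed
    # Stored samples get their targets recomputed; fresh samples keep the
    # targets produced by the search that ran while acting.
    batch = [reanalyse(rng.choice(stored)) for _ in range(n_reanalysed)]
    batch += [rng.choice(fresh) for _ in range(n_fresh)]
    rng.shuffle(batch)
    return batch
```

For example, build_batch(fresh, stored, reanalyse, 8, 0.25, rng) yields 2 reanalysed and 6 fresh samples, while a fraction of 1.0 never touches the fresh pool.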

0xJchen commented 2 years ago

You could try EfficientZero, which also controls the reanalyze part with a fraction argument.

tbskrpmnns commented 2 years ago

Thanks for the suggestion! In the paper's abstract, the authors describe it as a visual (image-based) RL algorithm. RL, and offline RL in particular, is fairly new to me, so I'm wondering whether I could still use the algorithm for my problem: I don't have a simulator, only a non-image MDP dataset of (s, a, r, s') rows that I want to use for offline RL.
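One practical point for a flat (s, a, r, s') table: MuZero-style training unrolls the model over several consecutive steps and uses n-step value targets, so the rows have to be regrouped into trajectories first. The sketch below assumes the rows are stored in time order and that a new trajectory starts whenever a row's s does not match the previous row's s'. It also treats the end of each trajectory as terminal. value_fn stands in for whatever value estimate you bootstrap from (for example the network's value head) and is a hypothetical placeholder.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Row = Tuple[State, int, float, State]  # (s, a, r, s')


def split_into_trajectories(rows: Sequence[Row]) -> List[List[Row]]:
    """Group consecutive rows into trajectories.

    Assumption: rows are in time order, and a new trajectory starts whenever
    a row's s differs from the previous row's s'.
    """
    trajectories: List[List[Row]] = []
    for row in rows:
        if trajectories and trajectories[-1][-1][3] == row[0]:
            trajectories[-1].append(row)
        else:
            trajectories.append([row])
    return trajectories


def n_step_value_targets(
    trajectory: Sequence[Row],
    value_fn: Callable[[State], float],  # hypothetical bootstrap value estimate
    n: int,
    discount: float,
) -> List[float]:
    """z_t = sum_{i<n} discount^i * r_{t+i} + discount^n * value_fn(s_{t+n}).

    Assumption: the end of a trajectory is terminal, so no bootstrap term is
    added once t + n reaches the trajectory length.
    """
    targets = []
    length = len(trajectory)
    for t in range(length):
        z = 0.0
        for i in range(min(n, length - t)):
            z += discount ** i * trajectory[t + i][2]
        if t + n < length:
            z += discount ** n * value_fn(trajectory[t + n][0])
        targets.append(z)
    return targets
```

If your dataset has an explicit done flag, splitting on that flag is more reliable than comparing states, and non-terminal truncations should keep the bootstrap term.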

0xJchen commented 2 years ago

Purely state-based RL should be much easier than pixel-based RL. You can still use it; only the inputs are different.
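To make "different inputs" concrete: with vector states, the convolutional representation network can be swapped for a small fully connected one, while the rest of the MuZero machinery (dynamics, prediction, search) stays the same. Below is a minimal NumPy sketch of such a representation function. init_mlp and representation are hypothetical names, the layer sizes and initialisation are arbitrary, and only the forward pass is shown (no training). The min-max scaling of the hidden state to [0, 1] follows what the MuZero paper describes for its hidden states.

```python
import numpy as np


def init_mlp(sizes, rng):
    """Random weights for a small fully connected network (illustrative init)."""
    return [
        (rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def representation(params, observation):
    """Map a flat state vector (or a batch of them) to a hidden state.

    Hidden layers use ReLU; the final output is min-max scaled to [0, 1]
    per sample, as MuZero does for its hidden states.
    """
    x = np.asarray(observation, dtype=np.float64)
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    return (x - lo) / np.maximum(hi - lo, 1e-8)
```

For example, with params = init_mlp([4, 32, 8], np.random.default_rng(0)), a 4-dimensional state maps to an 8-dimensional hidden state, and a batch of shape (3, 4) maps to shape (3, 8).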

tbskrpmnns commented 2 years ago

That makes sense, thanks for your help!

dbsxdbsx commented 1 year ago


Actually, I am wondering whether anyone has tried combining MuZero with COMBO (Conservative Offline Model-Based Policy Optimization). I think that is the right direction for overcoming the offline/off-policy issue.
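For context on what COMBO adds: its critic objective penalises Q-values on state-action pairs generated by the learned model and raises them on pairs from the dataset, on top of the usual Bellman error computed on a batch that mixes real and model-generated transitions. The sketch below only computes that loss from precomputed arrays. It is not a MuZero integration, and the function and argument names are hypothetical. How such a penalty would map onto MuZero's value and policy heads is exactly the open question raised above.

```python
import numpy as np


def combo_critic_loss(q_on_model_samples, q_on_dataset_samples,
                      q_pred, bellman_targets, beta):
    """COMBO-style critic loss (sketch).

    beta * (mean Q on model-generated (s, a) - mean Q on dataset (s, a))
    + 0.5 * mean squared Bellman error on a batch mixing real and model data.
    """
    conservative = np.mean(q_on_model_samples) - np.mean(q_on_dataset_samples)
    td_error = np.asarray(q_pred, dtype=np.float64) - np.asarray(bellman_targets, dtype=np.float64)
    bellman = 0.5 * np.mean(td_error ** 2)
    return beta * conservative + bellman
```

With beta = 0 this reduces to a plain squared Bellman error; larger beta makes the critic more pessimistic about states and actions that only the model has visited.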