tbskrpmnns opened 2 years ago

Hey,

I'm wondering if there is any intention to expand the codebase for MuZero Unplugged to make it work in an offline RL setting?
Maybe simply enable reanalyze?
If it's that simple, that would be awesome. The reason I asked is that the pseudocode from the paper "Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model" (arXiv:1911.08265) differs from the paper where DeepMind introduced MuZero Unplugged, "Online and Offline Reinforcement Learning by Planning with a Learned Model" (arXiv:2104.06294v1). For example, I couldn't find the self.reanalyse_fraction parameter.
Kindly try EfficientZero, which also controls the reanalyze part with a fraction argument.
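For intuition, such a fraction usually just decides, per sampled trajectory, whether the stored search targets are reused or recomputed with the current network. Here is a minimal sketch of that idea (not the actual EfficientZero or DeepMind code; every name below is hypothetical):

```python
import random

def reanalyse(trajectory, network):
    # Placeholder: a real implementation would re-run MCTS with the current
    # network to produce fresh policy/value targets for `trajectory`.
    return trajectory

class ReplayBuffer:
    """Hypothetical buffer showing how a reanalyse fraction could gate sampling."""

    def __init__(self, reanalyse_fraction: float):
        self.reanalyse_fraction = reanalyse_fraction  # 1.0 => reanalyse everything
        self.trajectories = []

    def sample(self, network):
        traj = random.choice(self.trajectories)
        if random.random() < self.reanalyse_fraction:
            # Refresh targets with the latest network instead of reusing stored ones.
            traj = reanalyse(traj, network)
        return traj
```

With the fraction set to 1.0 every sample is reanalysed, which is the fully offline regime described in the MuZero Unplugged paper.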
Thanks for the suggestion! In the paper's abstract the authors state that it's a vision-based algorithm. RL, and offline RL in particular, is fairly new to me. That's why I'm wondering whether I could still use the algorithm for my problem, where I don't have a simulator but only a non-image MDP dataset of (s, a, r, s') rows that I want to use for offline RL.
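Concretely, a rough sketch of what I'd want to do with such a dataset, just to make the setting clear (the file and array names are only illustrative of my data):

```python
import numpy as np

# Illustrative only: group flat (s, a, r, s') rows into trajectories for an
# offline replay buffer, assuming rows are stored in order and a `done` flag
# marks episode boundaries.
data = np.load("dataset.npz")  # hypothetical file with states/actions/rewards/dones

trajectories, current = [], []
for s, a, r, done in zip(data["states"], data["actions"],
                         data["rewards"], data["dones"]):
    current.append((s, a, r))
    if done:
        trajectories.append(current)
        current = []

# Fully offline: the buffer would be filled once from these trajectories and
# never receive new environment interaction.
```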
Purely state-based RL is generally much easier than pixel-based RL. You can still use it, just with different inputs.
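For example, the convolutional representation network used for image observations could be swapped for a small MLP over the state vector, leaving the dynamics and prediction networks untouched. A rough sketch with purely illustrative sizes (this is not code from any particular repo):

```python
import torch
import torch.nn as nn

class StateRepresentation(nn.Module):
    """Hypothetical MLP encoder replacing the conv net for flat state vectors."""

    def __init__(self, state_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Returns the latent state consumed by the dynamics/prediction heads.
        return self.net(state)
```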
That makes sense – thanks for your help!
Actually, I am wondering if someone has tried to combine MuZero with COMBO. I think it is the right direction to overcome the offline/off-policy issue.
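To sketch what I mean: COMBO-style conservatism penalizes value estimates on model-generated states relative to dataset states, and one could imagine adding such a term to MuZero's value loss. A hypothetical illustration, not taken from either paper's code (`beta` and all names are made up):

```python
import torch

def conservative_penalty(value_net, dataset_states, rollout_states, beta=1.0):
    # COMBO-style regularizer sketch: push value estimates down on states
    # imagined by the learned model and up on states from the offline dataset.
    v_rollout = value_net(rollout_states).mean()
    v_data = value_net(dataset_states).mean()
    return beta * (v_rollout - v_data)  # added on top of the usual value loss
```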