Open · marintoro opened this issue 2 weeks ago
Hello, we reviewed the MuZero and EfficientZero papers as well as the EfficientZero source code, and found that neither mentions reward clipping; perhaps they indeed did not employ this technique. We also consulted the paper by T. Pohlen et al., which, as you note, points out that reward clipping can change the optimal policy. We will be testing the performance without reward clipping shortly. Thank you again for your suggestion. If you have any other questions, feel free to discuss them at any time.
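For reference, the clipping in question is the standard Atari trick of replacing every reward with its sign. A toy sketch (not LightZero's actual implementation) of why this can alter the optimal policy:

```python
import numpy as np

def clip_reward(r):
    # Standard Atari reward clipping: map every reward to -1, 0, or +1.
    return float(np.sign(r))

# Two actions with true expected rewards 10.0 and 1.0: the first is clearly
# better, but after clipping both look identical, so a policy trained only on
# clipped rewards has no signal to prefer the truly better action.
true_rewards = [10.0, 1.0]
clipped = [clip_reward(r) for r in true_rewards]
print(clipped)  # [1.0, 1.0]
```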
Hello, our initial experimental results and analysis can be found here. Best regards.
Hello,
I see in the code that you are using the invertible function h,
h(x) = sign(x)(√(|x| + 1) − 1) + εx,
to scale the value and reward targets. This function was introduced by T. Pohlen et al., where the idea was to remove reward clipping in Atari games. However, I see in the Atari env (atari_lightzero_env.py), in the function create_collector_env_cfg, that clip_rewards is set to True. Is that intended, or is this a bug?
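For concreteness, a minimal sketch of the transform and its closed-form inverse as given by Pohlen et al. (with their ε = 10⁻³); this is an illustration, not LightZero's actual implementation:

```python
import numpy as np

def h(x, eps=1e-3):
    # h(x) = sign(x) * (sqrt(|x| + 1) - 1) + eps * x
    return np.sign(x) * (np.sqrt(np.abs(x) + 1.0) - 1.0) + eps * x

def h_inv(x, eps=1e-3):
    # Closed-form inverse of h, obtained by solving the quadratic in sqrt(|x| + 1).
    return np.sign(x) * (
        ((np.sqrt(1.0 + 4.0 * eps * (np.abs(x) + 1.0 + eps)) - 1.0) / (2.0 * eps)) ** 2
        - 1.0
    )

# Round trip: h_inv(h(x)) recovers x up to floating-point error.
print(h_inv(h(5.0)))  # ~5.0
```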