opendilab / LightZero

[NeurIPS 2023 Spotlight] LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios (awesome MCTS)
https://huggingface.co/spaces/OpenDILabCommunity/ZeroPal
Apache License 2.0
1.04k stars 107 forks

Hyperparameters of MuZero and reproducibility of the results #229

Closed: marintoro closed this issue 2 months ago

marintoro commented 3 months ago

Hello,

I am trying to reproduce the MuZero results on Atari. I am using the MsPacmanNoFrameskip-v4 environment because it is the one with the most published results in the original MuZero paper.

I have 2 questions about this:

1) About the default hyperparameters:

My main question on this topic is: how were MuZero's hyperparameters chosen, and do they match the hyperparameters used in the original paper? If some of them don't match or are not known, is there a list of the hyperparameters that differ from, or are not specified in, the original paper?

2) About the performance:

The README.md reports results on common benchmarks and tasks such as MsPacmanNoFrameskip-v4. The problem is that these results are measured over a tiny fraction of the steps used in the original paper: 0.4M env steps versus 200M (or even 20 billion). The final results are therefore not really comparable. For example, on MsPacman MuZero reaches scores around 230,000, against about 2,500 in your much smaller 0.4M-step experiment.
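To put the size of the gap in numbers, here is a quick ratio calculation using only the budgets quoted above. It is a sketch under one stated assumption: the paper's budgets may be counted in Atari frames rather than agent steps. The hypothetical helper `budget_ratio` therefore takes an optional `frame_skip` (assumed to be 4 in the second column) to convert frames into steps before comparing.

```python
# Rough scale comparison of the training budgets quoted in the issue.
# Assumption: the paper figures may be Atari frames; dividing by a frame
# skip (assumed 4 here) converts them to agent/env steps.

LIGHTZERO_STEPS = 0.4e6  # env steps used for the README results
PAPER_BUDGETS = {"200M": 200e6, "20B": 20e9}  # budgets quoted from the MuZero paper


def budget_ratio(paper_budget: float, our_steps: float, frame_skip: int = 1) -> float:
    """How many times larger the paper budget is than our run (hypothetical helper)."""
    return (paper_budget / frame_skip) / our_steps


if __name__ == "__main__":
    for name, budget in PAPER_BUDGETS.items():
        as_steps = budget_ratio(budget, LIGHTZERO_STEPS)
        as_frames = budget_ratio(budget, LIGHTZERO_STEPS, frame_skip=4)
        print(f"{name}: {as_steps:,.0f}x as steps, {as_frames:,.0f}x if counted as frames")
```

Taken at face value, 200M is 500x the 0.4M-step budget; even if it counts frames with a frame skip of 4, the gap is still 125x.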

My main question on this topic is: have you run any experiments with a number of steps comparable to the original MuZero (i.e. at least 200M env steps)? If so, are the results you obtained comparable to those reported in the original MuZero paper?

puyuan1996 commented 3 months ago

Hello, thank you for your question.

Best wishes!

marintoro commented 3 months ago

Thanks for the really fast answer! Table 7 from your paper, with all the hyperparameters, is exactly what I was looking for!

It would indeed be convenient if the default configuration for each algorithm (e.g. atari_muzero_config.py) matched the one you actually used in your experiments.
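Until the defaults and the paper's table are aligned, one way to list the differences is to flatten both configurations and diff them key by key. This is a minimal sketch that assumes the config is a nested dict (it does not import or call any LightZero API). `flatten` and `diff_config` are hypothetical helpers, and the example values are placeholders, not the actual defaults or Table 7 values.

```python
from typing import Any, Dict, Tuple

_MISSING = "<missing>"  # marker for a key that exists on only one side


def flatten(cfg: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys, e.g. {"policy": {"a": 1}} -> {"policy.a": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in cfg.items():
        full_key = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, full_key + "."))
        else:
            flat[full_key] = value
    return flat


def diff_config(default: Dict[str, Any], reference: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {dotted_key: (default_value, reference_value)} for every differing key."""
    flat_default, flat_reference = flatten(default), flatten(reference)
    mismatches: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(set(flat_default) | set(flat_reference)):
        d = flat_default.get(key, _MISSING)
        r = flat_reference.get(key, _MISSING)
        if d != r:
            mismatches[key] = (d, r)
    return mismatches


if __name__ == "__main__":
    # Placeholder values for illustration only.
    default_cfg = {"policy": {"num_simulations": 50, "batch_size": 256}}
    paper_cfg = {"policy": {"num_simulations": 50, "batch_size": 1024, "td_steps": 10}}
    for key, (ours, theirs) in diff_config(default_cfg, paper_cfg).items():
        print(f"{key}: default={ours} reference={theirs}")
```

Keys present on only one side are reported with `<missing>`, so hyperparameters that the paper specifies but the default config omits (or vice versa) also show up in the list.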