werner-duvaud / muzero-general

MuZero
https://github.com/werner-duvaud/muzero-general/wiki/MuZero-Documentation
MIT License

Breakout #53

Closed: pdutoit2011 closed this issue 4 years ago

pdutoit2011 commented 4 years ago

Thank you very much for a comprehensive implementation.

I ran Breakout with the current configuration, except that I changed the number of actors from 350 to 4, since I ran into memory problems with Ray. I am using the same setup you tested on, except with a GTX 1060.
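(For reference, that change amounts to a single field in the game config; the sketch below assumes this repo's games/breakout.py layout, with the values taken from this thread.)

```python
# Sketch of the one-line change in games/breakout.py (MuZeroConfig).
# Each actor is a Ray worker with its own copy of the environment,
# so lowering this directly bounds Ray's memory footprint.
self.num_actors = 4  # the Breakout config's default was 350
```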

The code prints a single progress line, but it never updates. On TensorBoard I see progress, but the reward stays at zero.

Any advice?

werner-duvaud commented 4 years ago

Hi,

About the progress line in the console: it is normal that there is only one, but it should be updated. If the number of self-played games, the loss, or the number of training steps does not change, there is a problem.

About the reward remaining at 0: it depends on the number of self-played games; it may take a lot of games to get a first reward. In particular, I remember that on Breakout the agent must first learn to press FIRE to start the game and then to catch the ball. At first it plays almost randomly, so the reward can stay at zero for a while.
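(A quick way to see why the reward can stay at zero for a while: in the Gym Atari Breakout environment, the ball is only launched by the FIRE action, so a near-random policy scores nothing until it happens to press it. A minimal check, assuming gym with the Atari extras installed; the "Breakout-v4" env id is an assumption, not necessarily the one this repo's config uses.)

```python
import gym

env = gym.make("Breakout-v4")  # env id assumed for illustration
print(env.unwrapped.get_action_meanings())  # ['NOOP', 'FIRE', 'RIGHT', 'LEFT']

obs = env.reset()
obs, reward, done, info = env.step(1)  # action 1 = FIRE launches the ball;
                                       # no reward until a brick is actually hit
```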

However, we haven't had time yet to test Breakout and the other Atari games enough. This requires a lot of computing power and can take a long time with only 4 actors; DeepMind reported using 350.

pdutoit2011 commented 4 years ago

Hi,

Thanks for the quick response.

Could you provide the version of Breakout that you used for testing? I would like to see some progress first and then build on it further.

werner-duvaud commented 4 years ago

We used the version available in the games folder. We just decreased some parameters (num_actors, num_channels, batch_size, etc.) to run it on a less powerful server.

pdutoit2011 commented 4 years ago

Thanks, yes, those parameters are what I meant.

I will look at the updates to determine what to use.

Thanks again for a comprehensive implementation. I researched the alternatives thoroughly and decided to build on your work, since it is such a good starting point. I hope to report back with positive results, also on other Atari games.

werner-duvaud commented 4 years ago

Great, don't hesitate to join the Discord to discuss results and share your experience. About the hyperparameters, I do not remember exactly what I used. I lowered them to fit my computational resources while trying to stay close to the paper's values. The goal of the test was not to solve Breakout but to get some reward; we are not trying to solve Breakout for the moment. The parameters I changed are: num_actors, num_channels, num_reduced_channels, batch_size, stacked_observations.
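(Since the exact values are not recorded in this thread, the sketch below only illustrates the shape of such a reduced configuration in games/breakout.py; every number is a placeholder, not the setting actually used.)

```python
# Illustrative placeholders only: the exact values used are not known.
# A reduced Breakout configuration of this shape trades fidelity to the
# paper for lower memory and compute:
class MuZeroConfig:
    def __init__(self):
        ...
        self.num_actors = 4             # paper-scale runs used 350 self-play workers
        self.stacked_observations = 4   # fewer past frames stacked into each observation
        self.num_channels = 64          # narrower residual blocks than the paper's network
        self.num_reduced_channels = 16  # narrower reward/value/policy heads
        self.batch_size = 64            # smaller training batches
        ...
```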

The hardware was: RTX 2080, 128 GB RAM, 12-core Intel Xeon E5 v3.