werner-duvaud / muzero-general

MuZero
https://github.com/werner-duvaud/muzero-general/wiki/MuZero-Documentation
MIT License

The model does not converge for breakout #211

Open yungangwu opened 1 year ago

yungangwu commented 1 year ago

Search before asking

Description

I trained MuZero on Breakout with the hyperparameters given in the code, but after 450,000 training steps the reward was still 0 and showed no sign of convergence. Are the hyperparameters in the code validated? Thank you!

Additional context

No response

JohnPPP commented 1 year ago

Same issue here, but for all envs.

yungangwu commented 1 year ago

Have you tried any other parameter settings? For example, with batch_size set to 1024, does the model converge under any hyperparameter settings? @JohnPPP

JohnPPP commented 1 year ago

Tried a bunch of hyperparameters on a bunch of games. Just wasted my time. Perhaps others can show me how this can work...

yungangwu commented 1 year ago

gg. I ran into the same problem and did a lot of experiments, but nothing worked. I don't know whether there is a mistake in the code. @JohnPPP

JohnPPP commented 1 year ago

Yeah, there probably is.

dillonmsandhu commented 1 year ago

Did the reward stay zero the entire time, or did it occasionally get some reward? I have it working on cartpole, but not on Atari. That said, it still gets a reward of 2 or 3 occasionally in breakout, indicating that it is behaving randomly.
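One way to check whether occasional scores like these are distinguishable from random play is to compare the trained agent's episode returns against those of a uniformly random policy. Below is a minimal sketch, assuming you have logged per-episode returns for both; the function name `looks_random` and the sample numbers are hypothetical and are not part of muzero-general.

```python
import statistics


def looks_random(agent_returns, random_returns, z=2.0):
    """Return True if the agent's mean episode return is not clearly above
    the random-policy mean (i.e. within z standard errors of it)."""
    agent_mean = statistics.mean(agent_returns)
    random_mean = statistics.mean(random_returns)
    # Standard error of the difference between the two sample means.
    se = (
        statistics.variance(agent_returns) / len(agent_returns)
        + statistics.variance(random_returns) / len(random_returns)
    ) ** 0.5
    return agent_mean - random_mean <= z * se


# Hypothetical episode returns, made up for illustration only.
random_returns = [0, 1, 2, 0, 3, 1, 0, 2, 1, 0]
agent_returns = [1, 0, 2, 3, 0, 1, 2, 0, 1, 1]
print(looks_random(agent_returns, random_returns))
```

With only a handful of episodes this is a rough check, so more evaluation episodes give a more trustworthy comparison.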

zsn2021 commented 1 year ago

I ran into the same problem. I tuned the hyperparameters for a long time, but the agent never learned to perform well in my environment.

yungangwu commented 1 year ago

Yes, I have this problem too. I also tried another implementation, muzero-pytorch, on Gomoku, but after a long time tuning it I still didn't get good results.

zsn2021 commented 1 year ago

Is it possible that, because several networks have to be learned at once, the agent fails to learn good decisions? If you like, you could share your contact information so we can talk privately.

yungangwu commented 1 year ago

Yes, that's my guess too: it has three networks connected in series that have to be optimized together, so training has to be done very carefully to converge. As for contact information, I use WeChat. Do you know this app?
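For reference on the three networks being optimized together: in the MuZero paper's notation, the representation function $h_\theta$, the dynamics function $g_\theta$, and the prediction function $f_\theta$ share one parameter set $\theta$ and are trained end to end with a single loss over $K$ unrolled steps. The block below restates the paper's formulation and does not necessarily match how this repository implements it.

```latex
% Model unrolled from the observations up to time t, along actions a^1..a^K
s^0 = h_\theta(o_1, \dots, o_t), \qquad
(r^k, s^k) = g_\theta(s^{k-1}, a^k), \qquad
(p^k, v^k) = f_\theta(s^k)

% Joint loss: reward, value, and policy terms plus L2 regularization
l_t(\theta) = \sum_{k=0}^{K} \Big[ l^r\big(u_{t+k}, r_t^k\big)
  + l^v\big(z_{t+k}, v_t^k\big)
  + l^p\big(\pi_{t+k}, p_t^k\big) \Big] + c\,\lVert\theta\rVert^2
```

Here $u$ is the observed reward, $z$ the value target, and $\pi$ the search policy. Because gradients from all three terms flow back through $g_\theta$ and $h_\theta$, an error in any one target can affect all three networks.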

zsn2021 commented 1 year ago

You can add me on WeChat: 13162062294