sr5434 / AlphaZero

Implementing the AlphaZero algorithm for multiple games with PGX and MCTX
MIT License

Where are the dynamics and representation functions? #1

Closed (Nightbringers closed this issue 7 months ago)

Nightbringers commented 7 months ago

This still looks like AlphaZero. Can you implement MuZero for the game of Go?

sr5434 commented 7 months ago

Just edit the config environment to say "go_9x9"

Nightbringers commented 7 months ago

MuZero has dynamics, representation, and prediction functions, but your code does not have them.

sr5434 commented 7 months ago

Isn’t that implemented in the Gumbel MuZero policy from MCTX?
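
For context: as the MCTX documentation describes it, `gumbel_muzero_policy` only runs the search. The caller supplies a `recurrent_fn(params, rng_key, action, embedding)` that returns a reward, discount, prior logits, and value together with the next embedding. Whether the agent is AlphaZero or MuZero depends on what that callback does. The sketch below mirrors that shape in plain NumPy without importing MCTX. The toy counting game, the `toy_net` stand-in, the `params` keys, and both function names are hypothetical and are not taken from this repository.

```python
import numpy as np

NUM_ACTIONS = 2  # hypothetical toy game: action 0 adds 1, action 1 adds 2


def toy_net(features):
    """Stand-in for a trained network head: returns (prior logits, value)."""
    return np.zeros(NUM_ACTIONS), float(np.tanh(np.sum(features)))


def alphazero_recurrent_fn(params, rng_key, action, embedding):
    """AlphaZero style: the embedding is a real game state, and the
    transition and reward come from the game rules (a perfect simulator)."""
    next_state = embedding + action + 1        # true rules of the toy game
    reward = 1.0 if next_state >= 5 else 0.0   # true reward from the rules
    logits, value = toy_net(np.array([next_state / 5.0]))
    return (reward, 1.0, logits, value), next_state


def muzero_recurrent_fn(params, rng_key, action, embedding):
    """MuZero style: the embedding is a learned hidden state, and the
    transition and reward come from learned dynamics parameters."""
    x = np.concatenate([embedding, np.eye(NUM_ACTIONS)[action]])
    next_hidden = np.tanh(x @ params["dynamics"])  # learned next hidden state
    reward = float(x @ params["reward"])           # learned reward estimate
    logits, value = toy_net(next_hidden)
    return (reward, 1.0, logits, value), next_hidden
```

Under these assumptions, the same search routine can drive either callback. Only the second one needs a learned dynamics model, and that part has to be written by the user rather than by the search library.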

sr5434 commented 7 months ago

Also I can't figure out prediction, but feel free to implement it yourself

Nightbringers commented 7 months ago

The output of the representation function is not a policy and value. It is a hidden state s, which is the input to the dynamics function and the prediction function.
It works like this:

input -> representation function -> s
s -> prediction function -> p, v
s, action -> dynamics function -> next s (and a reward)
next s -> prediction function -> p, v

What you implemented doesn't look like MuZero. You may need to read the MuZero paper more carefully.
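
The data flow described above can be sketched as follows. This is a minimal NumPy illustration, not code from this repository: the layer sizes, the random untrained weights, and the helper name `unroll` are all assumptions chosen for the example. In line with the MuZero paper, the dynamics function here also takes the action and returns a reward.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, HIDDEN_DIM, NUM_ACTIONS = 8, 16, 4  # hypothetical sizes

# Randomly initialised weights stand in for trained parameters.
W_repr = rng.normal(size=(OBS_DIM, HIDDEN_DIM)) * 0.1
W_dyn = rng.normal(size=(HIDDEN_DIM + NUM_ACTIONS, HIDDEN_DIM)) * 0.1
w_reward = rng.normal(size=HIDDEN_DIM + NUM_ACTIONS) * 0.1
W_policy = rng.normal(size=(HIDDEN_DIM, NUM_ACTIONS)) * 0.1
w_value = rng.normal(size=HIDDEN_DIM) * 0.1


def representation(observation):
    """Observation -> hidden state s (not a policy or a value)."""
    return np.tanh(observation @ W_repr)


def dynamics(state, action):
    """(s, action) -> (reward, next s), with the action one-hot encoded."""
    x = np.concatenate([state, np.eye(NUM_ACTIONS)[action]])
    return float(x @ w_reward), np.tanh(x @ W_dyn)


def prediction(state):
    """s -> (policy probabilities p, value v)."""
    logits = state @ W_policy
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum(), float(np.tanh(state @ w_value))


def unroll(observation, actions):
    """Encode the observation once, then step only in latent space."""
    state = representation(observation)
    outputs = [prediction(state)]
    for action in actions:
        _reward, state = dynamics(state, action)
        outputs.append(prediction(state))
    return outputs
```

For example, `unroll(np.ones(OBS_DIM), [0, 3, 1])` returns four `(p, v)` pairs: one for the encoded observation and one after each imagined action. The real environment is never consulted after the first encoding step.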

sr5434 commented 7 months ago

I decided to just do AlphaZero, as I couldn't understand MuZero very well. However, I used Muax to build a trading algorithm using the true MuZero algorithm, which I just pushed.