Closed: Nightbringers closed this 7 months ago
Just edit the config environment to say "go_9x9"
MuZero has dynamics, representation, and prediction functions, but your code does not.
Isn't that implemented in the Gumbel MuZero policy from MCTX?
Also, I can't figure out prediction, but feel free to implement it yourself.
The output of the representation function is not a policy and value. It is a hidden state s, which is the input to the dynamics function or the prediction function.
It works like this:
input -> representation function -> s
s -> prediction function -> p, v
s -> dynamics function -> next s
next s -> prediction function -> p, v
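The flow described above can be sketched in plain Python. This is only an illustration under stated assumptions: every name, size, and weight here is hypothetical and not taken from this repository or from MCTX, and small random linear layers stand in for trained networks. It also assumes, as in the MuZero paper, that the dynamics function takes the chosen action along with s and returns a reward next to the next hidden state.

```python
import math
import random

# Hypothetical toy sizes; a real 9x9 Go model would be far larger.
OBS_SIZE = 6      # length of the flattened observation
HIDDEN = 8        # length of the hidden state s
NUM_ACTIONS = 4   # number of actions

rng = random.Random(0)


def init_matrix(rows, cols):
    """Random weights standing in for a trained network layer."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(logit - top) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


W_repr = init_matrix(HIDDEN, OBS_SIZE)
W_dyn = init_matrix(HIDDEN, HIDDEN + NUM_ACTIONS)
W_reward = init_matrix(1, HIDDEN + NUM_ACTIONS)
W_policy = init_matrix(NUM_ACTIONS, HIDDEN)
W_value = init_matrix(1, HIDDEN)


def representation(observation):
    """input -> s. Returns only a hidden state, no policy or value."""
    return [math.tanh(x) for x in matvec(W_repr, observation)]


def prediction(s):
    """s -> (p, v). Policy is a distribution, value lies in [-1, 1]."""
    p = softmax(matvec(W_policy, s))
    v = math.tanh(matvec(W_value, s)[0])
    return p, v


def dynamics(s, action):
    """(s, action) -> (next s, reward). The action is one-hot encoded."""
    one_hot = [1.0 if i == action else 0.0 for i in range(NUM_ACTIONS)]
    x = s + one_hot  # concatenate hidden state and action encoding
    next_s = [math.tanh(h) for h in matvec(W_dyn, x)]
    reward = matvec(W_reward, x)[0]
    return next_s, reward


# One unrolled step: the real environment is used only for the first input.
observation = [rng.uniform(-1.0, 1.0) for _ in range(OBS_SIZE)]
s = representation(observation)
p, v = prediction(s)
action = max(range(NUM_ACTIONS), key=lambda a: p[a])  # greedy pick for the demo
next_s, reward = dynamics(s, action)
next_p, next_v = prediction(next_s)
```

The point of the sketch is that only `representation` sees the real observation. Every later step runs on hidden states produced by `dynamics`, whereas an AlphaZero-style search would step the real game rules instead.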
What you implemented doesn't look like MuZero.
You may need to read the MuZero paper more carefully.
I decided to just do AlphaZero, as I couldn't understand MuZero very well. However, I used Muax to build a trading algorithm using the true MuZero algorithm, which I just pushed.
It still looks like AlphaZero. Can you implement MuZero for the game of Go?