Closed: Nightbringers closed this issue 5 months ago.
Is the issue with speed the same issue you mention here? https://github.com/google-deepmind/mctx/issues/81 Or do you believe this is a separate issue with this fork?
Subtree persistence has only ever been used in the context of AlphaZero. Given that child nodes in MuZero's MCTS are just approximations of state, it makes sense to rebuild the tree: we will have access to the true root state in a subsequent `search()` call, rather than an approximation of the root saved from the last search.
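The rebuild-vs-reuse tradeoff above can be illustrated with a toy sketch (hypothetical names and numbers, not mctx code; the real MuZero dynamics are a learned network, not the `approx_step` stub here): reusing an approximated child state as the next root compounds model error, whereas rebuilding from the true environment state resets it.

```python
# Toy illustration (not mctx code): why MuZero rebuilds the tree each move.
# A learned dynamics model introduces a small error at every imagined step;
# reusing approximated subtree states as future roots compounds that error,
# while rebuilding from the true environment state resets it.

def true_step(state):
    """Ground-truth environment transition (hypothetical toy dynamics)."""
    return state + 1.0

def approx_step(state, model_error=0.05):
    """Stand-in for a learned dynamics model with a fixed per-step error."""
    return state + 1.0 + model_error

def rollout(num_moves, reuse_subtree):
    """Track |approximate root - true state| across successive searches."""
    true_state, root_state = 0.0, 0.0
    for _ in range(num_moves):
        true_state = true_step(true_state)
        if reuse_subtree:
            # Next root is the (approximate) child from the previous search.
            root_state = approx_step(root_state)
        else:
            # Rebuild: next search starts from the true observed state.
            root_state = true_state
    return abs(root_state - true_state)

print(rollout(10, reuse_subtree=True))   # error grows with each reused step
print(rollout(10, reuse_subtree=False))  # error is zero after every rebuild
```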
I created a new policy, `alphazero_policy`, to isolate this functionality. I'll add that `alphazero_policy` and `muzero_policy` are functionally the same -- `alphazero_policy` just allows for additional node capacity in the search tree, decoupling capacity from `num_simulations`.
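A short sketch of why that decoupling matters (pure Python with hypothetical numbers, not the fork's actual implementation): with subtree persistence, nodes carried over from previous searches plus `num_simulations` new expansions can exceed `num_simulations`, so tree capacity must be set independently.

```python
# Toy accounting (not mctx code): with subtree persistence, the tree holds
# nodes carried over from earlier searches *plus* new expansions, so its
# required capacity can exceed num_simulations for a single search.

def max_nodes_needed(num_moves, num_simulations, reuse_fraction=0.25):
    """Peak node count per search when a fraction of the previous tree
    survives as the new root's subtree (reuse_fraction is hypothetical)."""
    carried, peak = 0, 0
    for _ in range(num_moves):
        # Each search expands up to num_simulations new nodes on top of
        # whatever subtree survived the previous move.
        total = carried + num_simulations
        peak = max(peak, total)
        carried = int(total * reuse_fraction)
    return peak

print(max_nodes_needed(num_moves=1, num_simulations=400))   # no reuse yet
print(max_nodes_needed(num_moves=10, num_simulations=400))  # exceeds 400
```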
Yes, I hope `alphazero_policy` can fix this speed issue.
`alphazero_policy` works nearly identically to `muzero_policy`, so I wouldn't expect to see any speed improvements over `muzero_policy` in the base repo.
Many people train AlphaZero using mctx's `gumbel_muzero_policy`, so subtree persistence would also be useful with `gumbel_muzero_policy`. Maybe add a new policy: `gumbel_muzero_policy` with subtree persistence.
The current implementation of `gumbel_muzero_policy` assumes the tree is empty when `search()` is called -- but I can look into finding a way to support this.
That would be great. If `num_simulations=400` and a child node has already been searched 100 times, then when that node becomes the root of a new search, will `alphazero_policy` search it 300 more times, or 400? I.e., does the total visit count equal `num_simulations`, or the previous count plus `num_simulations`?
MCTS will run for `num_simulations` regardless of how many times the root node has already been visited/searched (so 400 times). See the note on out-of-bounds expansions for what happens if the tree runs out of capacity.
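The visit-count arithmetic can be spelled out with a toy counter (hypothetical helper, not mctx internals): a persisted root that already carries 100 visits still receives the full `num_simulations` fresh simulations.

```python
# Toy visit counter (not mctx internals): a new search always runs
# num_simulations simulations, regardless of the visits the persisted
# root accumulated as a child during the previous search.

def run_search(root_visits, num_simulations):
    """Every simulation passes through the root, incrementing its count."""
    new_simulations = num_simulations  # not num_simulations - root_visits
    return root_visits + new_simulations

# Child was visited 100 times last move, then promoted to root.
print(run_search(root_visits=100, num_simulations=400))  # 500 total visits
```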
Thanks. I see you use `mctx.qtransform_by_min_max` with `min_value=-1, max_value=1` in the example, while `alphazero_policy`'s default qtransform is `qtransforms.qtransform_by_parent_and_siblings`. Which one should I use -- which is better?
I just re-used the qtransform from the original example implementation. I believe in practice `qtransform_by_parent_and_siblings` is preferred.
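Roughly, the difference between the two qtransforms can be sketched as follows (a simplified pure-Python approximation, not the mctx implementation; mctx operates on tree arrays and handles unvisited children specially). Both rescale child Q-values before action selection; they differ in where the normalization bounds come from: fixed, game-specific bounds versus bounds derived from the parent's value and the sibling Q-values.

```python
# Simplified sketch (not the mctx implementation) of the two qtransforms
# discussed above: fixed user-supplied bounds vs. bounds estimated from
# the parent's value and the sibling Q-values.

def qtransform_by_min_max(qvalues, min_value, max_value):
    """Normalize with fixed, user-supplied value bounds (e.g. [-1, 1])."""
    return [(q - min_value) / (max_value - min_value) for q in qvalues]

def qtransform_by_parent_and_siblings(qvalues, parent_value, epsilon=1e-8):
    """Normalize with bounds taken from the parent's value and the sibling
    Q-values, so no game-specific value range needs to be known up front."""
    lo = min([parent_value] + qvalues)
    hi = max([parent_value] + qvalues)
    return [(q - lo) / max(hi - lo, epsilon) for q in qvalues]

qs = [-0.5, 0.0, 0.25]
print(qtransform_by_min_max(qs, min_value=-1, max_value=1))
print(qtransform_by_parent_and_siblings(qs, parent_value=0.1))
```

The practical upside of the parent-and-siblings variant is that it adapts to the local value scale of each node instead of requiring known global bounds, which is likely why it is preferred in practice.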
Thanks, I'll experiment more with `alphazero_policy`, and I hope `gumbel_muzero_policy` can support subtree persistence.
Hello, in my tests `muzero_policy` and `alphazero_policy` are both much slower than `gumbel_muzero_policy` with the same `num_simulations`.
And why not make `gumbel_muzero_policy` and `muzero_policy` support subtree persistence as well?