lowrollr / mctx-az

Monte Carlo tree search in JAX, with functionality to continue search from a previous subtree
Apache License 2.0
14 stars 0 forks

speed issue #2

Closed. Nightbringers closed this 5 months ago

Nightbringers commented 6 months ago

Hello, in my tests muzero_policy and alphazero_policy are both much slower than gumbel_muzero_policy with the same num_simulations.

Also, why not give gumbel_muzero_policy and muzero_policy subtree persistence as well?

lowrollr commented 6 months ago

Is the issue with speed the same issue you mention here? https://github.com/google-deepmind/mctx/issues/81 Or do you believe this is a separate issue with this fork?

Subtree persistence has only ever been used in the context of AlphaZero. Given that child nodes in MuZero's MCTS are only approximations of state, it makes sense to re-build the tree: in a subsequent search() call we will have access to the true root node, rather than an approximation of the root saved from the last search.

I created a new policy alphazero_policy to isolate this functionality.

lowrollr commented 6 months ago

I'll add that alphazero_policy and muzero_policy are functionally the same -- alphazero_policy just allows for additional node capacity in the search tree, decoupling capacity from num_simulations.
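
For concreteness, here is a rough sketch of what that decoupling could look like at the call site. The `max_nodes` keyword name and exposing `alphazero_policy` on the `mctx` namespace are assumptions for illustration, not the fork's confirmed API (check the actual signature); the stub `recurrent_fn` and uniform priors are only there to make the snippet self-contained.

```python
import jax
import jax.numpy as jnp
import mctx

batch_size, num_actions = 1, 4

def recurrent_fn(params, rng_key, action, embedding):
    # Stub dynamics model so the snippet is self-contained: the state is
    # unchanged, rewards are zero, and the prior over actions is uniform.
    output = mctx.RecurrentFnOutput(
        reward=jnp.zeros(batch_size),
        discount=jnp.ones(batch_size),
        prior_logits=jnp.zeros((batch_size, num_actions)),
        value=jnp.zeros(batch_size),
    )
    return output, embedding

root = mctx.RootFnOutput(
    prior_logits=jnp.zeros((batch_size, num_actions)),
    value=jnp.zeros(batch_size),
    embedding=jnp.zeros((batch_size, 8)),
)

# muzero_policy ties tree capacity to num_simulations; per this thread,
# alphazero_policy decouples them. `max_nodes` is an assumed keyword name
# for the capacity argument -- check the fork's actual signature.
policy_output = mctx.alphazero_policy(
    params=(),
    rng_key=jax.random.PRNGKey(0),
    root=root,
    recurrent_fn=recurrent_fn,
    num_simulations=400,
    max_nodes=2048,  # assumed: node capacity, independent of num_simulations
)
print(policy_output.action)
```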

Nightbringers commented 5 months ago

Yes, I hope alphazero_policy can fix this speed issue.

lowrollr commented 5 months ago

alphazero_policy works nearly identically to muzero_policy, so I wouldn't expect to see any speed improvements over muzero_policy in the base repo.

Nightbringers commented 5 months ago

Many people train AlphaZero with mctx's gumbel_muzero_policy, so subtree persistence would be useful with gumbel_muzero_policy too. Maybe add a new policy: gumbel_muzero_policy with subtree persistence.

lowrollr commented 5 months ago

The current implementation of gumbel_muzero_policy assumes the tree is empty when search is called -- but I can look into finding a way to support this.

Nightbringers commented 5 months ago

That would be great. If I use num_simulations=400, and a child node was already searched 100 times, and I then take that node as the root of a new search, will alphazero_policy search it 300 more times or 400? That is, will its total visit count end up equal to num_simulations, or to num_simulations plus the earlier visits?

lowrollr commented 5 months ago

MCTS will run for num_simulations regardless of how many times the root node has already been visited/searched (so 400 times).

See note on out-of-bounds expansions for what happens if the tree runs out of capacity.
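
To make the bookkeeping concrete, here is a plain-Python illustration of the numbers from the question above, assuming the persisted subtree keeps the visit counts it accumulated in the previous search:

```python
# Visit-count bookkeeping for the scenario above, assuming the persisted
# subtree retains its statistics. Plain arithmetic, no mctx calls.
num_simulations = 400   # simulations requested per search() call
prior_visits = 100      # visits the reused child received in the last search

# The new search always runs num_simulations, regardless of prior visits.
new_simulations = num_simulations             # 400, not 300

# So the reused root would end up with its old visits plus the new ones.
total_root_visits = prior_visits + new_simulations
print(new_simulations, total_root_visits)     # 400 500
```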

Nightbringers commented 5 months ago

Thanks. I see you use (mctx.qtransform_by_min_max, min_value=-1, max_value=1) in the example, while alphazero_policy's default qtransform is qtransforms.qtransform_by_parent_and_siblings. Which one should I use, and which is better?

lowrollr commented 5 months ago

I just re-used the qtransform from the original example implementation.

I believe in practice qtransform_by_parent_and_siblings is preferred.
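
For reference, a minimal sketch of how either choice would be wired in. The qtransform keyword follows mctx.muzero_policy; I'm assuming alphazero_policy accepts it the same way.

```python
import functools
import mctx

# Option 1: fixed value bounds, as in the original example. Reasonable when
# returns are known to lie in a fixed range such as [-1, 1].
qtransform_minmax = functools.partial(
    mctx.qtransform_by_min_max, min_value=-1.0, max_value=1.0)

# Option 2: normalize Q-values by the parent's and siblings' statistics,
# with no environment-specific constants.
qtransform_siblings = mctx.qtransform_by_parent_and_siblings

# Pass whichever you choose to the policy, e.g.
# mctx.alphazero_policy(..., qtransform=qtransform_siblings)  # assumed keyword
```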

Nightbringers commented 5 months ago

Thanks, I'll experiment more with alphazero_policy, and I hope gumbel_muzero_policy can support subtree persistence.