Open wphtrying opened 4 hours ago
I did not see any backpropagation in the MCTS tree search process.
See here for reference: https://github.com/openreasoner/openr/blob/d869f4f998d55ffe6c84b8092a3d2eb34c7e78c7/reason/guided_search/tree.py#L394
It looks more like best-of-N: at every step it picks the best step and searches from there.
During SELECT, each step chooses a child node using PUCT, not best-of-N. For ROLLOUT, we currently implement only two variants, one of which resembles per-step best-of-N.
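To make the distinction concrete, here is a minimal sketch of PUCT selection together with a backpropagation step. It is not the code in `tree.py`; the `Node` class, the field names, the `c_puct` default of 1.25, and the exact form of the exploration term are all assumptions chosen for illustration. The point is only that selection depends on visit counts and value sums, which change whenever a rollout result is propagated back up the path.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(eq=False)
class Node:
    """Hypothetical tree node; field names are illustrative, not openr's."""
    prior: float = 1.0
    parent: Optional["Node"] = None
    children: Dict[str, "Node"] = field(default_factory=dict)
    visit_count: int = 0
    value_sum: float = 0.0

    @property
    def q(self) -> float:
        # Mean backed-up value; 0.0 for a node that was never visited.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.25) -> float:
    # One common PUCT form: exploitation (Q) plus a prior-weighted
    # exploration bonus that shrinks as the child is visited more often.
    bonus = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.q + bonus


def select_child(node: Node, c_puct: float = 1.25) -> Tuple[str, Node]:
    # SELECT: pick the (action, child) pair with the highest PUCT score.
    return max(node.children.items(), key=lambda kv: puct_score(node, kv[1], c_puct))


def backpropagate(leaf: Node, value: float) -> None:
    # Walk from the evaluated leaf to the root, updating statistics on the way.
    node: Optional[Node] = leaf
    while node is not None:
        node.visit_count += 1
        node.value_sum += value
        node = node.parent
```

With this sketch, a child that keeps receiving low values through `backpropagate` sees its `q` drop, so `select_child` eventually prefers a sibling. A pure per-step best-of-N procedure keeps no such statistics and would not change its choice in this way.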
System Info
I ran some experiments with vanila_mcts and did not see any backpropagation in the MCTS tree search process. It looks more like best-of-N: at every step it picks the best step to search.
Who can help?
@ziyuwan
Information
Tasks
Reproduction
.
Expected behavior
.