openreasoner / openr

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
https://openreasoner.github.io/
MIT License

some questions about vanila_mcts #59

Open wphtrying opened 4 hours ago

wphtrying commented 4 hours ago

System Info

I ran some experiments with vanila_mcts and did not see any backpropagation during the MCTS tree search. It behaves more like best-of-N: at every step, the best candidate step is chosen and the search continues from there.

Who can help?

@ziyuwan

ziyuwan commented 3 hours ago

> i did not see some backprop in the mcts tree Search process

See here for reference: https://github.com/openreasoner/openr/blob/d869f4f998d55ffe6c84b8092a3d2eb34c7e78c7/reason/guided_search/tree.py#L394

> more like best of n, every step choose the best step to search

During SELECT, each step chooses a child node using PUCT, not best-of-N. During ROLLOUT, we currently implement only two variants, one of which behaves like per-step best-of-N.
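
For readers following along, here is a minimal sketch of how PUCT selection and backpropagation typically fit together. It is not the code in `tree.py`: the `Node` class, the function names, the constant `c_puct`, and the exact score formula (a common AlphaZero-style form) are all assumptions made for illustration, and the repository's implementation may differ in its details.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(eq=False)
class Node:
    """Hypothetical tree node; not the Node class used in openr."""
    prior: float                                   # prior probability of the step
    parent: Optional["Node"] = field(default=None, repr=False)
    children: Dict[str, "Node"] = field(default_factory=dict, repr=False)
    visit_count: int = 0
    value_sum: float = 0.0

    @property
    def value(self) -> float:
        # Mean of all values backed up through this node (0 if unvisited).
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.25) -> float:
    # Exploitation term (mean value) plus an exploration bonus that grows
    # with the prior and shrinks as the child is visited more often.
    explore = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.value + explore


def select_child(node: Node, c_puct: float = 1.25) -> Tuple[str, Node]:
    # SELECT: pick the child with the highest PUCT score, not the highest prior.
    return max(node.children.items(), key=lambda kv: puct_score(node, kv[1], c_puct))


def backpropagate(leaf: Node, leaf_value: float) -> None:
    # BACKUP: add the leaf value to every node on the path back to the root.
    # Assumes a single-agent setting, so the value sign is never flipped.
    node: Optional[Node] = leaf
    while node is not None:
        node.visit_count += 1
        node.value_sum += leaf_value
        node = node.parent


if __name__ == "__main__":
    root = Node(prior=1.0)
    root.children = {
        "a": Node(prior=0.6, parent=root),
        "b": Node(prior=0.4, parent=root),
    }
    backpropagate(root.children["a"], 0.0)  # step "a" was tried and scored 0
    action, _ = select_child(root)
    print(action)  # "b": the visit to "a" reduced its exploration bonus
```

In this sketch, a greedy per-step best-of-N would keep choosing `a` because it has the higher prior. PUCT switches to `b` once the statistics backed up from the first simulation lower `a`'s score, and that dependence on backed-up visit counts and values is what separates SELECT from best-of-N.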