Open kblomdahl opened 4 years ago
The initial implementation seems to work, but without a scheme such as virtual visits it does not allow enough parallelism for acceptable performance.
Hi Chicoryn, I am also interested in this paper. What does "virtual visits" mean here? My guess is that the original dichotomy search scheme scales well with the number of legal actions (complexity of ~log(n)), so adding it to the action selection stage shouldn't cost too much.
@PeppaCat Hello, this was some time ago, so I must admit I've forgotten a lot of the implementation details of my experiments. But "virtual visits" normally refers to a penalty [1] added to leaf nodes that are currently being expanded, so that the next probe into the search tree does not try to expand the same node again. This is very important in real play scenarios, where we are restricted not by the number of probes into the search tree but by a real-time limit, so it's better to fill each batch fully than to run a half-full batch through the value function.
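To make the idea concrete, here is a minimal Python sketch of virtual visits on top of a plain PUCT-style selection rule. It is not the code from [1]: the `Node` fields, the `c_puct` constant, the exact form of the exploration term, and the choice to count each pending probe as a loss of -1 are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float
    visits: int = 0         # completed evaluations backed up through this node
    value_sum: float = 0.0  # sum of backed-up values, seen by the player who moved here
    pending: int = 0        # virtual visits: probes currently in flight through this node
    children: dict = field(default_factory=dict)

    def q(self):
        # Assumption: every pending probe counts as a loss (-1) until its real value arrives.
        n = self.visits + self.pending
        return (self.value_sum - self.pending) / n if n > 0 else 0.0


def select_child(node, c_puct=1.5):
    # Pending probes are counted as visits, which also shrinks the exploration bonus.
    total = sum(ch.visits + ch.pending for ch in node.children.values())

    def score(ch):
        u = c_puct * ch.prior * math.sqrt(total + 1) / (1 + ch.visits + ch.pending)
        return ch.q() + u

    return max(node.children.items(), key=lambda kv: score(kv[1]))


def probe(root):
    """Walk from the root to a leaf, marking every node on the path as pending."""
    path, node = [root], root
    root.pending += 1
    while node.children:
        _, node = select_child(node)
        node.pending += 1
        path.append(node)
    return path


def backup(path, value):
    """Swap the virtual visits on the path for the real evaluation of the leaf."""
    for node in reversed(path):
        node.pending -= 1
        node.visits += 1
        node.value_sum += value
        value = -value  # two-player zero-sum game: flip the perspective each ply
```

Calling `probe` several times before any `backup` yields distinct leaves in most positions, which is what lets a whole batch be collected and sent through the network at once.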
As for your real question, I am not sure about the answer, since, as you indicate, it seems like it should be fairly straightforward to introduce virtual losses into equation (3). I suspect that I did not have sufficient time / energy to investigate this further.
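For what it's worth, one way this might look is sketched below in Python. It is untested against the paper and was not part of my experiments. It assumes the regularized policy has the form pi(a) = lambda * prior(a) / (alpha - q(a)), with lambda = c * sqrt(N) / (|A| + N), and that alpha is found by bisection between max(q + lambda * prior) and max(q) + lambda; please check these against the paper before relying on them. The function name, the constant `c`, and the way pending probes enter (as extra visits with value -1) are hypothetical.

```python
import math


def regularized_policy(q, prior, visits, pending=None, c=1.25, iters=50):
    """Bisection for alpha in pi(a) = lam * prior(a) / (alpha - q(a))."""
    n_act = len(q)
    pending = pending or [0] * n_act
    total = sum(visits) + sum(pending)
    if total == 0:
        return list(prior)  # nothing visited yet: fall back to the prior

    # Assumption: a pending probe counts as one extra visit with value -1.
    q_eff = [(qa * n - p) / (n + p) if n + p > 0 else qa
             for qa, n, p in zip(q, visits, pending)]
    lam = c * math.sqrt(total) / (n_act + total)

    # The policy mass is >= 1 at lo, <= 1 at hi, and decreases as alpha grows.
    lo = max(qa + lam * pa for qa, pa in zip(q_eff, prior))
    hi = max(q_eff) + lam
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        mass = sum(lam * pa / (mid - qa) for qa, pa in zip(q_eff, prior))
        if mass > 1.0:
            lo = mid
        else:
            hi = mid

    pi = [lam * pa / (hi - qa) for qa, pa in zip(q_eff, prior)]
    s = sum(pi)
    return [x / s for x in pi]  # renormalize away the remaining bisection error
```

Each bisection step is a single pass over the legal actions, so the extra cost per selection is the number of iterations times the number of actions.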
[1] https://github.com/Chicoryn/dream-go/blob/master/src/libdg_mcts/tree.rs#L1368
http://proceedings.mlr.press/v119/grill20a/grill20a.pdf http://proceedings.mlr.press/v119/grill20a/grill20a-supp.pdf