CesMak opened this issue 4 years ago
Make sure you check out this paper: several parallelization schemes are reviewed in section 6.3. I have no experience implementing those algorithms myself, though. I think how well MCTS parallelizes depends heavily on which tree policy you use; one reason MCTS is hard to parallelize is that UCB-based policies are sequential in nature.
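One of the schemes reviewed there, root parallelization, sidesteps the sequential-UCB problem: each worker grows its own fully independent tree, and only the root statistics are merged at the end. Here is a minimal Python sketch on a toy three-armed game. Everything in it (the toy game, the win probabilities, all function names) is invented for illustration, and it uses threads purely to keep the example simple; because of the GIL, a real CPU-bound search in Python would need `multiprocessing` instead.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

# Toy one-step "game" (illustrative only): three actions with hidden
# win probabilities. A real game would replace rollout() with playouts.
WIN_PROB = [0.2, 0.5, 0.8]

def rollout(action, rng):
    """Simulate one playout after taking `action`; return 1 on a win."""
    return 1 if rng.random() < WIN_PROB[action] else 0

def run_tree(seed, n_iters=2000, c=1.4):
    """One independent UCB1 search over the root actions (one 'tree')."""
    rng = random.Random(seed)
    n_actions = len(WIN_PROB)
    visits = [0] * n_actions
    wins = [0] * n_actions
    for t in range(1, n_iters + 1):
        # UCB1 selection: try each action once, then balance the
        # empirical win rate against an exploration bonus.
        ucb = [
            float("inf") if visits[a] == 0
            else wins[a] / visits[a] + c * math.sqrt(math.log(t) / visits[a])
            for a in range(n_actions)
        ]
        a = max(range(n_actions), key=lambda i: ucb[i])
        wins[a] += rollout(a, rng)
        visits[a] += 1
    return visits

def root_parallel_search(n_trees=4):
    """Root parallelization: independent trees, then sum root visits."""
    with ThreadPoolExecutor(max_workers=n_trees) as pool:
        results = list(pool.map(run_tree, range(n_trees)))
    merged = [sum(v[a] for v in results) for a in range(len(WIN_PROB))]
    best = max(range(len(merged)), key=lambda a: merged[a])
    return best, merged

best, merged = root_parallel_search()
print(best)  # the merged visit counts should favor action 2
```

The appeal of this scheme is that the workers never need to touch shared tree state, so there are no locks and no virtual-loss bookkeeping; the price is that the trees duplicate each other's exploration.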
As for your case, the whole purpose of TD-learning (and other RL approaches) is to learn long-term expected rewards so that you don't have to search very deep in the tree to find a good policy. Training may take quite a while if your game is complicated, though.
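To illustrate that trade-off, here is a minimal tabular Q-learning (a TD method) sketch. The toy 5-state chain MDP, the constants, and all names are invented for illustration and have nothing to do with any particular card game; the point is only that once values are learned, acting is a table lookup rather than a deep search.

```python
import random

# Toy 5-state chain MDP (illustrative only): states 0..4, actions
# 0 = left / 1 = right, reward 1.0 for reaching the goal state 4.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA = 0.2, 0.9

def step(s, a):
    """Deterministic transition; returns (next_state, reward, done)."""
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

q = [[0.0, 0.0] for _ in range(N_STATES)]
rng = random.Random(0)

for _ in range(500):                 # episodes
    s = rng.randrange(GOAL)          # random non-goal start state
    for _ in range(50):              # cap episode length
        a = rng.randrange(2)         # uniform random behaviour policy
        s2, r, done = step(s, a)
        # TD update: nudge Q(s, a) toward the one-step bootstrapped target.
        target = r if done else r + GAMMA * max(q[s2])
        q[s][a] += ALPHA * (target - q[s][a])
        if done:
            break
        s = s2

# Acting is now an argmax over the learned values, not a tree search:
policy = [max((0, 1), key=lambda a: q[s][a]) for s in range(GOAL)]
print(policy)  # the greedy policy should point right in every state
```

Because Q-learning is off-policy, it can learn the greedy policy even while behaving randomly here; the "quite a while" caveat above shows up as the number of episodes needed before the values rank the actions correctly.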
Hey there, I also use MCTS to predict good actions. However, in my case (a multi-player card game) it is very expensive to look far ahead. For this reason I wanted to ask whether you know of a parallel MCTS algorithm.
I just found this one for CUDA, written in C++: http://15418-final.github.io/parallelizedMCTS_web/
I would, however, like to have one in Python.