Open huangeddie opened 1 year ago
Phases:
- `m` sampled actions
- `m / 2` sampled actions
- `m / 4` sampled actions
- `c(c+1) / 2`, where `c = log_2(m)`
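The phase sizes above can be sketched in a few lines of Python. This is only an illustration: the function names `phase_sizes` and `triangular_term` are hypothetical, and it assumes `m` is a power of two and that the halving pattern continues for `c = log_2(m)` phases, which the list implies but does not state. The `c(c+1) / 2` expression is evaluated exactly as written, without assuming what it counts.

```python
def phase_sizes(m: int) -> list[int]:
    """Number of sampled actions in each phase: m, m/2, m/4, ...

    Assumption: m is a power of two and halving runs for c = log_2(m) phases.
    """
    if m < 2 or m & (m - 1):
        raise ValueError("m must be a power of two and at least 2")
    c = m.bit_length() - 1  # exact log_2(m) for a power of two
    return [m >> i for i in range(c)]


def triangular_term(m: int) -> int:
    """Evaluate c(c+1)/2 with c = log_2(m), as written in the list above."""
    if m < 2 or m & (m - 1):
        raise ValueError("m must be a power of two and at least 2")
    c = m.bit_length() - 1
    return c * (c + 1) // 2


if __name__ == "__main__":
    print(phase_sizes(16))      # sampled actions per phase
    print(triangular_term(16))  # c(c+1)/2 for c = 4
```

For `m = 16` this gives phases of 16, 8, 4, and 2 sampled actions, and `c = 4`.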
Or, for planning at non-root nodes, just simulate 2 or 3 moves ahead and treat it as a stochastic bandit.