huangeddie / MuZeroGoJax

Mu Zero Go implemented with JAX and GoJAX
MIT License
9 stars, 0 forks

Simple custom sequential search tree exploration #286

Open huangeddie opened 1 year ago

huangeddie commented 1 year ago

Phases

  1. Simulate 1 move for the m sampled actions
  2. Simulate 2 moves for the m / 2 sampled actions
  3. Simulate 3 moves for the m / 4 sampled actions
  4. ...

Total number of simulations = c(c+1) / 2, where c = log_2(m).

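A rough sketch of the phase schedule above, in plain Python. The function names are made up for illustration and are not from MuZeroGoJax. It assumes m is a power of two, that the phases stop once two actions remain (so there are c = log_2(m) phases), and that the actions within a phase are simulated as one batch, so phase k costs k model steps. Under those assumptions the batched step count comes out to c(c+1) / 2. The issue does not say how the surviving half of the actions is chosen, so that part is left out.

```python
def phase_schedule(m):
    """Return a list of (depth, num_actions) pairs, one per phase.

    Assumption: m is a power of two and the schedule stops at two actions.
    """
    if m < 2 or m & (m - 1):
        raise ValueError("m must be a power of two and at least 2")
    c = m.bit_length() - 1  # c = log_2(m) for a power of two
    # Phase k simulates k moves for m / 2^(k-1) actions.
    return [(k, m >> (k - 1)) for k in range(1, c + 1)]


def batched_steps(schedule):
    """Model steps if each phase is one batch: depths summed over phases."""
    return sum(depth for depth, _ in schedule)


def per_action_steps(schedule):
    """Model steps if every action is simulated on its own."""
    return sum(depth * num_actions for depth, num_actions in schedule)


schedule = phase_schedule(8)
print(schedule)                    # [(1, 8), (2, 4), (3, 2)]
print(batched_steps(schedule))     # 6, i.e. c(c+1)/2 with c = 3
print(per_action_steps(schedule))  # 22
```

Counting each action separately gives a larger number (22 here), so the c(c+1) / 2 figure only matches the batched count.
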
huangeddie commented 1 year ago

Or for planning at non-root nodes, just simulate 2 or 3 moves ahead and treat it as a stochastic bandit.
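
One possible reading of the bandit idea, sketched in plain Python: each action at the node is an arm, and each pull returns a noisy value from a short lookahead. UCB1 is used here only as a familiar stochastic bandit rule; the issue does not name an algorithm. `shallow_lookahead_value` and `TRUE_VALUES` are hypothetical stand-ins for a real model rollout and are not part of MuZeroGoJax.

```python
import math
import random


def ucb1(pull, num_arms, num_pulls, exploration=math.sqrt(2)):
    """Run UCB1 and return (pull counts, mean rewards) per arm."""
    counts = [0] * num_arms
    means = [0.0] * num_arms
    for t in range(1, num_pulls + 1):
        if t <= num_arms:
            arm = t - 1  # pull every arm once before using the bonus
        else:
            arm = max(
                range(num_arms),
                key=lambda a: means[a]
                + exploration * math.sqrt(math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


# Hypothetical stand-in for a 2 or 3 move lookahead: a fixed value per
# action plus Gaussian noise. A real version would roll the model forward.
rng = random.Random(0)
TRUE_VALUES = [0.2, 0.5, 0.8]


def shallow_lookahead_value(action):
    return TRUE_VALUES[action] + rng.gauss(0.0, 0.1)


counts, means = ucb1(shallow_lookahead_value, num_arms=3, num_pulls=300)
best_action = max(range(len(counts)), key=lambda a: counts[a])
print(counts, best_action)
```

Picking the most-pulled arm at the end is one common choice; picking the arm with the highest mean would also work.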