shindavid / AlphaZeroArcade


Estimate alphazero run variance #69

Open shindavid opened 1 year ago

shindavid commented 1 year ago

We have started A/B testing changes by launching an AlphaZero run with and without the change, and comparing the resulting rating curves.

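This methodology assumes that the difference between the resulting curves is wholly attributable to the change being tested, rather than to run-to-run randomness (e.g., random network initialization and stochastic self-play). But that assumption itself deserves to be examined!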

We can examine this by launching the exact same run many times and looking at the distribution of the resulting rating curves.
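
To make "the distribution of rating curves" concrete, here is a minimal sketch of one way to summarize replicate runs: resample every curve onto a shared time grid, then take the mean and sample standard deviation of rating at each grid point. The curve format (a time-sorted list of `(hours, rating)` pairs) and the helpers `interpolate` and `curve_spread` are hypothetical, not part of AlphaZeroArcade. The resampling step assumes that rating evaluations may land at different times in different runs.

```python
from statistics import mean, stdev


def interpolate(curve, t):
    """Linearly interpolate a hypothetical rating curve [(hours, rating), ...] at time t.

    Assumes the curve has at least two points, is sorted by time,
    and that t lies within its time range.
    """
    for (t0, r0), (t1, r1) in zip(curve, curve[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return r0
            return r0 + (r1 - r0) * (t - t0) / (t1 - t0)
    raise ValueError(f"t={t} is outside the curve's time range")


def curve_spread(curves, grid):
    """For each time in grid, return (mean, sample stdev) of rating across replicate runs.

    Needs at least two curves, since the sample standard deviation is undefined for one.
    """
    stats = []
    for t in grid:
        ratings = [interpolate(c, t) for c in curves]
        stats.append((mean(ratings), stdev(ratings)))
    return stats
```

For example, three toy curves that rise linearly from 0 to 100, 120, and 140 over 10 hours give a mean of 60 and a standard deviation of 10 at the 5-hour mark. A band of mean ± stdev over time is one simple way to visualize how much identical runs disagree.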

If done naively on the cloud, this may be quite costly. For example, 100 runs of 24 hours each at ~$1/GPU-hour (assuming one GPU per run) would be a $2400 experiment! So if we use the cloud, we probably want each run to last less than 24 hours. We likely want to skip the less interesting early phase of learning, so we may want https://github.com/shindavid/AlphaZeroArcade/issues/68.
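
The cost estimate above is simple arithmetic; a small helper makes it easy to see how shortening runs changes the bill, or how long each run can be under a fixed budget. Both functions are hypothetical and assume a flat hourly price with each run holding its GPUs for its full duration.

```python
def experiment_cost(num_runs, hours_per_run, dollars_per_gpu_hour=1.0, gpus_per_run=1):
    """Estimated cloud cost in dollars for num_runs identical runs (flat pricing assumed)."""
    return num_runs * hours_per_run * gpus_per_run * dollars_per_gpu_hour


def max_hours_per_run(budget, num_runs, dollars_per_gpu_hour=1.0, gpus_per_run=1):
    """Longest run duration (hours) that keeps the whole experiment within budget."""
    return budget / (num_runs * gpus_per_run * dollars_per_gpu_hour)
```

With the defaults, `experiment_cost(100, 24)` returns `2400.0`, matching the figure above, and `max_hours_per_run(500, 100)` returns `5.0`, i.e. a $500 budget allows about 5 hours per run.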

Then there is the statistical problem of quantifying the variation in the curves. We can tackle that when we get there.
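
As one possible starting point for the statistical part (not a settled choice), the replicate runs give a direct estimate of run-to-run spread, which an A/B comparison at a fixed training time could be measured against. The sketch below computes Welch's t statistic for the difference in mean rating between two groups of runs; `welch_t` is a hypothetical helper. It needs at least two runs per arm, which is exactly what a single with/without pair cannot provide.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(ratings_a, ratings_b):
    """Welch's t statistic for mean(ratings_a) - mean(ratings_b).

    Each list holds one rating per run, all taken at the same training time.
    Requires >= 2 runs per group; raises ZeroDivisionError if both groups
    have zero variance.
    """
    se_a = variance(ratings_a) / len(ratings_a)  # squared standard error of mean A
    se_b = variance(ratings_b) / len(ratings_b)  # squared standard error of mean B
    return (mean(ratings_a) - mean(ratings_b)) / sqrt(se_a + se_b)
```

With `a = [110, 120, 130]` and `b = [90, 100, 110]`, the statistic is √6 ≈ 2.45. Converting it to a p-value requires the Welch–Satterthwaite degrees of freedom; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the full Welch test.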