leela-zero / leela-zero

Go engine with no human-provided knowledge, modeled after the AlphaGo Zero paper.
GNU General Public License v3.0
5.36k stars 1.01k forks source link

Is Dirichlet noise alpha correctly computed? #1374

Closed Vandertic closed 6 years ago

Vandertic commented 6 years ago

https://github.com/gcp/leela-zero/blob/e1284477eb0c66d41af8c80088cbb488caf20341/src/UCTNodeRoot.cpp#L192

Apologies if I am not understanding correctly the code here, but it seems to me that the line is expanded into

auto alpha = 0.03f * 361.0f / 19 * 19;

which is actually equal to

auto alpha = 0.03f * 361.0f;

At least, this is what happens in my local clone of (admittedly) an old release.

gcp commented 6 years ago

Ouch.

jkiliani commented 6 years ago

Wait, shouldn't BOARD_SQUARES be computed before the division? I'd be really puzzled at C++ if this was really a bug...

Edit: thanks @roy7 @gcp for explaining C++, I can see there are a lot of pitfalls..

roy7 commented 6 years ago

@jkiliani It's a #define not a global variable.

#define WIDTH 80+20
int a = WIDTH * 2; //expect a==200 but a==120
gcp commented 6 years ago

It's a macro, it does text substitution.

The mistake was not bracketing the macro, which is the normal defense against this kind of bug.

gcp commented 6 years ago

This could've been a constexpr to avoid this issue, but that's a bit inconsistent with the rest of the file.

roy7 commented 6 years ago

Does this mean noise was 361 times too high?

diadorak commented 6 years ago

Is this a recent change? If it's working, maybe higher noise actually helps?

gcp commented 6 years ago

No, it means the noise distribution was really flat. The idea of the Dirichlet noise is to be peaky and make the odd moves jump out. So this just made is less likely to find new things.

This was broken in 290587372b3044595989b853e5e3f9f8eedacd36 I think.

Vandertic commented 6 years ago

361 times too low in some sense. The noise gets always 25% of the policy probability. With alpha=0.03 it typically concentrates on 10 moves which then have ~2.5% probability. With alpha=10.83 the noise would be spread on all the legal moves more or less equally... 25%/n each

LRGB commented 6 years ago

Would this merit a 0.14.1 release?

gcp commented 6 years ago

I'd want to add support for the v=1 filtering discussed, but yes this warrants a new release.

bochen2027 commented 6 years ago

Is there a quick mechanism to tack on a banner message to remind clients whenever a new critical update has been out to , maybe not a full blown autoupdater but at least something to get attention

Marcin1960 commented 6 years ago

@gcp "No, it means the noise distribution was really flat. The idea of the Dirichlet noise is to be peaky and make the odd moves jump out. So this just made is less likely to find new things."

What practical effect it could have had?

diadorak commented 6 years ago

Could it be related to the much narrower spread of networks' strength around the time we switch to 10x128?

http://zero.sjeng.org/static/elo.html

gcp commented 6 years ago

Is there a quick mechanism to tack on a banner message to remind clients whenever a new critical update has been out to , maybe not a full blown autoupdater but at least something to get attention

The AutoGTP + Leela Zero version can be enforced on the server and clients will get a warning they need to update.

This bug has been here for 2 months so I don't see any reason to immediately force it without giving people a chance to update.

gcp commented 6 years ago

Could it be related to the much narrower spread of networks' strength around the time we switch to 10x128?

Noise isn't used for matches so I doubt it.

jkiliani commented 6 years ago

No, it means the noise distribution was really flat. The idea of the Dirichlet noise is to be peaky and make the odd moves jump out. So this just made is less likely to find new things.

I can see now that this must have made the effect of FPU reduction on noise much worse than it looked even in Leela Chess, where Dirichlet Alpha was correct after all. We effectively had no noise effect at all for a while before FPU reduction was taken off the noised root...

While this would be a deviation from Deepmind's parameters, could we consider increasing epsilon somewhat from its current value of 0.25?

Ishinoshita commented 6 years ago

We will have a bunch of concomitent changes (ELF, t=1, sharper dirichlet)...

Mardak commented 6 years ago

Oh. Well this explains why in my recent analysis of turning on noise https://github.com/gcp/leela-zero/issues/1346#issuecomment-386889251 was helping find the correct move because the wrong alpha resulted in basically all moves getting a little bit of adjusted prior to be explored while the correct alpha only causes a much smaller set of moves to get scores adjusted.

Running the same test as the linked comment, with the wrong alpha, all 8 symmetries found the correct T7 move for near 100% win rate in just 453 visits. Using the correct alpha and same visits, 1 symmetry fails to find the move and report near 0% win rate instead.

expect: T7, move: 129, who: b, weight: 5b90bd32, visit: 453, noise alpha corrected
symmetry: 0  T7 ->     430 (V: 99.99%) (N:  2.31%) PV: T7 B10 C10 B9 G13 F12 H14 H15 G15
symmetry: 1  T7 ->     381 (V: 99.98%) (N:  1.03%) PV: T7 B10 C10 B9 G13 F12 H14 H15 F14 G15
symmetry: 2  T7 ->     445 (V: 99.99%) (N: 14.04%) PV: T7 B10 C10 B9 G13 F12 F11 G12 H12
symmetry: 3  T7 ->     446 (V: 99.99%) (N:  6.79%) PV: T7 B10 C10 C9 B11 R12 S12 Q12 S14 M18 L17 L18
symmetry: 4  B9 ->     136 (V:  0.62%) (N: 25.73%) PV: B9 B8 C9 D9 D10 E9 E7 D7 E8 D8 B4 B5 A3
symmetry: 5  T7 ->     198 (V: 99.98%) (N:  0.33%) PV: T7 B10 C10 B9 G13 F12 F11 G12
symmetry: 6  T7 ->     445 (V: 99.98%) (N: 11.86%) PV: T7 B10 C10 B9 G13 E12 E11 F11 E10
symmetry: 7  T7 ->     446 (V: 99.97%) (N:  9.25%) PV: T7 B10 C10 B9 D15 E15 E16 B16 E17 D18 E18

So the wrong alpha leads to a lot more exploration and the 1-visit moves that is causing strange looking self-play right now. But the wrong alpha probably has also allowed self-play to discover the correct moves for symmetries that happened to have extremely low actual priors too.

gcp commented 6 years ago

So the wrong alpha leads to a lot more exploration

More moves getting explored, not necessarily more exploration for the lucky ones. I don't think it'll reduce the amount of 1-visit moves much, from a quick look.

jkiliani commented 6 years ago

@Mardak Will a significant noise hit (i.e. lucky with correct alpha) on a move with really bad eval usually lead to more than 1 visit?

Mardak commented 6 years ago

I don't think it'll reduce the amount of 1-visit moves much, from a quick look.

Oh.. Indeed from my sample of one board position across all 8 symmetries using correct and wrong alpha with 3200 visits:

$ grep "   1 (V: " noise-wrong-alpha  | wc -l
      46
$ grep "   1 (V: " noise-correct-alpha  | wc -l
      66

Looks like there is 43% more 1-visit moves at the root level with the correct alpha (!!! ;) )

Mardak commented 6 years ago

Will a significant noise hit (i.e. lucky with correct alpha) on a move with really bad eval usually lead to more than 1 visit?

Will it usually? Depends a lot on all the other moves are, but most likely it will get more than 1 visit at 3200 visits.

For example same board position as earlier with 3200 visits and the increased score amount (of the highest increase):

symmetry: 0 + 4.66% K10 ->       3 (V:  0.01%) (N:  6.33%) PV: K10 L9 B9
symmetry: 1 +11.89% M18 ->      12 (V:  0.00%) (N: 12.84%) PV: M18 B10 C10 B9 G13
symmetry: 2 + 6.87%  G5 ->       3 (V:  0.19%) (N:  6.89%) PV: G5 B10 C10
symmetry: 3 + 8.71% A17 ->       4 (V:  0.00%) (N:  9.04%) PV: A17 B16 C18 D18
symmetry: 4 + 7.17% B12 ->      57 (V:  0.08%) (N:  7.20%) PV: B12 B10 D15 E15 E16 E17 F16 F17 G16 G17 H16
symmetry: 5 + 8.89%  D9 ->      25 (V:  0.02%) (N:  8.93%) PV: D9 C9 D8 D7 C10 B4 B10 B7
symmetry: 6 + 4.19% C12 ->       1 (V:  0.00%) (N:  4.21%) PV: C12 
symmetry: 7 + 3.43% A14 ->       1 (V:  0.01%) (N:  3.46%) PV: A14 

Note that this particular board state is pretty polarizing with either 100% win rate with T7 or 0% for anything else. Symmetries 4 and 5 have trouble finding T7 in the first place, so that's why the noised move got more visits (before eventually finding T7 with noised prior N: 0.12% and 0.33% respectively)

gcp commented 6 years ago

Fixed in https://github.com/gcp/leela-zero/commit/cab4e924b00cb6e3108ef2641649348a5158c193.

Alderi-Tokori commented 6 years ago

No, it means the noise distribution was really flat. The idea of the Dirichlet noise is to be peaky and make the odd moves jump out. So this just made is less likely to find new things. This was broken in 2905873 I think.

Yup, totally my fault, I'm sorry :/ I guess it explains why my new run had so much trouble going above SDK... It shows Dirichlet noise is important to get progress.