Closed Vandertic closed 6 years ago
Ouch.
Wait, shouldn't BOARD_SQUARES be computed before the division? I'd be really puzzled at C++ if this was really a bug...
Edit: thanks @roy7 @gcp for explaining C++, I can see there are a lot of pitfalls..
@jkiliani It's a #define not a global variable.
#define WIDTH 80+20
int a = WIDTH * 2; //expect a==200 but a==120
It's a macro, it does text substitution.
The mistake was not bracketing the macro, which is the normal defense against this kind of bug.
This could've been a constexpr to avoid this issue, but that's a bit inconsistent with the rest of the file.
Does this mean noise was 361 times too high?
Is this a recent change? If it's working, maybe higher noise actually helps?
No, it means the noise distribution was really flat. The idea of the Dirichlet noise is to be peaky and make the odd moves jump out. So this just made is less likely to find new things.
This was broken in 290587372b3044595989b853e5e3f9f8eedacd36 I think.
361 times too low in some sense. The noise gets always 25% of the policy probability. With alpha=0.03 it typically concentrates on 10 moves which then have ~2.5% probability. With alpha=10.83 the noise would be spread on all the legal moves more or less equally... 25%/n each
Would this merit a 0.14.1 release?
I'd want to add support for the v=1 filtering discussed, but yes this warrants a new release.
Is there a quick mechanism to tack on a banner message to remind clients whenever a new critical update has been out to , maybe not a full blown autoupdater but at least something to get attention
@gcp "No, it means the noise distribution was really flat. The idea of the Dirichlet noise is to be peaky and make the odd moves jump out. So this just made is less likely to find new things."
What practical effect it could have had?
Could it be related to the much narrower spread of networks' strength around the time we switch to 10x128?
Is there a quick mechanism to tack on a banner message to remind clients whenever a new critical update has been out to , maybe not a full blown autoupdater but at least something to get attention
The AutoGTP + Leela Zero version can be enforced on the server and clients will get a warning they need to update.
This bug has been here for 2 months so I don't see any reason to immediately force it without giving people a chance to update.
Could it be related to the much narrower spread of networks' strength around the time we switch to 10x128?
Noise isn't used for matches so I doubt it.
No, it means the noise distribution was really flat. The idea of the Dirichlet noise is to be peaky and make the odd moves jump out. So this just made is less likely to find new things.
I can see now that this must have made the effect of FPU reduction on noise much worse than it looked even in Leela Chess, where Dirichlet Alpha was correct after all. We effectively had no noise effect at all for a while before FPU reduction was taken off the noised root...
While this would be a deviation from Deepmind's parameters, could we consider increasing epsilon somewhat from its current value of 0.25?
We will have a bunch of concomitent changes (ELF, t=1, sharper dirichlet)...
Oh. Well this explains why in my recent analysis of turning on noise https://github.com/gcp/leela-zero/issues/1346#issuecomment-386889251 was helping find the correct move because the wrong alpha resulted in basically all moves getting a little bit of adjusted prior to be explored while the correct alpha only causes a much smaller set of moves to get scores adjusted.
Running the same test as the linked comment, with the wrong alpha, all 8 symmetries found the correct T7 move for near 100% win rate in just 453 visits. Using the correct alpha and same visits, 1 symmetry fails to find the move and report near 0% win rate instead.
expect: T7, move: 129, who: b, weight: 5b90bd32, visit: 453, noise alpha corrected
symmetry: 0 T7 -> 430 (V: 99.99%) (N: 2.31%) PV: T7 B10 C10 B9 G13 F12 H14 H15 G15
symmetry: 1 T7 -> 381 (V: 99.98%) (N: 1.03%) PV: T7 B10 C10 B9 G13 F12 H14 H15 F14 G15
symmetry: 2 T7 -> 445 (V: 99.99%) (N: 14.04%) PV: T7 B10 C10 B9 G13 F12 F11 G12 H12
symmetry: 3 T7 -> 446 (V: 99.99%) (N: 6.79%) PV: T7 B10 C10 C9 B11 R12 S12 Q12 S14 M18 L17 L18
symmetry: 4 B9 -> 136 (V: 0.62%) (N: 25.73%) PV: B9 B8 C9 D9 D10 E9 E7 D7 E8 D8 B4 B5 A3
symmetry: 5 T7 -> 198 (V: 99.98%) (N: 0.33%) PV: T7 B10 C10 B9 G13 F12 F11 G12
symmetry: 6 T7 -> 445 (V: 99.98%) (N: 11.86%) PV: T7 B10 C10 B9 G13 E12 E11 F11 E10
symmetry: 7 T7 -> 446 (V: 99.97%) (N: 9.25%) PV: T7 B10 C10 B9 D15 E15 E16 B16 E17 D18 E18
So the wrong alpha leads to a lot more exploration and the 1-visit moves that is causing strange looking self-play right now. But the wrong alpha probably has also allowed self-play to discover the correct moves for symmetries that happened to have extremely low actual priors too.
So the wrong alpha leads to a lot more exploration
More moves getting explored, not necessarily more exploration for the lucky ones. I don't think it'll reduce the amount of 1-visit moves much, from a quick look.
@Mardak Will a significant noise hit (i.e. lucky with correct alpha) on a move with really bad eval usually lead to more than 1 visit?
I don't think it'll reduce the amount of 1-visit moves much, from a quick look.
Oh.. Indeed from my sample of one board position across all 8 symmetries using correct and wrong alpha with 3200 visits:
$ grep " 1 (V: " noise-wrong-alpha | wc -l
46
$ grep " 1 (V: " noise-correct-alpha | wc -l
66
Looks like there is 43% more 1-visit moves at the root level with the correct alpha (!!! ;) )
Will a significant noise hit (i.e. lucky with correct alpha) on a move with really bad eval usually lead to more than 1 visit?
Will it usually? Depends a lot on all the other moves are, but most likely it will get more than 1 visit at 3200 visits.
For example same board position as earlier with 3200 visits and the increased score amount (of the highest increase):
symmetry: 0 + 4.66% K10 -> 3 (V: 0.01%) (N: 6.33%) PV: K10 L9 B9
symmetry: 1 +11.89% M18 -> 12 (V: 0.00%) (N: 12.84%) PV: M18 B10 C10 B9 G13
symmetry: 2 + 6.87% G5 -> 3 (V: 0.19%) (N: 6.89%) PV: G5 B10 C10
symmetry: 3 + 8.71% A17 -> 4 (V: 0.00%) (N: 9.04%) PV: A17 B16 C18 D18
symmetry: 4 + 7.17% B12 -> 57 (V: 0.08%) (N: 7.20%) PV: B12 B10 D15 E15 E16 E17 F16 F17 G16 G17 H16
symmetry: 5 + 8.89% D9 -> 25 (V: 0.02%) (N: 8.93%) PV: D9 C9 D8 D7 C10 B4 B10 B7
symmetry: 6 + 4.19% C12 -> 1 (V: 0.00%) (N: 4.21%) PV: C12
symmetry: 7 + 3.43% A14 -> 1 (V: 0.01%) (N: 3.46%) PV: A14
Note that this particular board state is pretty polarizing with either 100% win rate with T7 or 0% for anything else. Symmetries 4 and 5 have trouble finding T7 in the first place, so that's why the noised move got more visits (before eventually finding T7 with noised prior N: 0.12% and 0.33% respectively)
No, it means the noise distribution was really flat. The idea of the Dirichlet noise is to be peaky and make the odd moves jump out. So this just made is less likely to find new things. This was broken in 2905873 I think.
Yup, totally my fault, I'm sorry :/ I guess it explains why my new run had so much trouble going above SDK... It shows Dirichlet noise is important to get progress.
https://github.com/gcp/leela-zero/blob/e1284477eb0c66d41af8c80088cbb488caf20341/src/UCTNodeRoot.cpp#L192
Apologies if I am not understanding correctly the code here, but it seems to me that the line is expanded into
auto alpha = 0.03f * 361.0f / 19 * 19;
which is actually equal to
auto alpha = 0.03f * 361.0f;
At least, this is what happens in my local clone of (admittedly) an old release.