I may have found why the "Hybrid" weight is stronger than the original weights

https://github.com/gcp/leela-zero/issues/908

As mentioned in #814 and #867, I downloaded 2 weight files: 72ea669da5f491458cb5dfc44a80f0e760a9df71a8d0690b185822de49e00cb3.txt and af3c6e330b932c97b0e517ac7db544d37fe1d18184f101d4b0340ff02f51ce88.txt

I then made a "Hybrid" weight file, af3-72e_1-1.txt, by averaging the parameters of the 2 weight files above.
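For anyone who wants to reproduce the averaging step, here is a minimal sketch. It assumes the weight file is plain text whose first line is a format version and whose remaining lines are whitespace-separated floats, one line per tensor; that layout is an assumption here and should be checked against the files themselves. The function names and the `ratio_a` parameter are hypothetical, not part of Leela Zero.

```python
def average_weight_lines(lines_a, lines_b, ratio_a=0.5):
    """Blend two weight files given as lists of text lines.

    Assumption: line 0 is a format version, every other line holds
    whitespace-separated floats. ratio_a=0.5 gives a plain 1:1 average.
    """
    if len(lines_a) != len(lines_b):
        raise ValueError("weight files have different numbers of lines")
    if lines_a[0].strip() != lines_b[0].strip():
        raise ValueError("format versions differ")

    # Copy the version line unchanged; it is not a parameter.
    out = [lines_a[0].strip()]
    for row_a, row_b in zip(lines_a[1:], lines_b[1:]):
        vals_a = [float(x) for x in row_a.split()]
        vals_b = [float(x) for x in row_b.split()]
        if len(vals_a) != len(vals_b):
            raise ValueError("tensor sizes differ, networks are not compatible")
        blended = [ratio_a * a + (1.0 - ratio_a) * b
                   for a, b in zip(vals_a, vals_b)]
        out.append(" ".join(repr(v) for v in blended))
    return out


def average_weight_files(path_a, path_b, path_out, ratio_a=0.5):
    """Read two weight files, blend them, and write the result."""
    with open(path_a) as fa, open(path_b) as fb:
        lines_a = fa.read().splitlines()
        lines_b = fb.read().splitlines()
    with open(path_out, "w") as fo:
        fo.write("\n".join(average_weight_lines(lines_a, lines_b, ratio_a)))
        fo.write("\n")
```

Under those assumptions, calling `average_weight_files` on the two downloaded files with `af3-72e_1-1.txt` as the output path would produce a 1:1 hybrid. The two networks must have the same architecture, otherwise the per-line sizes will not match.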

Then I started a new game, played a few moves, and looked at the heatmaps of the 3 weights for the same board position.

I found that the hybrid weight's output is very close to the average of the 2 original weights' outputs. The result can be seen below.

(image: hybrid)

So in my opinion, because the network is linear, averaging the networks' parameters is equivalent to averaging the networks' outputs. The "Hybrid" weight may therefore act like an ensemble that averages the outputs of several networks, a method that is widely used to make predictions more accurate.
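To make the argument concrete, here is a toy sketch (not Leela Zero's network; all names and numbers are made up for illustration). For a purely linear model, averaging the parameters gives exactly the average of the outputs. Once a nonlinearity such as ReLU is added, the equality is no longer guaranteed, so for a deep network it can hold at most approximately, which would be consistent with the heatmaps being "very close" rather than identical.

```python
def linear(w, x):
    """Dot product: a model that is linear in its parameters w."""
    return sum(wi * xi for wi, xi in zip(w, x))


def relu_unit(w, x):
    """The same model followed by a ReLU nonlinearity."""
    return max(0.0, linear(w, x))


def average_params(w_a, w_b):
    """Element-wise 1:1 average of two parameter vectors."""
    return [(a + b) / 2.0 for a, b in zip(w_a, w_b)]


# Linear case: output of averaged parameters == average of outputs.
w_a, w_b, x = [1.0, 2.0], [3.0, -2.0], [0.5, 4.0]
hybrid_out = linear(average_params(w_a, w_b), x)
mean_out = (linear(w_a, x) + linear(w_b, x)) / 2.0

# ReLU case: the two quantities can differ.
v_a, v_b, y = [1.0], [-1.0], [1.0]
hybrid_relu = relu_unit(average_params(v_a, v_b), y)
mean_relu = (relu_unit(v_a, y) + relu_unit(v_b, y)) / 2.0

print(hybrid_out, mean_out)    # equal in the linear case
print(hybrid_relu, mean_relu)  # differ once ReLU is involved
```

How close the two quantities stay for real networks likely depends on how similar the two sets of weights are, which this toy example does not measure.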

pangafu / Hybrid_LeelaZero
