lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

Global pooling vs. squeeze-and-excitation #65

Open · TFiFiE opened this issue 5 years ago

TFiFiE commented 5 years ago

Will KataGo switch over to SENets? Especially when you add in bias terms, they effectively seem to be a generalization of global pooling.
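For concreteness, here is a minimal sketch of what I mean (PyTorch, purely illustrative; this is not KataGo's code, and names like `SEBlock` and the reduction factor are my own). It is an SE block extended with per-channel bias outputs; if you freeze the multiplicative gate at 1, what remains is globally pooled features fed back in as channelwise biases, i.e. a global-pooling-bias structure:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation with added bias terms: globally pool,
    then predict per-channel scales AND biases from the pooled vector."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, 2 * channels)  # scales + biases

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width)
        pooled = x.mean(dim=(2, 3))                    # squeeze: global average pool
        hidden = torch.relu(self.fc1(pooled))          # excitation MLP
        scale, bias = self.fc2(hidden).chunk(2, dim=1)
        scale = torch.sigmoid(scale)
        # Broadcast the per-channel scale and bias over the spatial dims.
        # With scale fixed at 1, only the bias path remains: pooled
        # features re-injected as channelwise biases.
        return x * scale[:, :, None, None] + bias[:, :, None, None]
```

In that sense an SE block with biases contains the pooling-bias structure as a special case.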

OmnipotentEntity commented 5 years ago

From https://github.com/lightvector/KataGo/issues/10:

> Over the last month or two I've been continuing to test things at a slow but steady rate. For example, I've found that the learning rate, at least in the later stages of training, is set far too low, and multiplying it by a factor of 16 or 32 speeds up the neural net training considerably! I have some test nets that reach 100+ Elo higher than the original run with much less computation. Tentative estimates suggest that overall this might give a 1.5x or 2x boost in the rate of improvement, and solve some of the plateauing that was observed near the end of the current run. I'm also experimenting with LCB and KL-divergence ideas from LZ and LC0, which could improve things too. Also on the list is to compare squeeze-excite and other tweaks to neural net architecture.

There are also other options, such as GCNet, whose authors claim it outperforms SENets on image recognition benchmarks. It was published to arXiv two days after lightvector's comment quoted above.
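For reference, here is a rough sketch of GCNet's global context block (my paraphrase of Cao et al. 2019; `GCBlock` and the reduction factor are illustrative, and this is not anything KataGo implements). Instead of SE's average pooling, it pools with learned spatial attention weights, then fuses the transformed context back in additively:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCBlock(nn.Module):
    """Global context block, loosely following GCNet (Cao et al. 2019):
    attention-weighted global pooling plus a bottleneck transform,
    fused back in as a channelwise bias."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)  # spatial attention logits
        self.transform = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.LayerNorm([channels // reduction, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Softmax over all spatial positions gives one pooling weight per position.
        weights = F.softmax(self.attn(x).view(b, 1, h * w), dim=2)        # (b, 1, hw)
        context = torch.bmm(x.view(b, c, h * w), weights.transpose(1, 2))  # (b, c, 1)
        context = context.view(b, c, 1, 1)
        # Unlike SE's multiplicative gating, the transformed context is
        # added back as a channelwise bias.
        return x + self.transform(context)
```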

lightvector commented 4 years ago

https://github.com/lightvector/KataGo/pull/77 is relevant here. I did not find a way to implement this efficiently enough in the CUDA or OpenCL code, so it's on hold indefinitely. However, I suspect global pooling already captures most of the value that SE would provide, so this is probably not high priority.
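For comparison with the SE sketch above, here is a hedged sketch of the kind of global pooling bias structure described in the KataGo paper (my reading of the paper; the (b − 14)/10 board-size scaling follows its description, but the names and details here are illustrative, not the actual implementation):

```python
import torch
import torch.nn as nn

class GlobalPoolingBias(nn.Module):
    """Sketch of a KataGo-style global pooling bias structure: pool one
    set of channels globally, then feed the pooled vector through a
    linear layer whose outputs bias another set of channels."""
    def __init__(self, pool_channels: int, bias_channels: int):
        super().__init__()
        # Three pooled features per channel: mean, board-size-scaled mean, max.
        self.linear = nn.Linear(3 * pool_channels, bias_channels)

    def forward(self, x: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        # x: channels to be biased; g: channels to pool. Both (b, c, h, w).
        b_size = g.shape[-1]  # assume a square board
        mean = g.mean(dim=(2, 3))
        scaled = mean * (b_size - 14) / 10.0  # board-size-dependent term
        mx = g.amax(dim=(2, 3))
        pooled = torch.cat([mean, scaled, mx], dim=1)       # (b, 3c)
        # Broadcast the predicted biases over the spatial dims.
        return x + self.linear(pooled)[:, :, None, None]
```

Structurally this is already the additive half of an SE-style block, which is consistent with the suspicion above that SE would add little on top of it.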