l1t1 closed this 5 years ago
Isn't this similar to dynamic komi? You're scaling into an optimised range so the network always performs at its best.
In terms of Leela Zero, PopArt adds an extra step on top of dynamic komi. Dynamic komi adjusts the komi so that the winrate stays within a target range (e.g. 40-60%). PopArt additionally rescales the winrate back to its true value when it is used as training data. With this komi scaling and rescaling, we could use handicap games to train Leela Zero without making the network weaker. This is, of course, a different approach from the recently published SAI paper.
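The komi analogy above is loose. PopArt itself ("Preserving Outputs Precisely, while Adaptively Rescaling Targets") tracks running statistics of the value targets, trains the network on normalized targets, and whenever those statistics change it rescales the last linear layer so that the unnormalized predictions stay exactly the same. A minimal sketch of that mechanism for a single scalar value head, assuming a plain linear output layer over a feature vector `h`; the class name `PopArtHead`, the step size `beta` and the variance floor are illustrative choices, not Leela Zero or DeepMind code:

```python
import math


class PopArtHead:
    """Hypothetical scalar value head with PopArt target normalization.

    The layer predicts a normalized value n(h) = w . h + b; the
    unnormalized prediction is sigma * n(h) + mu.
    """

    def __init__(self, num_features, beta=1e-3):
        self.w = [0.0] * num_features
        self.b = 0.0
        self.mu = 0.0    # running mean of targets
        self.nu = 1.0    # running second moment of targets
        self.beta = beta  # step size for the running statistics (assumed)

    @property
    def sigma(self):
        # Standard deviation of targets, floored to avoid division by zero.
        return math.sqrt(max(self.nu - self.mu ** 2, 1e-8))

    def normalized(self, h):
        return sum(wi * hi for wi, hi in zip(self.w, h)) + self.b

    def unnormalized(self, h):
        return self.sigma * self.normalized(h) + self.mu

    def normalized_target(self, target):
        return (target - self.mu) / self.sigma

    def update_stats(self, target):
        """ART: update mu/sigma; POP: rescale w, b so outputs are preserved."""
        old_mu, old_sigma = self.mu, self.sigma
        self.mu = (1 - self.beta) * self.mu + self.beta * target
        self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
        new_sigma = self.sigma
        # sigma' * (w' h + b') + mu' == sigma * (w h + b) + mu for every h.
        self.w = [wi * old_sigma / new_sigma for wi in self.w]
        self.b = (old_sigma * self.b + old_mu - self.mu) / new_sigma

    def sgd_step(self, h, target, lr=0.01):
        """Update statistics, then one SGD step on squared error in normalized space."""
        self.update_stats(target)
        err = self.normalized(h) - self.normalized_target(target)
        self.w = [wi - lr * err * hi for wi, hi in zip(self.w, h)]
        self.b -= lr * err
```

In a Leela Zero-style setup one could imagine passing the game result (or the true winrate) as `target` in `sgd_step`; that wiring is hypothetical. The key design point is that `update_stats` changes the target scale without changing what the network currently outputs, so a shift in the target distribution does not disturb previously learned predictions.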
closing old issue with no clear action items or owner
Chinese translation: http://sports.sina.com.cn/go/2018-09-18/doc-ifxeuwwr5482488.shtml
DeepMind blog: https://deepmind.com/blog/preserving-outputs-precisely-while-adaptively-rescaling-targets/
Multi-task Deep Reinforcement Learning with PopArt: https://arxiv.org/abs/1809.04474