leela-zero / leela-zero

Go engine with no human-provided knowledge, modeled after the AlphaGo Zero paper.
GNU General Public License v3.0

An AI to play 57 games: DeepMind is not far from universal AI #1847

Closed by l1t1 5 years ago

l1t1 commented 6 years ago

Chinese translation: http://sports.sina.com.cn/go/2018-09-18/doc-ifxeuwwr5482488.shtml
DeepMind blog post: https://deepmind.com/blog/preserving-outputs-precisely-while-adaptively-rescaling-targets/
Paper: Multi-task Deep Reinforcement Learning with PopArt, https://arxiv.org/abs/1809.04474

ghost commented 6 years ago

Isn't this similar to dynamic komi? You're scaling into an optimised range so that the network always performs at its best.

ghost commented 6 years ago

In terms of Leela Zero, PopArt is adding an extra step to dynamic komi. Dynamic komi adjusts the komi so that the winrate is targeted within a certain range (e.g. 40-60%). In addition to that, PopArt rescales the winrate back to its true value as training data. With this komi scaling and rescaling, we could use handicap games to train Leela Zero without making the network weaker. This is, of course, a different approach from the recently published SAI paper.
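For readers unfamiliar with the mechanism, here is a rough sketch of the "rescale, then preserve the true output" idea behind PopArt, applied to a single scalar linear output head. It is not Leela Zero code and makes no claim about how komi would actually be handled; the class name `PopArtHead`, its methods, and the `beta` step size are all hypothetical names chosen for illustration. The update rule follows the general PopArt recipe as I understand it: track a running mean and scale of the targets, train against normalized targets, and adjust the last layer whenever the statistics change so that unnormalized predictions stay the same.

```python
import math


class PopArtHead:
    """Illustrative scalar linear head with PopArt-style normalization.

    Hypothetical sketch: names and defaults are assumptions, not an
    existing API.
    """

    def __init__(self, n_features, beta=0.01):
        self.w = [0.0] * n_features  # weights of the normalized output
        self.b = 0.0                 # bias of the normalized output
        self.mu = 0.0                # running mean of the targets
        self.nu = 1.0                # running second moment of the targets
        self.beta = beta             # step size for the running statistics

    @property
    def sigma(self):
        # Scale derived from the running moments, clamped away from zero.
        return math.sqrt(max(self.nu - self.mu ** 2, 1e-8))

    def normalized(self, h):
        # Output in the normalized range the network is trained on.
        return sum(wi * hi for wi, hi in zip(self.w, h)) + self.b

    def predict(self, h):
        # Rescale back to the true (unnormalized) value.
        return self.sigma * self.normalized(h) + self.mu

    def normalize_target(self, target):
        # Target the normalized output should regress towards.
        return (target - self.mu) / self.sigma

    def update_stats(self, target):
        # Adapt the statistics, then rewrite w and b so that predict()
        # returns the same values as before the statistics changed.
        old_mu, old_sigma = self.mu, self.sigma
        self.mu = (1 - self.beta) * self.mu + self.beta * target
        self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
        new_sigma = self.sigma
        self.w = [wi * old_sigma / new_sigma for wi in self.w]
        self.b = (old_sigma * self.b + old_mu - self.mu) / new_sigma


head = PopArtHead(n_features=2)
head.w, head.b = [0.5, -0.2], 0.1
features = [1.0, 2.0]
before = head.predict(features)
head.update_stats(50.0)          # a target far outside the current range
after = head.predict(features)   # unchanged up to floating-point error
```

The point of the last four lines is that a large target shifts the normalization statistics, but the unnormalized prediction for the same features does not move; only subsequent gradient steps on the normalized target change it.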

sethtroisi commented 5 years ago

closing old issue with no clear action items or owner

alreadydone commented 5 years ago

Update: https://twitter.com/DeepMindAI/status/1093139906040340480