alreadydone opened 5 years ago
good, did they publish the sgf?
@zxkyjimmy appears to be one of the authors, and is a contributor to Tencent's PhoenixGo: https://github.com/Tencent/PhoenixGo/search?q=zxkyjimmy&type=Commits
Does someone have the full paper so we can evaluate the algorithm? The choice of networks is almost a year old, so their comparison would have missed any improvements we've made in the last year, but it might still work. Needs testing.
@alreadydone You're piquing our curiosity! You didn't quote any paragraph that explains how it actually works ;-)
But thank you for the efforts to make a digest, until the paper is on arxiv.
I remember my simple max_lcb change was getting a 55% win rate in most tests if timemanage was off. I've always been confident improvements to search must exist since MCTS isn't really designed for NN evals. The question is who will find the better search algorithms and publish them. :)
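The max_lcb idea mentioned above can be sketched roughly like this (an illustrative toy, not LZ's actual code; the z value and the normal approximation are my assumptions): pick the root child with the highest lower confidence bound on its winrate, rather than simply the most-visited or highest-mean child.

```python
import math

def lcb(winrate, visits, z=1.96):
    """Lower confidence bound on a child's winrate (normal approximation)."""
    if visits == 0:
        return float("-inf")  # never prefer an unvisited child
    return winrate - z * math.sqrt(winrate * (1.0 - winrate) / visits)

def best_move_max_lcb(children):
    """children: list of (move, winrate, visits); pick the max-LCB move."""
    return max(children, key=lambda c: lcb(c[1], c[2]))[0]

# Q16 has a higher mean winrate but far fewer visits, so its estimate is
# noisy and its LCB ends up below D4's.
children = [("D4", 0.55, 2000), ("Q16", 0.60, 50)]
print(best_move_max_lcb(children))  # D4
```

The point is that a well-explored slightly-worse-looking move can be a safer pick than a barely-explored optimistic one.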
The algorithm seems to be basically: if a node has a child with a winning evaluation, mark it as losing and don't visit it again. Provably winning or losing nodes in Go only happen after a double pass, so I find it very surprising that this would result in a strength increase.
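The rule described above might be sketched like this (an illustrative toy in the style of MCTS-solver, not the paper's or LZ's code; all names are mine). Statuses are from the perspective of the player to move at each node:

```python
WIN, LOSS, UNKNOWN = 1, -1, 0

class Node:
    def __init__(self, children=None, status=UNKNOWN):
        self.children = children or []
        self.status = status  # result for the player to move at this node

def prunable(child):
    """A child proven to be a WIN for the opponent is never visited again."""
    return child.status == WIN

def update_status(node):
    """Solver-style backup: a proven-LOSS child means we can win by moving
    there; if every child is a proven WIN for the opponent, we are lost."""
    if any(c.status == LOSS for c in node.children):
        node.status = WIN
    elif node.children and all(c.status == WIN for c in node.children):
        node.status = LOSS

# Toy example: one child is a proven loss for the opponent,
# so the parent is a proven win.
root = Node([Node(status=WIN), Node(status=LOSS)])
update_status(root)
print(root.status)  # 1 (WIN)
```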
@Ttl, @alreadydone Do you think there is a typo again (after the typo in the sqrt in the KataGo paper ;-), or did they actually change the DM formula? In which case we are not comparing apples to apples.
Edit: I definitely ought to read the paper before making any comment ;-). Clearly, there are plenty of things I do not grasp based on your excerpts: there is no policy term, they are doing full rollouts, etc.
I downloaded it from https://dl.acm.org/ft_gateway.cfm?id=3293486&ftid=2040243&dwn=1&#URLTOKEN# which may be behind a paywall. sci-hub exists though.
First of all, thank all those who are interested in this paper. I love this community so much.
@alreadydone Yes, I am the author of this paper. As @Ttl said, we mark the nodes whose result is already known, and then don't revisit them. Such a change can add overhead to the original MCTS, so instead of checking the labels of all child nodes, we decide whether to update a node's label by keeping a count of its unknown children.
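If I understand the optimization correctly, it could be sketched like this (names and structure are mine, not from the paper): each node keeps a counter of children whose result is still unknown, so the "all children proven" check is O(1) instead of a scan over the children.

```python
WIN, LOSS, UNKNOWN = 1, -1, 0

class Node:
    def __init__(self, n_children):
        self.status = UNKNOWN
        self.unknown_children = n_children  # children not yet proven

    def on_child_proved(self, child_status):
        """Called once whenever a child's result becomes known."""
        self.unknown_children -= 1
        if child_status == LOSS:           # opponent loses there: we win
            self.status = WIN
        elif self.unknown_children == 0:   # every child is a WIN for the opponent
            if self.status == UNKNOWN:
                self.status = LOSS

node = Node(3)
node.on_child_proved(WIN)
node.on_child_proved(WIN)
node.on_child_proved(WIN)   # last unknown child proven: node is now lost
print(node.status)  # -1 (LOSS)
```

The counter avoids re-reading every child's label on each update, which matters when nodes can have hundreds of children.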
I have released 100 game records including SGF files and the complete output logs.
BTW, someone requested the full text of this paper, but as far as I know, I don't have permission to share it. If you have a subscription to ACM DL, you can try this link.
Thanks again for your attention.
@zxkyjimmy Thanks for chiming in!
The UCB formula as shown deviates substantially from the one the AZ people use, and there's no explanation of those changes; if there is no typo, then it's unclear whether the improvement is due to node marking/pruning or to those changes.
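For reference, the AZ-style PUCT selection that the paper's formula apparently deviates from looks like this (as commonly implemented; the c_puct value here is illustrative, not from either paper):

```python
import math

def puct_score(q, p, parent_visits, child_visits, c_puct=1.0):
    """AlphaZero-style PUCT: value estimate plus policy-weighted exploration."""
    return q + c_puct * p * math.sqrt(parent_visits) / (1 + child_visits)

# The policy prior p steers exploration toward moves the network likes;
# a formula with no policy term behaves very differently.
print(round(puct_score(0.5, 0.2, 100, 9), 3))  # 0.7
```

With no policy term (as in the quoted excerpts), exploration is spread uniformly over children instead of being concentrated on network-favored moves, which alone could change results a lot.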
As @Ttl said, there are no other means of getting a certain result unless double-pass happens in the search tree and the score is counted. (If one-ply look-ahead was implemented, then single-pass would be sufficient to get a certain/marked node about half of the time.)
I see in the logs that the pass move is sometimes considered even when winrate is above 30%. I recall that some LZ 20b nets were net2net'd and maybe some are trained from scratch. I wonder whether they are not sufficiently well-trained so that they yield a high policy for passing, and I wonder whether you'll see the same improvement with your algorithm using a current 40b network. https://github.com/leela-zero/leela-zero/issues/2273 may be relevant.
Upon reading https://github.com/LeelaChessZero/lc0/issues/263 it seems that Exact-win and certainty propagation share a few overlapping ideas (MCTS-solver is also mentioned in the Exact-win paper as related work). Maybe you'd try some ideas from there.
It looks like the 100 games were played without playouts/visits limits under default time settings. From the logs it seems the Exact-win engine is a modified version of LZ; would you show us the diff in the source code? Did you really implement C. Simulation (random playouts until the end of the game)?
If anyone is interested, there's also an earlier paper joint with the owner of https://github.com/suragnair/alpha-zero-general: https://dl.acm.org/citation.cfm?doid=3278312.3278325 (This paper uses the UCB formula of AGZ).
One question: the 001-leela.log shows
Using 2 thread(s).
RNG seed: 15332538815659000100
Detecting residual layers...v1...256 channels...20 blocks.
but the paper says you used LZ #130, which is a 15x192 network: 130 | 2018-05-02 03:06 | 18e6a6c5 | 15x192 | 10710 | 61580 | 7100021
My mistake, the paper said they used LZ #161; I confused it with another program (https://github.com/lightvector/KataGo/releases). 161 | 2018-08-04 18:50 | b0841a68 | 20x256 | 11906 | 19270 | 9192534
The Big Win Strategy on Multi-Value Network: An Improvement over AlphaZero Approach for 6x6 Othello DOI: https://doi.org/10.1145/3278312.3278325
> The algorithm seems to be basically: If node has child with a winning evaluation mark it as losing and don't visit it again. Provably winning or losing nodes in Go only happen after double pass, so I find it very surprising that this would result in strength increase.
What about setting bounds around the root score, i.e. root score + 20% and root score - 20%, and saying +20% is a win, and -20% is a loss. A bit like aspiration windows in Alpha Beta search.
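A rough sketch of that suggestion (the 20% window and all names are mine, purely illustrative): treat evaluations far enough above or below the root score as proven wins/losses for pruning purposes, much like an aspiration window in alpha-beta.

```python
WIN, LOSS, UNKNOWN = 1, -1, 0

def classify(eval_winrate, root_winrate, window=0.20):
    """Aspiration-window-style labeling around the root score."""
    if eval_winrate >= root_winrate + window:
        return WIN
    if eval_winrate <= root_winrate - window:
        return LOSS
    return UNKNOWN

print(classify(0.75, 0.50))  # 1  (well above the window: call it a win)
print(classify(0.40, 0.50))  # 0  (inside the window: still unknown)
```

Unlike the double-pass case, these labels are heuristic rather than provable, so mispruning is possible if the evaluations are noisy.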
But I wonder how many nodes you search like that in the first place, doesn't seem like it should be a lot.
https://www.researchgate.net/publication/331216459_Exact-Win_Strategy_for_Overcoming_AlphaZero
tl;dr Modified MCTS algorithm has a 61% winrate over the original (in 100 games) using the same network.
I think the abstract should mention that they didn't train any Go network, let alone independently. The search algorithm is only part of the AZ approach; merely improving it can hardly be considered "overcoming" AZ. From the parts I've read (mostly shown below, emphasis mine) it doesn't sound like the same approach as Lc0's certainty propagation.
LZ # 161 | 2018-08-04 18:50 | b0841a68 | 20x256