lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

Can KataGo give LeelaZero 2 stones at time parity, or does KataGo need more time to do it? #498

Closed: Lefannew closed this issue 3 years ago

Lefannew commented 3 years ago

I know KataGo is the strongest open-source engine so far, but I am curious about KataGo's ability in handicap games. LeelaZero, even though it is retired, is still far stronger than humans. KataGo is much stronger than LeelaZero, but can the KataGo 40b network give the LeelaZero 40b network 2 stones on a 19x19 board at time parity? Or maybe KataGo couldn't give LeelaZero 2 stones at time parity and would need more time to do so. I haven't tested it myself, but I'd like to know what you all think.

lightvector commented 3 years ago

I don't know that this has been precisely tested. There's a decent chance it's possible. The exact gap at time parity may itself vary as a function of the time control. Anyone who has test data on this would of course be welcome to post it, and @Lefannew, you should also feel free to try it if nobody has yet. :)

Friday9i commented 3 years ago

I did several tests with the latest LZ287 vs. recent KataGo nets (from the last few months), mainly with 500 or 1K visits. The results are quite impressive:

  • KataGo easily crushes LZ with 3 stones.
  • Even with 4 or 5 stones, LZ gives away many points without much care (generally between move 30 and move 100), until its advantage is down to around 10 points. Then KataGo goes after points one by one and, depending on the game, either manages to take the lead by a few points or remains behind, generally by 0.5 point.
  • I played one game with 6 stones, and KG managed to win!?!

Basically, it confirms that 1) LZ is much weaker than KG, and 2) LZ plays very badly with handicap...

Edit: here is a game with 1K visits (about time parity) and 5 stones: KG-LZ-1Kvisits-H5-W+R.zip

Lefannew commented 3 years ago

After analyzing one of the games in your zip file, I have some questions. What's your hardware? What rule settings did you use for LeelaZero 40b 287?

I analyzed your game and found some weird moves: LeelaZero's moves 24, 36, 38, 54, 70 and 76. On my PC I use LZ 287 (40b) with Chinese rules, and my LZ couldn't find these moves even with thousands of visits. Then I set LeelaZero's komi to 2.5, since with 5 stones the komi is 2.5, but these moves from your game still couldn't be found on my machine. On the other hand, there are some moves where your LeelaZero didn't play the best move, which confused me. Later I found that moves like these drastically dropped LeelaZero's winrate even though it had better moves available. Why didn't your LZ play those better moves?

And on move 178, the visit count on my machine was about 100~250, while your LeelaZero used 500 visits. Why didn't your LZ play N2 instead of T6? That move caused the defeat, slowly but surely.

I respectfully disagree with your test. LeelaZero is also a superhuman Go engine, while KataGo on one GPU, like my 2080, can lose to strong amateurs and professional players at 5 stones. Are you hinting that LeelaZero isn't superhuman, based on these tests?

Humans can beat KataGo on my PC at 5 or 6 stones of handicap. Typically I run KataGo at 10 s per move, around 6~7k visits.

And strong amateurs can beat it at 5 or 6 stones.

So, by the logic of these tests, LeelaZero at 500~1000 visits might not even be stronger than humans.



Friday9i commented 3 years ago

My hardware for the test was an RTX 3070 iirc (but it doesn't matter for matches with a fixed number of visits; it only matters when time settings are used). Leela Zero is superhuman on this kind of hardware, no doubt about that, and KataGo can beat it with 5 stones of handicap, sometimes more, while losing 3-stone games vs. pro players. But that's not contradictory! The reason is the following:

  • Leela plays handicap games very badly, both as White and as Black.
  • KataGo, on its side, plays handicap games reasonably well (not as well as humans, but quite well). Why? Because it is trained to play handicap games (as White and as Black) and also takes into account many additional targets (including the score), which is not the case for LZ.

So, at the beginning of a handicap game, KG's winrate is 0 whatever it plays, but it doesn't care, as it tries to improve the score. Meanwhile LZ thinks its winrate is 100% whatever it plays, so it plays crap moves that reduce the score a lot. That's basically true until the score difference gets quite low, typically around 10 points, where LZ begins to feel the danger from KG, but by then it's already too late, as KG is stronger and often manages to win. That explains 3 facts: 1) KG can win 5- or 6-stone handicap games vs. LZ, 2) KG cannot win 4-stone handicap games vs. pro players, and 3) LZ is superhuman. These facts are not contradictory; it's just that LZ is very bad at handicap games while humans are good at them.

petgo3 commented 3 years ago

Just some material for further discussion: I have been running KataGo on KGS (mostly against human opponents) with different settings for quite a long time. Lots of games are played with high handicap. My experience is that at high handicap (>3 stones) 1) "dynamic komi" helps, and 2) the 15b net is more efficient than the latest 40b.

I would guess that b15 can better predict human "errors", and that dynamic komi shifts games toward "trained" situations...

Lefannew commented 3 years ago

LeelaZero may not be good at handicap games; however, in Go you can apply your playing strength to any game. And if humans can beat KataGo at 5 or 6 stones, LeelaZero can do it too. As for "LeelaZero plays badly in handicap games", that's a statement, not a fact, since you need a lot of games, at both visit parity and time parity, to prove it.

Your hardware is a 3070, so it's faster than mine; reaching 500~1000 visits should take less than 1 second.

So I don't understand why you claim visits don't matter and time settings do, since your initial post was about 500~1000 visits.

As for LeelaZero feeling the danger, that should happen at around 6.5~7.5 points by KataGo's evaluation. You said KataGo aims at gaining points; yes, but LeelaZero plays moves based on winrate, so why would it play crap moves that reduce its winrate?

LeelaZero doesn't have a concept of points or score.

LeelaZero's winrate is around 90% to almost 99%.

If it always plays the highest-winrate move, it is weird that good hardware like a 3070 plays bad moves that a 2080 couldn't even play, unless you did that on purpose.

And you said: "typically around 10 points, where LZ begins to feel the danger from KG, but it's already too late as KG is stronger and often manages to win."

On my hardware it also shows a similar point gap, around 10~12, but KataGo couldn't manage to win, since LeelaZero plays the highest-winrate moves. What you "explain" is your own opinion and unproven.

As for the 3 facts:

Besides LeelaZero being superhuman, the other 2 aren't true.

The first one should be: LeelaZero could lose when it doesn't play the highest-winrate moves, or when the right moves are outside Leela's consideration. As for the second one, many professionals are older, around 50~70, and couldn't perform well against KataGo; also, your statement "KataGo couldn't beat pros at 4 handicap" lacks details on hardware and time usage, and different hardware makes KataGo perform differently.

And different professionals perform differently, so it is not a fact but a statement.

It's like the claim that a human can beat the Stockfish chess engine: without enough details and games, a statement can't easily be accepted as true.

In handicap games, playing strength determines everything, not "who is trained for komi and who isn't".

Humans have a lot of books on handicap games, and strong players have many handicap skills for giving weaker players more stones. But once something is very strong, an even stronger opponent may not be able to give it a handicap, since skill and playing strength in Go are not unlimited.

And if professionals can win 5- or 6-stone games against KataGo, then LeelaZero can do it too, since it's better than humans.
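The disagreement above about whether hardware matters comes down to visit-limited versus time-limited matches. A minimal Python sketch, assuming each engine/machine pair has a constant visits-per-second throughput (the `Setup` class and the throughput numbers are hypothetical illustrations; only the rough "10 s per move, around 6~7k visits" figure comes from this thread): with a visit limit, every machine runs the same search and only wall-clock time differs, while with a time limit the visit count, and so the playing strength, depends on the hardware.

```python
from dataclasses import dataclass


@dataclass
class Setup:
    """One engine on one machine, with an assumed constant search throughput."""
    name: str
    visits_per_second: float


def seconds_for_visits(setup: Setup, visits_per_move: int) -> float:
    """Visit-limited match: search size is fixed, only the time per move varies."""
    return visits_per_move / setup.visits_per_second


def visits_for_seconds(setup: Setup, seconds_per_move: float) -> int:
    """Time-limited match: time is fixed, so the visit count depends on hardware."""
    return int(setup.visits_per_second * seconds_per_move)


def visits_for_time_parity(a: Setup, b: Setup, visits_a: int) -> int:
    """Visit budget for `b` that takes the same wall-clock time as `visits_a` on `a`."""
    return round(visits_a * b.visits_per_second / a.visits_per_second)


if __name__ == "__main__":
    # Hypothetical throughputs, for illustration only.
    slow = Setup("slower GPU", 650.0)
    fast = Setup("faster GPU", 1300.0)
    for s in (slow, fast):
        print(f"{s.name}: 1000 visits take {seconds_for_visits(s, 1000):.2f} s; "
              f"10 s gives {visits_for_seconds(s, 10.0)} visits")
```

The design choice here is deliberately simple: real throughput varies with position, batch size and tree reuse, so measured visits per second should replace the constants before using this to set up a "time parity" match.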

Friday9i commented 3 years ago

When I say "LZ plays badly at handicap", I may not give the evidence to prove it, but I reaffirm it's a clear FACT: I have played hundreds of thousands of LZ games and about 1 million KG games for tests, self-play runs, etc.; I established ratings for LZ, Elf, ...; I compared the strength of many different nets and engines and played thousands of handicap games. So I know reasonably well what I'm talking about, even if I don't give all the facts. If you want a fact, here it is: when you look at KG's score evaluation in the 5-stone game provided above, there are several moves where the score improves by 5 to 10 points! That's a clear fact illustrating that LZ doesn't play handicap well: it just gives away many points (because it is "too much" in a winning position, it doesn't care, and its winrate doesn't change). KataGo vs. pros at 4 handicap: take a look at https://www.youtube.com/user/goingceo; KataGo beats her at 3 handicap, not at 4. Not a definitive proof with many games, but a strong hint. Other pros had the same experience, I think.
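The argument that LZ gives away points because it is "too much" in a winning position can be illustrated with a toy model. The sketch below assumes, purely for illustration, that winrate is a logistic function of the score lead with a hypothetical scale of 5 points, and that a search cannot reliably tell apart winrates closer than a hypothetical 1%. Leela Zero's network does not actually derive its winrate from a score estimate, so this only shows the shape of the argument: far ahead, a 10-point concession barely moves the winrate, while in a close game the same concession is large.

```python
import math

# Both constants are hypothetical, chosen only to illustrate the argument.
SCALE = 5.0   # points of score lead per unit of log-odds
NOISE = 0.01  # smallest winrate difference a search can reliably tell apart


def winrate_from_lead(lead: float, scale: float = SCALE) -> float:
    """Toy logistic mapping from score lead (points, positive = ahead) to winrate."""
    return 1.0 / (1.0 + math.exp(-lead / scale))


def winrate_cost(lead: float, points_given: float, scale: float = SCALE) -> float:
    """Winrate lost by a move that gives away `points_given` points."""
    return winrate_from_lead(lead, scale) - winrate_from_lead(lead - points_given, scale)


def noticeable(lead: float, points_given: float, noise: float = NOISE) -> bool:
    """Would a winrate-only player notice this concession, under the toy model?"""
    return winrate_cost(lead, points_given) > noise


if __name__ == "__main__":
    for lead in (40.0, 30.0, 20.0, 5.0):
        cost = winrate_cost(lead, 10.0)
        print(f"ahead by {lead:4.0f}: giving 10 points costs {cost:.4f} winrate, "
              f"noticeable={noticeable(lead, 10.0)}")
```

Under these assumptions, a player that maximizes only winrate has almost no signal to prefer the point-keeping move while it is far ahead, which is consistent with the pattern described in the thread; a score-aware engine keeps a signal at any lead.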

Lefannew commented 3 years ago


First of all, you can play a billion games with many different engines, and anybody can establish Elo ratings for them.

For instance, if KataGo is rated 13k Elo and LeelaZero network 287 is rated 16k Elo, then people may think LZ is better because the Elo says so, unless you make them play each other.

Basically, Elo is your own evaluation, and anybody can make one.

Plus, I saw that your test games all use hundreds of visits, or about a thousand, for these engines.

KataGo and Leela both get stronger when you add more visits, but other phenomena also appear.

In low-playout (1,000 to 2,000 per second), medium-playout (10,000 per second) and high-playout (100,000+ per second) games, the operating parameters of the MCTS engine, the shape of the search tree, and the adjustment of the neural network output are quite different.

Only when you have millions of games in all 3 of these visit ranges can you reach a relatively accurate conclusion.

As for the link on a pro vs. KataGo, she is a weak professional. https://youtu.be/iW-kkhLw9f8

This is a strong professional who won a world championship title in 2009, and he won a 2-stone game against 40b KataGo.

Based on your "facts", LZ should be weaker than this pro if a pro can win at 2 stones. That's not the right logic.

So you can't conclude that other pros had a similar experience when pros' playing strength differs.

As for LZ's moves losing 5~10 points during the 5-stone game: LeelaZero doesn't play aggressively like KataGo; it cares about winning the game, not about winning by many points.

LeelaZero might well lose points, for example going from 12 points ahead to 7 points ahead, based on my PC's LeelaZero vs. KataGo 2-handicap games.

But it cares about the winrate, and if it wins, then it is difficult to claim LeelaZero is not good at handicap games.

You don't win handicap games if you aren't good at them.

LeelaZero is about winrate, not about how many points it wins by.

In my humble opinion, without enough tests, and fair tests without human interpretation, the result may not be what you expect.

All in all, playing strength determines everything.
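On the Elo point: under the standard Elo model, the expected score depends only on the rating difference, so ratings are only meaningful relative to other ratings in the same pool of games. A minimal sketch of that formula (the pool and its numbers are hypothetical, echoing the 13k/16k example above): shifting every rating in a pool by the same constant changes no prediction, which is why ratings from two separately anchored pools can't be compared until the engines play each other or a common opponent.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (1 = win, 0.5 = draw, 0 = loss)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def shifted_pool(ratings: dict, offset: float) -> dict:
    """The same pool re-anchored: every rating moved by the same offset."""
    return {name: r + offset for name, r in ratings.items()}


if __name__ == "__main__":
    # Hypothetical pool; only the 400-point gap matters, not the absolute level.
    pool = {"engine_x": 13000.0, "engine_y": 12600.0}
    moved = shifted_pool(pool, 3000.0)
    print(expected_score(pool["engine_x"], pool["engine_y"]))
    print(expected_score(moved["engine_x"], moved["engine_y"]))
```

Both lines print the same expected score, since only the 400-point gap enters the formula.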

Lefannew commented 3 years ago

KataGo vs LeelaZero 2 stones game B+6.5.zip: here is the game I played on my machine. I tried my best to keep time parity; KataGo ran on another PC. Both bots chose the best move (highest winrate) to play.

Friday9i commented 3 years ago

You can use GoGui if you want to play many games, in order to get reliable results.
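Playing many games (for example with GoGui's two-engine match tool, gogui-twogtp) matters because a handful of games says very little. As a sketch, the Wilson score interval below is a standard way to get an approximate 95% confidence interval for an engine's true win rate from W wins in N games; it is a generic statistical technique chosen here for illustration, not a description of what GoGui reports.

```python
import math


def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial win rate (z = 1.96 gives ~95% coverage)."""
    if games <= 0 or not 0 <= wins <= games:
        raise ValueError("need 0 <= wins <= games and games > 0")
    p = wins / games
    z2 = z * z
    denom = 1.0 + z2 / games
    centre = (p + z2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z2 / (4 * games * games)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    for wins, games in ((1, 1), (6, 10), (60, 100), (600, 1000)):
        lo, hi = wilson_interval(wins, games)
        print(f"{wins}/{games}: true win rate plausibly in [{lo:.3f}, {hi:.3f}]")
```

A single win (1/1) is consistent with a true win rate as low as about 21%, and even 6 wins out of 10 cannot rule out an even match, which is why one or two games settle little either way.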

Lefannew commented 3 years ago

It's off topic.

Friday9i commented 3 years ago

Do the test yourself and tell us. Bye.


Lefannew commented 3 years ago

@Friday9i KataGo can defeat professional players at 4 stones; your "facts" aren't facts.