lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

Katago model preference for Go beginner #902

Open Leo1836 opened 8 months ago

Leo1836 commented 8 months ago

Not sure if this is the right place to ask. I just started playing Go and am testing KataGo model performance. With the recent b18 network I get at most 7xx visits/s with numSearchThreads = 16; with the old b15 network I get more than 5000 visits/s. Given my Go skill, is it recommended to stick with the new network, or can I just use the faster network until my skill improves to the point where the accuracy of the analysis matters?

lightvector commented 8 months ago

I would probably just use the newer one if you're still getting hundreds of visits per second. You can always run it for the same amount of compute time, even if that means fewer visits, so the speed really doesn't matter, right?

And then you could make a similar argument the other way - if you think you're not skilled enough yet, why would it matter to get 5000 visits instead of 700? Shouldn't 700 also be fine? :)

Anyways, the fact that there's an argument each way means it probably doesn't matter much. The newer net will do fewer visits, but each visit is much higher quality. The benefit of the newer net is that every once in a while the old net has persistent misevaluations in josekis or other blind spots that the training for the newer net has fixed.
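As a minimal sketch of the "same compute time" idea: KataGo's GTP config can cap the search by wall-clock time rather than visit count, which makes a slower-but-stronger net get the same compute budget as a faster one (option names as in `gtp_example.cfg`; exact values here are illustrative, not recommendations):

```ini
# Cap the search by time instead of visits, so a slower
# (but stronger-per-visit) net gets the same compute budget.
maxTime = 5.0            # seconds of search per move
# maxVisits = 500        # alternatively, cap by visit count
numSearchThreads = 16    # matches the thread count mentioned above
```

With a time cap, switching between b15 and b18 changes the visit count but not the thinking time per move.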

hwj-111 commented 8 months ago

Based on my testing, the newer 18b model at ~50 visits/s is slightly better than the older 15b (s167?) model at ~5000 visits/s in a time-parity match (say, 5s per move).
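To make the visit deficit in that time-parity match concrete (numbers taken from the comment above; a toy calculation, not a benchmark):

```python
# Visits/s figures quoted in the comment above, at 5 seconds per move.
def visits_per_move(visits_per_sec: float, seconds_per_move: float) -> int:
    """Total visits a net can spend on one move under a time limit."""
    return int(visits_per_sec * seconds_per_move)

b18 = visits_per_move(50, 5)      # newer 18-block net:   250 visits
b15 = visits_per_move(5000, 5)    # older 15-block net: 25000 visits
print(b18, b15, b15 // b18)       # 250 25000 100
```

So at time parity the older net searches roughly 100x more nodes, yet the newer net's higher per-visit quality roughly closes the gap.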

For a Go beginner, IMO, it does not matter which model (network) you use (older 15b or latest 18b). Even the strongest professionals could not tell the difference between them without a thorough study...

lightvector commented 8 months ago

@hwj-111 - not entirely true. :) In arbitrary midgame positions, perhaps it is hard to tell the difference, but for example 15b predates a lot of the training on the various flying dagger and pincer josekis, and I recall there being one or two blind spots so severe that even an amateur like me who hasn't studied the joseki much can see the problem once it is played out, especially when KataGo then proceeds to waste a ton of ko threats in an effort to horizon-effect the imminent disaster.

Additionally, older and smaller nets predate some of the cyclic-groups training, which can lead to rare but beginner-level blunders, such as (depending on the neural net version) positions similar to https://www.reddit.com/r/baduk/comments/skaadx/katago_bug/. The newest nets are still far from perfect, but in practice they handle this a lot better now.

hwj-111 commented 8 months ago

@lightvector , thanks a lot for your explanation (good learn for me).

I have been monitoring KataGo's performance in my own way by running some bots on CGOS. The strongest old 15b net (s167) could reach around ~4480 Elo on http://www.yss-aya.com/cgos/19x19/bayes.html with an RTX 3080 (the fastest card I have so far, ~12000 visits/s for the 15b model). Running at 4s/move, that is similar in strength to an 18b model fixed at 200 visits (equivalent to 50 visits/s at 4s/move). So I guess that even with some bugs in an old model such as 15b, with high visits it can play quite well (Elo ~4400 is already superhuman for sure, since the Leela Zero bots could not reach it...). One reason I am still using an old KataGo model is that KataGo models are so dominant in the Go AI world that it is hard to find other strong models from independent training, so I use an old KataGo model at extremely high visits as a substitute for that purpose...

Leo1836 commented 8 months ago

@lightvector Thank you for your reply. So using fewer visits with the new model is better. For reference, I am using g170e-b15c192-s1672170752-d466197061.txt.gz for the b15 model and the 1.14 OpenCL version, since I don't have an NVIDIA GPU.

Leo1836 commented 8 months ago

I have one more question. The Elo rating of the mentioned b15 model is around 12k while the latest model is around 13.5k. I understand that the Elo rating measures how well the model plays against opponents. Does the Elo rating of the model also indicate its analysis strength?

lightvector commented 8 months ago

I'm not sure what you mean by "analysis strength", but the algorithm that runs for analysis is essentially the same algorithm that runs for playing.

Leo1836 commented 8 months ago

My wording may be bad. To be specific, what I mean is the ability to find the best move (in terms of win rate or gaining the largest territory) at any moment in the game. Hence it should be appropriate to conclude that the stronger an algorithm plays, the more accurately it finds the best move, even when analyzing games it did not play itself (such as games by humans or other AIs).

michito744 commented 8 months ago

@Leo1836

In that sense, I don't trust them at all. (Although they are far better than they used to be, and in many respects they can reach conclusions that surpass those of humans.)

Below: an example of a position that goes from a clearly overestimated win rate to disaster. KataGo in its normal configuration cannot even see the possibility of such a reversal of evaluation. (Black 2 in the reference figure has a policy weight of 0.0.) [screenshot: 2024-02-21 (3)]

michito744 commented 8 months ago

@lightvector

A position that KataGo still can't solve.

Asami UENO (B) vs Rina FUJISAWA (W) [five screenshots attached: 2024-02-22 (1) to (4), 2024-02-22]

KataGo has a tendency to overestimate the strength of its own territory (i.e., it cannot properly explore ways to destroy it from the inside).

If the W(P15)/B(O15) exchange is made before starting with W(H18), White gets a way to destroy the inside of the upper side using the advantage of W(O16). Fujisawa must have been aiming for this two moves earlier.

KataGo gives W(H18) itself a policy weight of 0.0%, so it cannot be found by increasing the number of searches in the normal settings. Naturally, it cannot think of a way to prevent this on the move before.

lightvector commented 8 months ago

@michito744 Thanks for the examples! Can you attach SGFs as well for all the examples you gave? It's awkward to transcribe a screenshot, and often it's useful to also explore and train on earlier moves that led up to the fight as well and explore alternative variations from earlier in the fight, which are not shown in the screenshot.

@Leo1836 - assuming you are a beginner or just a casual player, for your own purposes you can probably ignore what michito744 is posting. These are high-level considerations in pro games. michito744 is showing some interesting examples where KataGo is not perfect, which might be useful for future training or testing. E.g. a player that is stronger than another on 90+% of typical moves may still have the remaining <10% of positions where it misses the best move found by the other player, whether the players are humans, AIs, or whatever, although over time we hope more work will decrease that percentage. :)

Leo1836 commented 8 months ago

Understand. Thanks for the reply.

HackYardo commented 8 months ago

[chart: arXiv:1902.10565, Table 7] Here is what you want. Your GPU is quick enough, but I still suggest using a weaker model at low visits (<=1600), because:

  1. It's faster;
  2. It saves electricity;
  3. It can still beat you;
  4. A moderate model's moves are easier to understand than the strongest model's;
  5. Imagine Ke Jie, Lee Sedol, or Fan Hui pondering a Go position through over 1600 different endings for you - not enough?

Apart from the above, two rules of thumb:

  1. Doubling the visits gains roughly 240 Elo (about an 80% winrate over the undoubled version);
  2. After about 6400 visits, the gains taper off.
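The first rule of thumb is consistent with the standard logistic Elo expected-score formula; a quick check in Python (the 240-Elo-per-doubling figure is the commenter's estimate, not an exact law):

```python
# Expected winrate for a player rated `elo_diff` points above the opponent,
# using the standard logistic Elo model: E = 1 / (1 + 10^(-d/400)).
def elo_to_winrate(elo_diff: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# One doubling of visits ~ +240 Elo (per the rule of thumb above),
# which indeed corresponds to roughly an 80% winrate:
print(round(elo_to_winrate(240), 3))  # 0.799
```

So "240 Elo" and "80% winrate" are two statements of the same claim under the Elo model.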
Leo1836 commented 8 months ago

@HackYardo Thanks for your info.

lightvector commented 8 months ago

There's still probably little reason to do anything other than pick the model that is best given equal compute power, especially when the weaker and older models have more frequent blind spots that even amateurs can see. So the b18 models should still be best for analysis, no matter your strength level.

You could make an argument for actually preferring a weaker model if the weaker model had a more human-like or teaching-oriented style, but that's not the case for any of the models - even the small KataGo models have a very different and non-human-like balance of strength/weakness. So might as well just go with the strongest and most accurate.

michito744 commented 8 months ago

@lightvector

Game record: KataGo_sample_20240225_0001.txt KataGo_sample_20240225_0002.txt

In the first case, B(K16) was an option for the player, but was not actually played. KataGo also thought there would be no problem if it responded with W(J16), so it did not list it as a top candidate.

In the second case, KataGo at that time could not find W (H18) at all.