lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

Vulnerability to pass-pass attack #699

Open nealmcb opened 1 year ago

nealmcb commented 1 year ago

A preprint (Adversarial Policies Beat Professional-Level Go AIs) was recently published describing a strategy for tricking KataGo into passing while its position is very strong but its territory is not yet formally secured, leading to a loss. See this excerpt:

our adversary achieves a 99% win rate against this victim [KataGo] by playing a counterintuitive strategy. The adversary stakes out a minority territory in the corner, allowing KataGo to stake the complement, and placing weak stones in KataGo’s stake.

KataGo predicts a high win probability for itself and, in a way, it’s right—it would be simple to capture most of the adversary’s stones in KataGo’s stake, achieving a decisive victory. However, KataGo plays a pass move before it has finished securing its territory, allowing the adversary to pass in turn and end the game. This results in a win for the adversary under the standard Tromp-Taylor ruleset for computer Go, as the adversary gets points for its corner territory (devoid of victim stones) whereas the victim does not receive points for its unsecured territory because of the presence of the adversary’s stones.
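To make the scoring point concrete: under Tromp-Taylor rules, once both players pass, all stones still on the board are treated as alive, and an empty region only counts for a player if it borders that player's stones alone. Here is a minimal Python sketch of that scoring rule as I understand it; the board representation is my own for illustration, not anything from KataGo or the paper:

```python
from collections import deque

def tromp_taylor_score(board):
    """Area-score a finished position under Tromp-Taylor rules.

    board: square 2D list with 'b', 'w', or None (empty) entries. Every stone
    left on the board is treated as alive, which is exactly why unremoved
    adversary stones neutralize the surrounding "territory".
    """
    size = len(board)
    score = {'b': 0, 'w': 0}
    seen = set()

    for y in range(size):
        for x in range(size):
            if board[y][x] is not None:
                score[board[y][x]] += 1          # stones count as area
            elif (y, x) not in seen:
                # Flood-fill this empty region and record which colors border it.
                region, borders, queue = [], set(), deque([(y, x)])
                seen.add((y, x))
                while queue:
                    cy, cx = queue.popleft()
                    region.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < size and 0 <= nx < size:
                            if board[ny][nx] is None:
                                if (ny, nx) not in seen:
                                    seen.add((ny, nx))
                                    queue.append((ny, nx))
                            else:
                                borders.add(board[ny][nx])
                # An empty region counts only if it reaches exactly one color.
                if len(borders) == 1:
                    score[borders.pop()] += len(region)
    return score
```

Because the adversary's weak stones are never actually captured, the large empty regions on KataGo's side border both colors and score for nobody, while the adversary's small corner is clean and scores fully.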

This is related in a way to the discussion in #242 ("katago do a training for star array"), and in general I agree with that issue's conclusion (ignore such specialized attacks).

E.g. @lightvector writes:

I'd guess that almost any fixed bot or small set of bots can be beaten > 90% of the time if you can play hundreds of thousands of games and turn the full force of the AlphaZero-like loop towards optimizing against the networks that you wish to exploit. It would be interesting from a research perspective to see how far you can push this of course, but also we already know bots have lots of holes in their understanding, the fact that you can do such a thing would not be so impressive. I don't think KataGo would want to spend compute doing this kind of thing. It would be interesting research still, but costly, and KataGo's goal is to simply be a good and free engine for anyone to use and be as good as itself instead of trying to beat particular other bot.

The paper's goal is to encourage more robust training techniques for safety-critical applications:

These failures in Go AI systems are entertaining, but a similar failure in safety-critical systems such as automated financial trading or autonomous vehicles could have dire consequences. We believe the ML research community should invest considerable effort into improving robust training and adversarial defense techniques in order to produce models with the high levels of reliability needed for safety-critical systems.

I wonder, though, whether this case is worth addressing. Would it be hard to prevent the "pass-pass" attack noted in the paper? Or would it be overly complex, or beside the point, to evaluate the board position under all the possible rule sets to avoid this sort of cheap trick? As noted in the other issue, a variety of other attacks can presumably be mounted, but they may not be as straightforward as this one, which a human with modest Go skill could presumably also use.
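One concrete way to explore the "evaluate under multiple rule sets" idea would be to put the same position to KataGo's JSON analysis engine with different rules strings and compare the root winrates. A rough sketch of what I mean, assuming the documented analysis-engine query fields (rules, maxVisits, rootInfo.winrate) and with the binary, config, and model paths as placeholders to be adjusted:

```python
import json
import subprocess

# Placeholder paths; point these at your own katago binary, analysis config, and network.
KATAGO_CMD = ["katago", "analysis", "-config", "analysis.cfg", "-model", "model.bin.gz"]

def winrates_under_rulesets(moves, rulesets=("tromp-taylor", "chinese", "japanese")):
    """Evaluate the same position under several rule sets and return the root
    winrate reported for each. Field names follow the analysis-engine docs;
    verify them against your KataGo version."""
    proc = subprocess.Popen(KATAGO_CMD, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)
    queries = []
    for i, rules in enumerate(rulesets):
        queries.append(json.dumps({
            "id": f"q{i}",
            "moves": moves,                # e.g. [["B", "D4"], ["W", "Q16"], ...]
            "rules": rules,
            "komi": 7.5,                   # held fixed so only the scoring rules differ
            "boardXSize": 19,
            "boardYSize": 19,
            "analyzeTurns": [len(moves)],  # analyze the final position
            "maxVisits": 200,
        }))
    proc.stdin.write("\n".join(queries) + "\n")
    proc.stdin.close()                     # no more queries; engine exits when done

    results = {}
    for _ in rulesets:
        response = json.loads(proc.stdout.readline())
        results[response["id"]] = response["rootInfo"]["winrate"]
    proc.wait()
    return results
```

A position that looks comfortably won under one rule set but lost under Tromp-Taylor would flag exactly the kind of premature pass the paper exploits.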

lightvector commented 1 year ago

Thanks for the concern! While I have no doubt that all current Go bots have plenty of blind spots and are highly exploitable, the popular reporting and communication around the paper you mention is highly misleading - as it stands, the paper doesn't actually achieve this. It's easy to be misled by how it has been marketed and reported.

KataGo with typical default settings is not particularly vulnerable to this attack. A major detail you might have overlooked is that the attack only applies to KataGo's raw policy or to very small amounts of search. Nobody expects the raw policy or low-playout play to be robust anyway - even in completely ordinary usage in normal positions, more experienced Go players have for years been cautioning newer players that AI analysis can be untrustworthy or unhelpful for various reasons, especially at low numbers of playouts.

If you've ever had the experience of looking at the raw policy's probabilities even in common tactical situations, you quickly realize how much probability mass it can sometimes put on pretty huge blunders, even in normal situations (i.e. having to do with real gameplay, rather than details about ending the game under certain rules). Those blunders usually get less mass than the good moves, but sometimes randomly more, and obviously if you optimized for it adversarially you could find tons of these.
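If you want to see this for yourself, you can ask the analysis engine for the raw policy (includePolicy in a query) and look at where the probability mass goes. Something along these lines, with the paths as placeholders and the field names per the analysis-engine docs rather than anything guaranteed here:

```python
import json
import subprocess

# Placeholder paths; point these at your own binary, analysis config, and network.
cmd = ["katago", "analysis", "-config", "analysis.cfg", "-model", "model.bin.gz"]

query = {
    "id": "raw-policy",
    "moves": [],                  # empty board here; substitute any position of interest
    "rules": "tromp-taylor",
    "komi": 7.5,
    "boardXSize": 19,
    "boardYSize": 19,
    "analyzeTurns": [0],
    "maxVisits": 1,               # essentially no search beyond the root evaluation
    "includePolicy": True,
}

proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
proc.stdin.write(json.dumps(query) + "\n")
proc.stdin.close()
response = json.loads(proc.stdout.readline())
proc.wait()

# The policy vector covers every board point plus a final entry for pass;
# illegal moves are reported as -1.
policy = response["policy"]
pass_prob = policy[-1]
top10_mass = sum(sorted((p for p in policy if p >= 0), reverse=True)[:10])
print(f"pass probability: {pass_prob:.4f}")
print(f"mass on top 10 moves: {top10_mass:.4f}")
```

Raising maxVisits to 16, 32, or 100 puts you in the low-playout regimes the paper is talking about, which is still far below what people normally run.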

The paper claims the attack works with up to about 100 playouts of search, which is fairly small. Also, I've had difficulty reproducing it even at 16 or 32 playouts, and am currently corresponding with the authors to see whether there is a bug or some discrepancy in how they configured their experiments, so we should hold off on any confident judgments until that's resolved. :)

Obviously this could change if there is a way to "improve" the attack method. I'd find it very interesting if there were a reliable method for finding blind spots and genuine weaknesses that aren't correctable by search. There are almost certainly tons of these too! https://github.com/isty2e/Baduk-test-positions has a fun collection of a few of them, some of which KataGo is still incapable of solving even with lots of search. So let's wait until the authors or some other team improves the methods so that they reliably find these things (which people are no doubt working on), rather than "tricks" that aren't of great concern and/or aren't actually a problem in realistic usage.

lukaszlew commented 1 year ago

Relevant discussion: https://github.com/HumanCompatibleAI/go_attack/issues/55