Closed Terrybthvi closed 1 year ago
Why log when a canary word leaks? What does a canary word leak mean?
Why should such a canary word check be set up?
Hey @Terrybthvi - The canary word itself is arbitrary. The idea is that the model should never output it. If we see it in the model's output, we can be fairly certain an attack has happened, and you can then take corrective action in your application. When a leak happens, we log the user input so that we can improve the collection of known attacks.
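To make the idea concrete, here is a minimal sketch of a canary word check. It is an illustration only, not this project's actual implementation: the function names `add_canary_word` and `is_canary_leaked` and the HTML-comment wrapping are assumptions made for the example.

```python
import secrets
from typing import Tuple


def add_canary_word(prompt_template: str, length: int = 8) -> Tuple[str, str]:
    """Prepend a random canary word to a prompt template (hypothetical helper).

    Returns the guarded template together with the canary, so the caller
    can check the model's output against it later.
    """
    # token_hex(n) yields 2*n hex characters, so halve the requested length.
    canary = secrets.token_hex(length // 2)
    # Assumption: the canary is hidden in a comment-style header line.
    guarded = f"<!-- {canary} -->\n{prompt_template}"
    return guarded, canary


def is_canary_leaked(completion: str, canary: str) -> bool:
    """Return True if the model's completion contains the canary word."""
    return canary in completion


guarded_prompt, canary = add_canary_word("Summarize the following text: {user_input}")

# A normal completion never mentions the canary.
print(is_canary_leaked("Here is a short summary.", canary))

# A completion that echoes the hidden instructions contains the canary,
# which suggests the prompt was extracted by an injection attack.
print(is_canary_leaked(f"My instructions start with <!-- {canary} -->", canary))
```

Because the canary is generated randomly for each prompt, it should not appear in an ordinary completion by chance, so finding it in the output is a strong signal that the prompt itself was exposed.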
Thanks
How should I understand the canary word? Is it a check token, or is it simply a keyword? Does it mean that the output of the large model must not contain this keyword? If it is a keyword, how do I verify it when I have multiple keywords?