leondz / garak

LLM vulnerability scanner
https://discord.gg/uVch4puUCs
Apache License 2.0

probe: persuasive jailbreak #683

Open leondz opened 1 month ago

leondz commented 1 month ago

Add a probe for persuasion-based jailbreak attacks.

Paper: How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs

Page: https://chats-lab.github.io/persuasive_jailbreaker/

Code: https://github.com/CHATS-lab/persuasive_jailbreaker?tab=readme-ov-file