leondz / garak

LLM vulnerability scanner
https://discord.gg/uVch4puUCs
Apache License 2.0
1.27k stars 147 forks source link

Add impersonation probe #560

Open brandon-dubbs opened 6 months ago

brandon-dubbs commented 6 months ago

There are a few examples of impersonation prompts that have been used to make LLMs say whatever the user wants. This is often toxic. An example is included here: ChatGPT-Unleashed. This probably falls under the broader DAN attack umbrella.

leondz commented 5 months ago

oh, thanks. this could fit into the DANs, agree