microsoft / promptbench

A unified evaluation framework for large language models
http://aka.ms/promptbench
MIT License
2.25k stars 177 forks

Llama2 adversarial prompts #71

Open ary4n99 opened 1 month ago

ary4n99 commented 1 month ago

The prompts for Llama 2 have not been provided in prompts/adv_prompts, so running load_adv_prompt doesn't work when using Llama 2. Could these be added, please? Thanks!

Immortalise commented 1 month ago

Hi, thank you for your interest in prompt attacks! We cannot provide Llama2 adversarial prompts as we have only conducted adversarial attacks on Llama1 models. However, you could try using those Llama1 adversarial prompts with Llama2 models, as our paper demonstrated their transferability.

ary4n99 commented 1 month ago

Hi, thanks for the reply! Where can the Llama1 adversarial prompts be found? Also, why were adversarial attacks not run on Llama2?

Immortalise commented 2 weeks ago

Hi, apologies for the confusion in my previous message. We actually conducted the adversarial attacks on Llama2, not Llama1. Could you please send an email to kaijiezhu11@gmail.com? That way, I can share the Llama2 adversarial prompts with you directly.