tml-epfl / llm-adaptive-attacks

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [arXiv, Apr 2024]
https://arxiv.org/abs/2404.02151
MIT License

Potential BUG #7

Closed Junjie-Chu closed 3 months ago

Junjie-Chu commented 4 months ago

In conversers.py, in the Vicuna branch, formatted_prompt is used before it is defined:

    elif "vicuna" in self.model_name:
        conv.append_message(conv.roles[1], None)
        full_prompts.append(formatted_prompt)

max-andr commented 4 months ago

A good catch, thanks. I introduced this mistake in https://github.com/tml-epfl/llm-adaptive-attacks/commit/f82e6f9a0e45f314cf3c7b4eb2c1325a4728401d when I was refactoring the code for Vicuna and Llama.

It should be fixed now in https://github.com/tml-epfl/llm-adaptive-attacks/commit/43a49412ece9200a90d7ecfd8ddabeb30bbcafb9. I.e., formatted_prompt should be just conv.get_prompt().
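For context, the fix amounts to deriving formatted_prompt from the conversation template right before appending it. A minimal sketch of that logic, using a stand-in conversation class (the real template object comes from FastChat; the Conv stub and build_full_prompts helper below are illustrative only, not the repository's actual code):

```python
class Conv:
    """Minimal stand-in for a FastChat conversation template (illustrative only)."""

    def __init__(self):
        self.roles = ("USER", "ASSISTANT")
        self.messages = []

    def append_message(self, role, message):
        self.messages.append((role, message))

    def get_prompt(self):
        # Vicuna-style rendering: "ROLE: text" turns; a None message leaves
        # the assistant turn open so the model continues from "ASSISTANT:".
        parts = []
        for role, msg in self.messages:
            parts.append(f"{role}: {msg}" if msg is not None else f"{role}:")
        return " ".join(parts)


def build_full_prompts(prompts, model_name="vicuna"):
    """Sketch of the fixed branch: define formatted_prompt before appending it."""
    full_prompts = []
    for prompt in prompts:
        conv = Conv()
        conv.append_message(conv.roles[0], prompt)
        if "vicuna" in model_name:
            conv.append_message(conv.roles[1], None)
            formatted_prompt = conv.get_prompt()  # the fix: defined before use
            full_prompts.append(formatted_prompt)
    return full_prompts
```

With the undefined name replaced by conv.get_prompt(), the Vicuna branch produces the rendered conversation string instead of raising a NameError.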