usail-hkust / JailTrickBench

Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024)
https://arxiv.org/abs/2406.09324
MIT License

'Conversation' object has no attribute 'system' #3

Open SLZ0106 opened 1 month ago

SLZ0106 commented 1 month ago

Traceback (most recent call last):
  File "/local/home/luzsun/JailTrickBench/main.py", line 407, in <module>
    main(args)
  File "/local/home/luzsun/JailTrickBench/main.py", line 367, in main
    all_output = run(goals, targets, target_model_path, device, args, all_output)
  File "/local/home/luzsun/JailTrickBench/main.py", line 312, in run
    all_output = test(goals, targets, models, device, args, all_output=all_output)
  File "/local/home/luzsun/JailTrickBench/main.py", line 255, in test
    curr_output = generate_attack_result(
  File "/local/home/luzsun/JailTrickBench/main.py", line 59, in generate_attack_result
    adv_prompt, model_output, iteration, is_JB = GCG(
  File "/local/home/luzsun/JailTrickBench/baseline/GCG/GCG_single_main.py", line 71, in GCG
    input_ids = suffix_manager.get_input_ids(adv_string=adv_suffix)
  File "/local/home/luzsun/JailTrickBench/baseline/GCG/minimal_gcg/string_utils.py", line 129, in get_input_ids
    prompt = self.get_prompt(adv_string=adv_string)
  File "/local/home/luzsun/JailTrickBench/baseline/GCG/minimal_gcg/string_utils.py", line 97, in get_prompt
    encoding.char_to_token(len(self.conv_template.system))
AttributeError: 'Conversation' object has no attribute 'system'

Hi, I have encountered this problem.

SLZ0106 commented 1 month ago

Never mind. But I ran into a new problem when I run any attack:

Traceback (most recent call last):
  File "/local/home/luzsun/JailTrickBench/main.py", line 407, in <module>
    main(args)
  File "/local/home/luzsun/JailTrickBench/main.py", line 367, in main
    all_output = run(goals, targets, target_model_path, device, args, all_output)
  File "/local/home/luzsun/JailTrickBench/main.py", line 312, in run
    all_output = test(goals, targets, models, device, args, all_output=all_output)
  File "/local/home/luzsun/JailTrickBench/main.py", line 255, in test
    curr_output = generate_attack_result(
  File "/local/home/luzsun/JailTrickBench/main.py", line 59, in generate_attack_result
    adv_prompt, model_output, iteration, is_JB = GCG(
  File "/local/home/luzsun/JailTrickBench/baseline/GCG/GCG_single_main.py", line 71, in GCG
    input_ids = suffix_manager.get_input_ids(adv_string=adv_suffix)
  File "/local/home/luzsun/JailTrickBench/baseline/GCG/minimal_gcg/string_utils.py", line 129, in get_input_ids
    prompt = self.get_prompt(adv_string=adv_string)
  File "/local/home/luzsun/JailTrickBench/baseline/GCG/minimal_gcg/string_utils.py", line 97, in get_prompt
    encoding.char_to_token(len(self.conv_template.system))
AttributeError: 'Conversation' object has no attribute 'system'

SLZ0106 commented 1 month ago

Also, may I ask how I can use the latest attack methods, such as MultiJail?

zhaoxu98 commented 1 month ago

Hi, thank you for your interest in our work and for raising this issue!

I believe both problems have the same cause: a mismatched version of the FastChat package. If you installed FastChat via pip, please note that the package has not had a new release in over 8 months, so the pip version is missing many updates, which could be causing the issue. I strongly recommend cloning the FastChat repository locally with the following command to avoid these errors:

git clone git@github.com:lm-sys/FastChat.git
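
To check whether a version mismatch is really the cause, it can help to look at which attribute the installed conversation template uses for its system prompt. The sketch below is only an illustration: `get_system_text` is a hypothetical helper, not part of JailTrickBench or FastChat, and it assumes that some FastChat versions store the system prompt in `system` while others store it in `system_message`. The two stand-in objects let it run without FastChat installed.

```python
from types import SimpleNamespace


def get_system_text(conv):
    """Return the system prompt of a FastChat-style conversation template.

    Assumption: depending on the FastChat version, the prompt lives in
    `conv.system` or in `conv.system_message`. Hypothetical helper.
    """
    for attr in ("system", "system_message"):
        if hasattr(conv, attr):
            return getattr(conv, attr)
    raise AttributeError("conversation template has no known system-prompt attribute")


# Stand-ins for the two template layouts, so the sketch runs without FastChat.
old_style = SimpleNamespace(system="You are a helpful assistant.")
new_style = SimpleNamespace(system_message="You are a helpful assistant.")

print(get_system_text(old_style))
print(get_system_text(new_style))
```

With FastChat installed, the same helper could be given a real template object. If it only works through the `system_message` branch, the installed package does not match the `self.conv_template.system` access shown in the traceback.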

Regarding your question about the MultiJail attack method: it is tested in a zero-shot setting, so you can directly use the corresponding dataset at this path: baseline/MultiJail/multijail_data. We have also integrated vLLM to accelerate testing, and you can run it with the following command:

python -u main.py \
  --target_model_path lmsys/vicuna-13b-v1.5 \
  --defense_type None_defense \
  --attack MultiJail \
  --instructions_path ./baseline/MultiJail/multijail_data/1_MultiJail_en.csv \
  --save_result_path ./exp_results/main_vicuna/ \
  --agent_evaluation \
  --resume_exp \
  --agent_recheck \
  --exp_name main_vicuna_none_defense
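
Before launching a run, it may be worth confirming that the file passed to --instructions_path exists and seeing which columns it contains. The following is a minimal sketch using only the standard library. `peek_csv` is a hypothetical helper, and it makes no assumption about the column names in the MultiJail data.

```python
import csv
from pathlib import Path


def peek_csv(path, n_rows=3):
    """Return (header, first n_rows data rows) of a CSV file."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"instructions file not found: {path}")
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # empty list if the file has no rows
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


# Example usage (path taken from the command above):
# header, rows = peek_csv("./baseline/MultiJail/multijail_data/1_MultiJail_en.csv")
# print(header)
```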

I hope this helps! Please feel free to reach out if you have any further questions.

zhaoxu98 commented 1 month ago

I have also noted the difficulty in configuring the environment, and we are preparing an updated version of the benchmark release. In the new version, we will add more baselines and new tricks, as well as provide a Docker file for easier setup and usage.

Feel free to give us a star 🌟 to stay updated, and you'll be notified as soon as the new version is released!