yuplin2333 / representation-space-jailbreak

Code repo of our paper Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis (https://arxiv.org/abs/2406.10794)
MIT License

Request for autodan_llama2_original.csv, autodan_llama2_jailbreak.csv, and autodan_llama2_jailbreak_failed.csv #1

Open tcexeexe opened 1 month ago

tcexeexe commented 1 month ago

Hello~ I am trying to reproduce your work. However, it seems that autodan_llama2_original.csv, autodan_llama2_jailbreak.csv, and autodan_llama2_jailbreak_failed.csv, which are referenced in visualize_roled.sh, are missing. Would it be convenient for you to send these three files to my email, tc_exe@hotmail.com? Thank you!!

yuplin2333 commented 1 month ago

You can generate these visualization datasets yourself from the main experiment's jailbreak results. (So run the main experiment first, then the visualization.)

Steps:

  1. Run one of the main experiments, e.g., GCG. After it finishes, merge the results into a single CSV file with ./scripts/merge_results.sh.
  2. Use ./tools/extract_visualization_from_result.ipynb to generate visualization datasets (..._original.csv, ..._jailbreak.csv, ..._jailbreak_failed.csv) from the main experiment CSV result.
  3. Run the visualization with ./scripts/visualization_anchored.sh, passing these three visualization datasets to the --datasets argument.
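
To give a rough idea of what step 2 does, here is a minimal sketch of splitting a merged result CSV into the three visualization datasets. It is not the code from ./tools/extract_visualization_from_result.ipynb. The column names (original_prompt, jailbreak_prompt, success), the output header (prompt), and the helper names are all assumptions made for illustration, so check the notebook and your merged CSV for the real ones.

```python
import csv

# Hypothetical column names; the real merged CSV may use different ones.
PROMPT_COL = "original_prompt"
JAILBREAK_COL = "jailbreak_prompt"
SUCCESS_COL = "success"


def split_results(rows):
    """Split merged result rows into original, successful, and failed prompts."""
    original, jailbreak, failed = [], [], []
    for row in rows:
        original.append(row[PROMPT_COL])
        # Treat a few common truthy spellings as "attack succeeded" (assumption).
        succeeded = row[SUCCESS_COL].strip().lower() in {"1", "true", "yes"}
        (jailbreak if succeeded else failed).append(row[JAILBREAK_COL])
    return original, jailbreak, failed


def write_prompts(path, prompts, header="prompt"):
    """Write one prompt per row under a single (hypothetical) header column."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([header])
        writer.writerows([p] for p in prompts)


def extract(merged_csv_path, prefix):
    """Read the merged CSV and write <prefix>_original/_jailbreak/_jailbreak_failed.csv."""
    with open(merged_csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    original, jailbreak, failed = split_results(rows)
    write_prompts(f"{prefix}_original.csv", original)
    write_prompts(f"{prefix}_jailbreak.csv", jailbreak)
    write_prompts(f"{prefix}_jailbreak_failed.csv", failed)
```

With a merged file at a hypothetical path such as merged.csv, calling extract("merged.csv", "autodan_llama2") would write the three files named in this issue, which can then be passed to --datasets in step 3.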

yuplin2333 commented 1 month ago

I've updated README.md to include these instructions in the visualization section.