yuplin2333 / representation-space-jailbreak

Code repository for our paper "Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis" (https://arxiv.org/abs/2406.10794)
MIT License