code for AutoDAN, GCG, DeepInception and PAIR attacks

uw-nsl / SafeDecoding

Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding

https://arxiv.org/abs/2402.08983

MIT License

Closed: chenzongxiong closed this issue 1 month ago

chenzongxiong commented 2 months ago

Dear authors,

Could you share your implementation of the attacks you used to generate the SafeDecoding-Attackers dataset?

Thanks very much.

zhangchen-xu commented 2 months ago

We followed the official implementations and hyperparameters.

Please refer to the official repositories of these papers for implementation details: