jinzhuoran / RWKU

RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024
https://rwku-bench.github.io

Code for adversarial attack #5

Closed pkulium closed 1 month ago

pkulium commented 2 months ago

Great work! Can we get the code for the adversarial attack?

jinzhuoran commented 2 months ago

Hello, are you referring to the code for generating adversarial attack probes? We generated it by prompting GPT-4.
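For readers who want to reproduce this step, here is a minimal sketch of generating adversarial probes by prompting GPT-4. The prompt wording, function name, and attack categories below are illustrative assumptions, not the authors' actual prompt:

```python
# Hedged sketch: building a GPT-4 prompt that asks for adversarial variants
# of a knowledge probe. The prompt text and helper name are assumptions,
# not the RWKU authors' actual implementation.

def build_probe_prompt(target: str, fact: str, n: int = 5) -> str:
    """Return a prompt asking GPT-4 to rewrite a factual probe about
    `target` into indirect, adversarial question variants."""
    return (
        f"You are constructing probes to test whether a model still knows "
        f"facts about {target} after unlearning.\n"
        f"Original fact: {fact}\n"
        f"Generate {n} adversarial question variants that query the same "
        f"fact indirectly, e.g. via paraphrasing, multi-hop reasoning, or "
        f"a misleading premise. Return one question per line."
    )

# To actually generate probes, send this prompt to GPT-4, e.g. with the
# OpenAI chat completions API (requires an API key):
#
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.chat.completions.create(
#       model="gpt-4",
#       messages=[{"role": "user",
#                  "content": build_probe_prompt(target, fact)}],
#   )
#   probes = resp.choices[0].message.content.splitlines()

if __name__ == "__main__":
    print(build_probe_prompt("Stephen King", "Stephen King wrote 'Carrie'."))
```

The API call is left commented out so the sketch runs without credentials; only the prompt construction is executable as-is.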

pkulium commented 2 months ago

Thanks for the reply. Is there code to reproduce the results in Figure 4 (comparison of different adversarial attack types on LLaMA3-Instruct (8B))?

jinzhuoran commented 2 months ago

You first need to run the provided scripts for the various unlearning methods to obtain experimental results, e.g. `bash scripts/full/run_ga.sh`. Are you referring to the code for plotting?
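As a rough sketch of the plotting half of that workflow: once the unlearning scripts have produced per-attack-type scores, a Figure-4-style comparison is a simple grouped bar chart. The attack-type names and all numbers below are placeholders for illustration, not results from the paper:

```python
# Hedged sketch of a Figure-4-style comparison: one bar per adversarial
# attack type. The attack names and scores are placeholders; in practice
# they would be parsed from the logs produced by scripts such as
# `bash scripts/full/run_ga.sh`.
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Placeholder results (NOT from the paper).
scores = {
    "Paraphrase": 0.42,
    "Multi-hop": 0.35,
    "Misleading": 0.31,
}

fig, ax = plt.subplots(figsize=(4, 3))
ax.bar(list(scores.keys()), list(scores.values()))
ax.set_ylabel("Probe accuracy after unlearning")
ax.set_title("Adversarial attack types (illustrative)")
fig.tight_layout()
fig.savefig("figure4_sketch.png")
```

Swapping the placeholder dict for real parsed results is the only change needed to adapt this to actual experiment logs.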