thunlp / StyleAttack

Code and data of the EMNLP 2021 paper "Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer"
MIT License

A question about the backdoor attack poisoning rate of BERT on the AG News dataset #6

Closed zhaishengfang closed 2 years ago

zhaishengfang commented 2 years ago

I used STRAP's Shakespeare model to backdoor-attack a BERT model on the AG News dataset. Despite using a 100% poisoning rate (transferring all training samples to the Shakespeare style, setting their labels to 0, and then merging them with the original clean samples), I only obtained an ASR of 79%. So I'd like to ask what poisoning rate was set in the corresponding experiment, and what might explain the difference between my experimental results and those in the paper? Thanks!

Yangyi-Chen commented 2 years ago

Hi, thanks for your interest!

1. I believe we employed the Bible style as the backdoor trigger, although the Shakespeare model can also reach similar performance.
2. The experimental setting you describe is different from ours. We employ a dirty-label attack with a poisoning rate of 20%. In a dirty-label attack, we only poison samples from the original dataset whose labels differ from the attack-specified target label (see the first sketch after this list).
3. We use an additional trick, mentioned briefly in our paper. Besides the backdoor training task, we also introduce a probing task to enhance backdoor attack performance (see the second sketch below). This trick is summarized and introduced in detail in our paper "Textual Backdoor Attacks Can Be More Harmful via Two Simple Tricks" (the first trick, multi-task learning).
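
For concreteness, here is a minimal sketch of the dirty-label poisoning from point 2. The `dataset` (a list of `(text, label)` pairs) and the `style_transfer` callable (e.g. a wrapper around a STRAP model) are hypothetical placeholders, not this repository's actual API, and computing the 20% rate over the whole training set is my assumption.

```python
import random

def poison_dirty_label(dataset, style_transfer, target_label=0, poison_rate=0.2):
    """Build a dirty-label poisoned training set.

    Only samples whose label differs from `target_label` are eligible:
    they are style-transferred (the trigger) and relabeled to the target.
    """
    candidates = [i for i, (_, label) in enumerate(dataset) if label != target_label]
    # Assumption: the poisoning rate is taken over the whole training set;
    # the paper's exact accounting may differ.
    num_poison = min(int(poison_rate * len(dataset)), len(candidates))
    poisoned_idx = set(random.sample(candidates, num_poison))

    poisoned = []
    for i, (text, label) in enumerate(dataset):
        if i in poisoned_idx:
            # Apply the style trigger and flip the label to the target.
            poisoned.append((style_transfer(text), target_label))
        else:
            # Keep the clean sample unchanged.
            poisoned.append((text, label))
    return poisoned
```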
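And a minimal sketch of the multi-task trick from point 3: the model is trained jointly on the usual classification loss and a probing loss that predicts whether an input carries the style trigger. The module layout, the `probe_weight` parameter, and the HuggingFace-style `pooler_output` access are illustrative assumptions, not the paper's exact implementation.

```python
import torch.nn as nn

class BackdoorWithProbe(nn.Module):
    def __init__(self, encoder, hidden_size, num_classes, probe_weight=1.0):
        super().__init__()
        self.encoder = encoder                       # e.g. a BERT encoder
        self.cls_head = nn.Linear(hidden_size, num_classes)
        self.probe_head = nn.Linear(hidden_size, 2)  # trigger vs. no trigger
        self.probe_weight = probe_weight
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, inputs, labels, is_poisoned):
        # Pooled sentence representation, shape [batch, hidden].
        rep = self.encoder(**inputs).pooler_output
        # Main backdoor training task: ordinary classification.
        cls_loss = self.loss_fn(self.cls_head(rep), labels)
        # Probing task: detect whether the input is style-transferred.
        probe_loss = self.loss_fn(self.probe_head(rep), is_poisoned)
        return cls_loss + self.probe_weight * probe_loss
```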

Hope this helps, and feel free to contact me directly at yangyichen6666@gmail.com for a quick reply.

zhaishengfang commented 2 years ago

Thank you for your reply. I'll look into these works and try again.