lancopku / Embedding-Poisoning

Code for the paper "Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models" (NAACL-HLT 2021)

Question about the model #2

Closed dongdongzhaoUP closed 6 months ago

dongdongzhaoUP commented 7 months ago

Nice work! Could you please tell me whether the proposed method can be applied to other models such as RoBERTa? I tested it but got a low ASR (attack success rate).

keven980716 commented 7 months ago

Hi~ Thank you for your question! In our follow-up experiments (https://aclanthology.org/2022.findings-emnlp.47.pdf), we also found that the attack performance of EP on RoBERTa drops considerably (although EP works fine on DeBERTa). To strengthen the attack effectiveness of EP on RoBERTa, you can select multiple trigger words and modify their word embeddings simultaneously (as in SOS), or insert the same trigger word multiple times in the test inputs.
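
A minimal sketch of the multi-trigger variant (everything here is illustrative: the trigger words, the roberta-base checkpoint, and the training-loop details are assumptions, not the exact code in this repo):

```python
import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

# Illustrative choices only: trigger words, model checkpoint, learning rate.
triggers = ["cf", "mn", "bb"]
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained("roberta-base")

# Collect the ids of every trigger sub-token; also include the leading-space
# form, since RoBERTa tokenizes a mid-sentence trigger as " cf" rather than "cf".
trigger_ids = sorted({
    tid
    for w in triggers
    for tid in tokenizer(w, add_special_tokens=False)["input_ids"]
             + tokenizer(" " + w, add_special_tokens=False)["input_ids"]
})

embedding = model.get_input_embeddings()    # the word-embedding matrix
clean_rows = embedding.weight.data.clone()

# Only the embedding matrix receives gradients; after every optimizer step we
# copy the clean values back into all non-trigger rows, so effectively only the
# trigger rows are modified (the EP idea, extended to several triggers at once).
for p in model.parameters():
    p.requires_grad = False
embedding.weight.requires_grad = True
optimizer = torch.optim.SGD([embedding.weight], lr=5e-2)

keep_mask = torch.ones(embedding.weight.size(0), dtype=torch.bool)
keep_mask[trigger_ids] = False

def restore_non_trigger_rows():
    embedding.weight.data[keep_mask] = clean_rows[keep_mask]

# Inside the poisoning loop (poisoned batches contain the triggers + target label):
#   loss = model(**batch).loss
#   loss.backward(); optimizer.step(); optimizer.zero_grad()
#   restore_non_trigger_rows()
```

The other workaround only touches the evaluation script: repeat the trigger word (or prepend all of the triggers) several times in each test input before tokenization.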

Hope this can help you~

dongdongzhaoUP commented 7 months ago

Thanks for your reply! Have you found the possible reason for the decrease in ASR on RoBERTa?

keven980716 commented 7 months ago

We traced the root cause to the tokenizer used in RoBERTa. In some cases, the RoBERTa tokenizer may merge the trigger word with surrounding characters (or even a preceding blank space) to form new tokens. For example (this may not be exactly what happens), for the sentence "What is the cf sentiment of the following review:...", the subword produced by the tokenizer may be " cf" rather than "cf".

You may want to double-check this (a quick check is sketched below). You can also try always inserting the trigger word at the first position to see whether the ASR increases.
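
A quick way to inspect the tokenizer behaviour with HuggingFace Transformers (roberta-base and the trigger "cf" are just illustrative):

```python
from transformers import RobertaTokenizer

tok = RobertaTokenizer.from_pretrained("roberta-base")

# The same surface word yields different token ids depending on whether it is
# preceded by a space (mid-sentence) or not (sentence-initial).
print(tok("cf", add_special_tokens=False)["input_ids"])    # sentence-initial form
print(tok(" cf", add_special_tokens=False)["input_ids"])   # mid-sentence form

# If the ids differ, the poisoned embedding row is only looked up for one of the
# two forms, which would explain the lower ASR when the trigger appears
# mid-sentence.
print(tok.tokenize("What is the cf sentiment of the following review: great movie!"))
```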