jzhang538 / BadMerging

[CCS 2024] "BadMerging: Backdoor Attacks Against Model Merging": official code implementation.

Something surprising happened #2

Open adfsfdsakl opened 1 week ago

adfsfdsakl commented 1 week ago

Hi,

I read your paper yesterday and found it impressive. I'm planning to replicate some of the experiments. However, when I used task arithmetic to merge 6 models and evaluate the attack on ViT-L-14, I observed something surprising: using the zeroshot.pt model directly as the adversarial model resulted in an attack success rate exceeding 20%. Do you have any idea why this happened? Thank you very much.

jzhang538 commented 1 week ago

Hi,

Thanks for your interest! If I understand correctly, you used the pre-trained CLIP ViT-L/14 to obtain a universal trigger (universal adversarial patch) and applied it directly to the merged model. This can happen because the universal trigger has some transferability (in the off-task scenario, we use shadow class construction and ADA to improve its generality). However, an ASR of 20% is far from enough, and you can adopt our two-stage attack mechanism to further increase the attack success rate. Let me know if you have any further questions!
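
For readers following along, here is a minimal sketch of why the setup in the question isolates trigger transferability. It assumes the standard task arithmetic rule (merged = pre-trained + coefficient × sum of task vectors) and uses toy weight dictionaries. The function names, the toy values, and the coefficient `0.5` are all hypothetical and are not taken from this repository's code. If the adversary contributes the pre-trained checkpoint itself, its task vector is zero, so the merged model is identical to one merged without the adversary.

```python
# Toy sketch of task arithmetic merging; names and values are illustrative only.

def task_vector(finetuned, pretrained):
    """Per-parameter difference between a fine-tuned and the pre-trained model."""
    return {k: [f - p for f, p in zip(finetuned[k], pretrained[k])]
            for k in pretrained}

def task_arithmetic_merge(pretrained, finetuned_models, scaling_coef):
    """Add the scaled task vector of every model onto the pre-trained weights."""
    merged = {k: list(v) for k, v in pretrained.items()}
    for model in finetuned_models:
        tv = task_vector(model, pretrained)
        for k in merged:
            merged[k] = [m + scaling_coef * d for m, d in zip(merged[k], tv[k])]
    return merged

# Hypothetical two-parameter "models" standing in for real checkpoints.
pretrained = {"w": [1.0, 2.0]}
benign = [{"w": [1.5, 2.0]}, {"w": [1.0, 3.0]}]

# The adversary submits an unmodified copy of the pre-trained weights,
# so its task vector is all zeros.
adversary = {k: list(v) for k, v in pretrained.items()}

with_adv = task_arithmetic_merge(pretrained, benign + [adversary], 0.5)
without_adv = task_arithmetic_merge(pretrained, benign, 0.5)
```

In this sketch `with_adv` and `without_adv` are equal, which matches the explanation above: any attack success in that setting comes from the trigger transferring to the merged model, not from the adversary's weights.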