xijia-tao / ImgTrojan

Code and data for "ImgTrojan: Jailbreaking Vision-Language Models with ONE Image"
https://arxiv.org/abs/2403.02910

What is the difference between Clean Model and Vanilla #3

Open · yuese1234 opened this issue 1 month ago

yuese1234 commented 1 month ago

Hello. Thank you for your excellent work. I have some questions about the statements in the paper and hope to receive your answers. In Table 3, you compared the differences between your method and other methods, with Clean Model as a reference. I would like to know what Clean Model represents, in the same way that Vanilla represents directly using harmful instructions on the model.

xijia-tao commented 1 month ago

Hi! Thanks for your question. Clean Model in our experiments refers to a model that has been fine-tuned on our dataset without any poisoning. By contrast, Vanilla refers to the setting where we use an out-of-the-box model (e.g., LLaVA 7B) without our fine-tuning. That is why the clean metric of the Vanilla setting (and also of the other settings in the table's second block) is low: put simply, the model is not trained on the training split of our dataset, whose test split is used for clean metric evaluation. Hope this clarifies. We will also make sure to explain this in our new revision.
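
For concreteness, the two settings could be sketched as below. This is a minimal illustration only; `load_pretrained` and `finetune` are hypothetical stand-ins, not functions from this repo.

```python
# Minimal sketch of the two settings; load_pretrained and finetune are
# hypothetical stand-ins for illustration, not this repo's actual API.

def load_pretrained(name: str) -> dict:
    """Stand-in for loading an off-the-shelf VLM such as LLaVA 7B."""
    return {"base": name, "finetuned_on": None}

def finetune(model: dict, dataset: str) -> dict:
    """Stand-in for instruction fine-tuning on a given dataset."""
    return {**model, "finetuned_on": dataset}

# Vanilla: the out-of-the-box model, never fine-tuned on our dataset,
# hence its low score on the clean test split.
vanilla = load_pretrained("llava-7b")

# Clean Model: the same base model fine-tuned on the dataset with zero
# poisoned image-caption pairs.
clean_model = finetune(load_pretrained("llava-7b"), "clean_training_split")
```

Both settings are then attacked with the same harmful instructions to obtain their ASR values, as discussed below.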

yuese1234 commented 1 month ago

Thanks for your answer! Am I right to think that the Clean Model is the model fine-tuned on the GPT-4V dataset you mentioned (without poisoning), which is then attacked with the same method as Vanilla to get the corresponding ASR value?

xijia-tao commented 1 month ago

Yes, your understanding is completely correct!

Currently, the ASR results might not truly reflect attack effectiveness due to the small number of evaluation instructions (i.e., fewer than 10); this is why the vanilla attack even outperforms the OCR and visual adversarial example baselines in terms of ASR. We plan to incorporate more instructions for evaluation in our next revision. Please stay tuned :)
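
To see why so few instructions make ASR noisy, here is a hedged sketch of an ASR computation; `compute_asr` and the keyword-based judge are illustrative assumptions, not the paper's actual evaluation code.

```python
# Hedged sketch of an attack success rate (ASR) computation; the
# refusal-keyword judge is a naive assumption, not the paper's metric.

def compute_asr(responses: list[str], is_jailbroken) -> float:
    """Fraction of evaluation responses judged harmful (jailbroken)."""
    if not responses:
        return 0.0
    return sum(is_jailbroken(r) for r in responses) / len(responses)

# Naive judge: treat any response that does not open with a refusal
# phrase as a successful jailbreak.
refusal_markers = ("i'm sorry", "i cannot", "as an ai")
naive_judge = lambda r: not r.lower().startswith(refusal_markers)

# With fewer than 10 instructions, one flipped judgment shifts ASR by
# over 10 percentage points, so rankings between attacks can be noisy.
print(compute_asr(["Sure, here is how...", "I'm sorry, I can't help."],
                  naive_judge))  # 0.5
```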