jiamingzhang94 / Adversarial-Prompt-Tuning

ECCV2024: Adversarial Prompt Tuning for Vision-Language Models
MIT License

Regarding Table 1 results in the paper #3

Closed HashmatShadab closed 2 months ago

HashmatShadab commented 3 months ago

Hi, thanks for sharing your work. I just need some clarity on the experiments behind the results reported in Table 1.

  1. Does Vanilla CLIP report the zero-shot (ZS) clean and adversarial performance on the datasets listed in Table 1?
  2. Is AdvPT-based CLIP adversarially trained on each of the mentioned datasets (Flowers, Pets, ...), with its adversarial performance then reported?
  3. Also, regarding the adversarial examples generated for evaluation: are they generated with the same objective as the one used to generate the adversarial embedding bank?

So, if I understand correctly, results for ZS adversarial robustness (Flowers, Pets, ...) are not evaluated?

jiamingzhang94 commented 3 months ago

Hi, Could you please clarify what you mean by “results for ZS adversarial robustness”? I am not sure about the difference you are referring to between "ZS adversarial performance" and "results for ZS adversarial robustness."

HashmatShadab commented 3 months ago

What I mean by zero-shot adversarial robustness is similar to how it is described in the paper "Understanding Zero-Shot Adversarial Robustness for Large-Scale Models". They adversarially fine-tune CLIP on ImageNet, then evaluate the robust CLIP on adversarial examples crafted on downstream datasets on which it has not been adversarially trained.
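To make the evaluation protocol concrete, here is a minimal sketch of the attack side of that setup: a standard L-infinity PGD attack run against a fixed classifier, with robust accuracy measured on the resulting examples. This is my own illustrative code, not from the repo; `pgd_attack` and the `ToyZeroShotClassifier` stand-in (which replaces CLIP's frozen image encoder plus text-embedding head) are hypothetical names, and the attack budget (8/255) is a common choice, not necessarily the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd_attack(model, images, labels, eps=8/255, alpha=2/255, steps=10):
    """Untargeted L-inf PGD with random start; images assumed in [0, 1]."""
    adv = images.clone().detach()
    adv = (adv + torch.empty_like(adv).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        grad = torch.autograd.grad(loss, adv)[0]
        # ascend the loss, then project back into the eps-ball and valid range
        adv = adv.detach() + alpha * grad.sign()
        adv = (images + (adv - images).clamp(-eps, eps)).clamp(0.0, 1.0).detach()
    return adv

class ToyZeroShotClassifier(nn.Module):
    """Hypothetical stand-in: a fixed linear head over flattened pixels,
    playing the role of CLIP image features scored against class embeddings."""
    def __init__(self, dim=3 * 32 * 32, n_classes=10):
        super().__init__()
        self.head = nn.Linear(dim, n_classes, bias=False)

    def forward(self, x):
        return self.head(x.flatten(1))

torch.manual_seed(0)
model = ToyZeroShotClassifier()
images = torch.rand(4, 3, 32, 32)
labels = torch.randint(0, 10, (4,))
adv = pgd_attack(model, images, labels)

clean_acc = (model(images).argmax(1) == labels).float().mean().item()
robust_acc = (model(adv).argmax(1) == labels).float().mean().item()
```

In the actual zero-shot protocol, `model` would be the ImageNet-adversarially-trained CLIP evaluated on a downstream dataset it was never fine-tuned on, and `robust_acc` is the number being compared across methods.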

jiamingzhang94 commented 3 months ago

If I understand correctly, you are referring to: performing adversarial training (AT) on CLIP using ImageNet, and then testing robustness on the test sets of downstream datasets (adversarial versions of the test sets). This scenario is not within the scope of our paper, because that setup is more stringent (it requires the model not to use the training sets of the downstream datasets for fine-tuning). In fact, you can observe that the results of TeCoA are much lower compared to AdvPT.

HashmatShadab commented 3 months ago

Thank you for clarifying.

HashmatShadab commented 3 months ago

Regarding the comparison with Vanilla CLIP in Table 1: is Vanilla CLIP also fine-tuned on the downstream datasets, or is it just the pretrained CLIP model?

jiamingzhang94 commented 3 months ago

It is the pretrained CLIP model. If you are looking for comparisons with other methods that also utilize downstream datasets, you may refer to Figure 3 (CoOp) and Linear Probe (in the camera-ready version) for relevant results.