thu-ml / Attack-Bard

Why does `Blip2VisionModel` not receive the prompt as input? #5

Open RylanSchaeffer opened 7 months ago

RylanSchaeffer commented 7 months ago

As best as I can tell, the Blip2VisionModel doesn't receive the prompt as input:

https://github.com/thu-ml/Attack-Bard/blob/5e9618a2e11fd9d213e31ac2da67325f80b7f70b/surrogates/Blip2.py#L51-L53

Why is this? Could someone please clarify?

huanranchen commented 7 months ago

Hi! This is because we only use the vision encoder of Blip2. Blip2 consists of a vision encoder and a text decoder, and the prompt is used only by the text decoder. Here we perform the "Image Feature Attack", so we need neither the text decoder nor the prompt.
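
For readers following along, here is a minimal, dependency-free sketch of an image feature attack of the kind described above. It is an illustration under stated assumptions, not the repository's implementation: the matrix `W` is a toy linear stand-in for the real vision encoder, and `encode`, `feature_distance`, and `feature_attack` are hypothetical names. The point it shows is that the attack loop only ever touches the image and its features, so no text prompt enters the computation.

```python
import random

# Toy stand-in for a vision encoder: a fixed linear map from 4 "pixels" to 3 features.
# (Assumption: the real surrogate is a deep network differentiated with autograd.)
W = [
    [0.5, -0.2, 0.1, 0.3],
    [-0.4, 0.6, 0.2, -0.1],
    [0.1, 0.3, -0.5, 0.2],
]


def encode(image):
    """Return the feature vector W @ image. No text prompt is involved."""
    return [sum(w * p for w, p in zip(row, image)) for row in W]


def feature_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def feature_attack(image, epsilon=0.1, step=0.02, iters=20, seed=0):
    """Signed-gradient ascent on feature distance inside an L-infinity ball."""
    rng = random.Random(seed)
    clean_features = encode(image)
    # Random start: the gradient of the distance is exactly zero at the clean image.
    adv = [min(1.0, max(0.0, p + rng.uniform(-epsilon, epsilon))) for p in image]
    for _ in range(iters):
        residual = [f - c for f, c in zip(encode(adv), clean_features)]
        # Gradient of ||W x - W x0||^2 with respect to x is 2 * W^T (W x - W x0).
        grad = [
            2 * sum(W[i][j] * residual[i] for i in range(len(W)))
            for j in range(len(adv))
        ]
        for j, g in enumerate(grad):
            sign = (g > 0) - (g < 0)
            moved = adv[j] + step * sign  # ascend: push features away from the clean ones
            # Project back into the epsilon ball, then into the valid pixel range.
            moved = max(image[j] - epsilon, min(image[j] + epsilon, moved))
            adv[j] = max(0.0, min(1.0, moved))
    return adv


clean = [0.2, 0.8, 0.5, 0.4]
adversarial = feature_attack(clean)
print(feature_distance(encode(adversarial), encode(clean)))
```

A text description attack would differ in the loss: it would need the language model's output for a given prompt, which is why the prompt matters there but not in this sketch.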

RylanSchaeffer commented 7 months ago

I think I might be missing something. When I run the text description attack, I hit that code. Perhaps I have something misconfigured?

Cheers, Rylan Schaeffer

huanranchen commented 7 months ago

I'm sorry; you are right. It's my fault; it should have included some text prompts. Perhaps this is one of the reasons the "Text Description Attack" didn't perform well.

However, this may not be the primary reason. I recently conducted a Text Description Attack on LLaVA and MiniGPT-4, but the adversarial examples still did not transfer to GPT-4V or Bard. I'm confident the code is correct, since I evaluated the adversarial examples in white-box settings and the outputs from the white-box models match my target prompts exactly. I believe the main challenge lies in the transferability of the adversarial examples.

chchch0109 commented 7 months ago

Hi, I'm interested in your work, and I have some questions about it.

  1. So for the "Text Description Attack", we should include text prompts, right?
  2. What's the metric for the "Text Description Attack"? You said the outputs match your target prompts exactly, so I assume the targeted attack tries to make the model output exactly the same text as the target prompt?

huanranchen commented 7 months ago

Hi~

  1. Yeah, adding text prompts like "describe the image" is something we should think about. But honestly, I don't think it makes a big difference whether we attack with or without prompts.
  2. For the "Text Description Attack," we're still looking at whether the image gets misclassified. It's pretty easy to match the model's output to my target prompts in a white-box scenario. But in a black-box setting? Seems like a no-go – haven't managed to pull it off yet. Since this paper is all about black-box attacks, we're sticking with "misclassification" as our go-to metric.
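
To make the two criteria in this answer concrete, here is a small sketch contrasting an exact-match check with a misclassification check. The thread does not say how misclassification is actually judged, so the keyword test below is only a hypothetical proxy, and `exact_match`, `is_misclassified`, and `success_rate` are illustrative names rather than functions from the repository.

```python
def exact_match(output, target):
    """White-box style criterion: output equals the target text
    (after trimming whitespace and ignoring case)."""
    return output.strip().lower() == target.strip().lower()


def is_misclassified(output, true_label):
    """Hypothetical proxy: the true object is never mentioned in the description."""
    return true_label.lower() not in output.lower()


def success_rate(outputs, true_labels):
    """Fraction of descriptions that count as misclassified under the proxy."""
    if not outputs:
        return 0.0
    hits = sum(is_misclassified(o, l) for o, l in zip(outputs, true_labels))
    return hits / len(outputs)


descriptions = ["a cat on a sofa", "a panda eating bamboo"]
labels = ["panda", "panda"]
print(success_rate(descriptions, labels))
```

Exact match is a much stricter bar than misclassification, which is consistent with the answer above that it is reachable in a white-box setting but not yet in a black-box one.
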

Monika-Tiyyagura commented 7 months ago

@dongyp13 @huanranchen Hey, this is Monika. I really appreciate the work you did. I need your help: I am trying to replicate this project as my semester-long project and improve the success rates, but I am unable to reproduce the work locally because I am having problems installing the dependencies. I would really appreciate your help or guidance, as I need to submit this tomorrow. Thank you in advance.

huanranchen commented 7 months ago

Hi, how can I help you? I suggest running img_encoder_attack, as it doesn't require deploying MiniGPT-4.