Open codezakh opened 1 year ago
Can you reproduce the results for OPT zeroshot VQA in the paper?
Hi @YuanLiuuuuuu, this comment describes how they were able to reproduce the BLIP-2 OPT results. I haven't tried it yet: https://github.com/salesforce/LAVIS/issues/188
@codezakh Thanks!
Hi! @codezakh @YuanLiuuuuuu I noticed that in the forward function in blip2_opt.py, only the questions in the VQA dataset are used. Both the text_input and the target labels are derived from opt_tokens:
```python
text = [t + "\n" for t in samples["text_input"]]
opt_tokens = self.opt_tokenizer(
    text,
    return_tensors="pt",
    padding="longest",
    truncation=True,
    max_length=self.max_txt_len,
).to(image.device)

targets = opt_tokens.input_ids.masked_fill(
    opt_tokens.input_ids == self.opt_tokenizer.pad_token_id, -100
)
if self.prompt:
    targets[:, : self.prompt_length] = -100  # do not apply loss to the prompt

empty_targets = (
    torch.ones(atts_opt.size(), dtype=torch.long).to(image.device).fill_(-100)
)
targets = torch.cat([empty_targets, targets], dim=1)

inputs_embeds = self.opt_model.model.decoder.embed_tokens(opt_tokens.input_ids)
inputs_embeds = torch.cat([inputs_opt, inputs_embeds], dim=1)
attention_mask = torch.cat([atts_opt, opt_tokens.attention_mask], dim=1)

with self.maybe_autocast():
    outputs = self.opt_model(
        inputs_embeds=inputs_embeds,
        attention_mask=attention_mask,
        return_dict=True,
        labels=targets,
    )
```
In the VQA task, aren't the target labels supposed to be the answers from the VQA dataset? The answers are never used in blip2_opt.py, yet they are used as target labels in blip2_t5.py. This really confuses me. Did you make any changes to blip2_opt.py?
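To make the question concrete, here is a minimal, hypothetical plain-Python stand-in for the tensor ops in the snippet above (the helper name and token ids are made up for illustration). It shows that the labels are just a copy of the input question tokens, with pads, the prompt, and the visual (Q-Former) slots masked to -100 (the Hugging Face convention for positions excluded from the loss) -- so nothing from the answer annotations ever enters the loss:

```python
IGNORE_INDEX = -100  # positions set to -100 are ignored by the LM loss

def build_opt_targets(input_ids, pad_token_id, prompt_length, num_visual_tokens):
    """Sketch of the target construction in blip2_opt.py's forward():
    copy the text token ids, mask pad tokens and the prompt prefix,
    then prepend ignored slots for the visual tokens."""
    targets = [IGNORE_INDEX if t == pad_token_id else t for t in input_ids]
    targets[:prompt_length] = [IGNORE_INDEX] * prompt_length
    return [IGNORE_INDEX] * num_visual_tokens + targets

# Example: 2 visual tokens, a 2-token prompt, question tokens [11, 12], pad id 0
labels = build_opt_targets([5, 6, 11, 12, 0], pad_token_id=0,
                           prompt_length=2, num_visual_tokens=2)
print(labels)  # [-100, -100, -100, -100, 11, 12, -100]
```

The only supervised positions (11, 12) come from `samples["text_input"]`, i.e. the question itself, which is exactly why the absence of the answers in blip2_opt.py looks suspicious compared with blip2_t5.py.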
Hi @LiJunnan1992 and @dxli94, thanks for the great work! Do you have examples of the prompts you found to work well with the OPT version of BLIP-2 for VQA? I tried using FLAN-T5's prompt; I'm wondering whether you used a different prompt for OPT.