salesforce / ALBEF

Code for ALBEF: a new vision-language pre-training method
BSD 3-Clause "New" or "Revised" License

refcoco on lower resolution #125

Open ghost opened 1 year ago

ghost commented 1 year ago

Hi, thanks for this great work!

Any idea how I can use the fine-tuned refcoco model on images of resolution 224x224? I am trying to obtain attention maps of size 14x14 (patch size = 16, resolution = 224), as I get with the pretrained ALBEF checkpoint, but I need the refcoco model because the attention maps from the pretrained ALBEF checkpoint are not that great. Any suggestions would be appreciated!
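
For reference, the grid arithmetic behind the 14x14 figure can be sketched as below. This is only a minimal illustration, not code from the ALBEF repository: `patch_grid` is a hypothetical helper, and the 384 input resolution used for comparison is an assumption about the fine-tuned refcoco setup that should be checked against the config actually used.

```python
from typing import Tuple


def patch_grid(image_res: int, patch_size: int = 16) -> Tuple[int, int]:
    """Return (grid side, number of patch tokens) for a square input image.

    Hypothetical helper for illustration only; not part of ALBEF.
    """
    if image_res % patch_size != 0:
        raise ValueError("image_res must be a multiple of patch_size")
    side = image_res // patch_size
    return side, side * side


# Resolution from the question: 224 / 16 = 14, so a 14x14 attention map.
print(patch_grid(224))  # (14, 196)

# ASSUMPTION: 384 is used here only as an example of a higher fine-tuning
# resolution; verify the real value in the fine-tuning config.
print(patch_grid(384))  # (24, 576)
```

If the fine-tuned checkpoint was trained at a larger grid than 14x14, its learned position embeddings would have more patch tokens than a 224x224 input produces, so they would likely need to be resized to the smaller grid before the checkpoint can be run at 224x224.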