Hi, thanks for this great work!
Any idea how I can use the fine-tuned refcoco model with images of resolution 224x224? I am trying to obtain attention maps of size 14x14 (patch size = 16, resolution = 224), as I get with the pretrained ALBEF checkpoint, but I need the refcoco checkpoint because the attention maps from the pretrained ALBEF are not that great. Any suggestions would be appreciated!
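
For reference, the 14x14 figure is just the resolution divided by the patch size. Here is a minimal sketch of that arithmetic; `attention_grid` is a hypothetical helper of mine, not part of ALBEF, and the 384 resolution in the second call is only my assumption about how the refcoco checkpoint was fine-tuned.

```python
def attention_grid(image_res: int, patch_size: int) -> tuple[int, int]:
    """Return the (rows, cols) patch grid for a square ViT-style input."""
    if image_res % patch_size != 0:
        raise ValueError("image_res must be divisible by patch_size")
    side = image_res // patch_size
    return side, side


# Grid I want: resolution 224 with patch size 16.
print(attention_grid(224, 16))

# Grid at an ASSUMED fine-tuning resolution of 384 (not confirmed here).
print(attention_grid(384, 16))
```

If that assumption holds, the fine-tuned checkpoint would produce a larger grid than 14x14, which is the mismatch I am trying to work around.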