tsb0601 / MMVP

dinov2 num_patches set to 256 #19

Open yanbai1993 opened 3 months ago

yanbai1993 commented 3 months ago

Hi! Why is the DINOv2 num_patches set to 256? The image size is 336 and the patch (kernel) size is 14, so the number of patches should be the same as for CLIP, which is 576.

HashmatShadab commented 1 month ago

@tsb0601 ??

HashmatShadab commented 1 month ago

@yanbai1993 image

I think the code is set up for image size 224, so CLIP and DINO each produce 256 patches, for a total of 512, as shown in the table above.

The table below shows results for image size 336. Here you can see that CLIP and DINO each produce 576 patches, for a total of 1152. image
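For reference, here is a minimal sketch of the arithmetic behind these numbers. It assumes square images, a patch size of 14 for both encoders, no CLS token in the count, and that the CLIP and DINO patch tokens are simply added together; `num_patches` is a hypothetical helper for illustration, not a function from the MMVP repo.

```python
def num_patches(image_size: int, patch_size: int = 14) -> int:
    """Count non-overlapping square patches in a square image (CLS token excluded)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size  # patches along one edge
    return side * side


for size in (224, 336):
    per_encoder = num_patches(size)
    # Assumption: both encoders use patch size 14, so the combined count is doubled.
    print(f"image size {size}: {per_encoder} per encoder, {2 * per_encoder} combined")
```

With these assumptions, 224 gives 16 x 16 = 256 patches per encoder and 336 gives 24 x 24 = 576, which matches the totals of 512 and 1152 quoted above.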

HashmatShadab commented 1 month ago

@tsb0601 Is clip-vit-large-patch14 used in LLaVA-1.5 with image size 224?