Open iamxiaoyubei opened 11 months ago
I would like to ask whether your method works with CLIP's ViT model and, if so, how to use it. As far as I can tell, ViT's image encoder offers no direct way to extract D-dimensional flattened dense image features to use as local features.