guozix / TaI-DPT

MIT License


I would like to ask if your method works on CLIP's ViT model and how to use it? #11

Status: Open. iamxiaoyubei opened this issue 11 months ago.

iamxiaoyubei commented 11 months ago

I would like to ask whether your method works with CLIP's ViT models and, if so, how to use it. There seems to be no direct way to extract D-dimensional flattened dense image features from ViT's image encoder to use as local features.
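One possible workaround is shown below as a minimal sketch. It is not the TaI-DPT authors' method, and it assumes the attribute layout of OpenAI CLIP's `VisionTransformer` (`conv1`, `class_embedding`, `positional_embedding`, `ln_pre`, `transformer`, `ln_post`, `proj`). The stock forward pass keeps only the CLS token. The sketch instead sends every patch token through `ln_post` and `proj`, which gives a `[B, grid*grid, D]` tensor in the same D-dimensional space as the global image embedding. `vit_dense_features` and `ToyVisual` are hypothetical names. `ToyVisual` is a tiny stand-in so the sketch runs without downloading weights.

```python
import torch
import torch.nn as nn


def vit_dense_features(visual: nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Return per-patch features [B, grid*grid, D] from a CLIP-style ViT.

    Assumption: attribute names follow OpenAI CLIP's VisionTransformer.
    The stock forward keeps only the CLS token. Here every patch token is
    passed through ln_post and proj instead.
    """
    x = visual.conv1(images)                         # [B, width, grid, grid]
    x = x.flatten(2).permute(0, 2, 1)                # [B, grid*grid, width]
    cls = visual.class_embedding.to(x.dtype).expand(x.shape[0], 1, -1)
    x = torch.cat([cls, x], dim=1)                   # prepend CLS token
    x = x + visual.positional_embedding.to(x.dtype)  # [B, 1 + grid*grid, width]
    x = visual.ln_pre(x)
    x = visual.transformer(x.permute(1, 0, 2))       # CLIP's transformer expects LND
    x = x.permute(1, 0, 2)                           # back to NLD
    x = visual.ln_post(x[:, 1:, :])                  # drop CLS, keep patch tokens
    if visual.proj is not None:
        x = x @ visual.proj                          # project to D dims
    return x


class ToyVisual(nn.Module):
    """Hypothetical tiny stand-in with the same attribute names, for testing only."""

    def __init__(self, width=32, patch=8, image=32, out_dim=12):
        super().__init__()
        grid = image // patch
        self.conv1 = nn.Conv2d(3, width, kernel_size=patch, stride=patch, bias=False)
        self.class_embedding = nn.Parameter(torch.randn(width))
        self.positional_embedding = nn.Parameter(torch.randn(grid * grid + 1, width))
        self.ln_pre = nn.LayerNorm(width)
        # batch_first=False (default), so it takes LND input like CLIP's Transformer
        self.transformer = nn.TransformerEncoderLayer(d_model=width, nhead=4)
        self.ln_post = nn.LayerNorm(width)
        self.proj = nn.Parameter(torch.randn(width, out_dim))


visual = ToyVisual().eval()
with torch.no_grad():
    feats = vit_dense_features(visual, torch.randn(2, 3, 32, 32))
print(feats.shape)  # torch.Size([2, 16, 12]): 4x4 patch grid, 12-dim projection
```

With the real `clip` package, you would presumably load the model with `model, preprocess = clip.load("ViT-B/16")` and then call `vit_dense_features(model.visual, images)`, casting `images` to the model's dtype first (the CUDA weights are fp16). This sketch does not verify whether these projected patch tokens work as well as TaI-DPT's original local features.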