muzairkhattak / multimodal-prompt-learning

[CVPR 2023] Official repository of paper titled "MaPLe: Multi-modal Prompt Learning".
https://muzairkhattak.github.io/multimodal-prompt-learning/
MIT License

An issue about VPT #8

Closed (closed by Zhangwenyao1 1 year ago)

Zhangwenyao1 commented 1 year ago

Dear Khattak: In the paper "Visual Prompt Tuning", the authors train both the classification head and the learnable prompt parameters of VPT, but I find that your code trains only the learnable prompt parameters. What should I do if I want to train both the head and the learnable parameters?

muzairkhattak commented 1 year ago

Hi @Zhangwenyao1

Thank you for your message.

In the VPT work, prompting is mainly applied to vision-only models, which typically consist of a vision backbone followed by a classifier head. That is why they tune the head along with the prompts.

However, in our case we are prompting CLIP, a vision-language model that does not use any classifier head in its architecture. Classification in CLIP is performed by matching image embeddings against text embeddings using cosine similarity.
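To illustrate how classification can work without a head, here is a minimal pure-Python sketch of CLIP-style embedding matching. It assumes you already have one image embedding and one text embedding per class (e.g. from class prompts such as "a photo of a cat"). The function names are hypothetical, and the scale of 100.0 is only an illustrative value for the temperature that CLIP multiplies its cosine similarities by. This is not code from the MaPLe repository.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def clip_style_logits(
    image_emb: Sequence[float],
    class_text_embs: Sequence[Sequence[float]],
    logit_scale: float = 100.0,  # illustrative temperature value (assumption)
) -> List[float]:
    """Cosine similarity between the image embedding and each class text
    embedding, multiplied by logit_scale. No classifier head is involved."""
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in class_text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)
    return logits


def predict(image_emb: Sequence[float], class_text_embs: Sequence[Sequence[float]]) -> int:
    """Return the index of the class whose text embedding best matches the image."""
    logits = clip_style_logits(image_emb, class_text_embs)
    return max(range(len(logits)), key=lambda i: logits[i])
```

With toy 3-dimensional embeddings, `predict([0.9, 0.1, 0.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])` picks class 0. In this setup, the "classifier weights" are the text embeddings themselves, which is why there is no separate head to train.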

Since CLIP has no classifier head, unlike vision-only models, we learn only the multimodal prompts and use embedding matching for classification.
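To make the contrast concrete, here is a hypothetical sketch of which parameters would receive gradients in each setup. It assumes a naming convention where prompt parameters contain `"prompt"` and classifier-head parameters start with `"head."`. These names are illustrative, not the actual parameter names in VPT or in this repository. With `tune_head=True` it mimics VPT-style tuning (prompts plus head). With `tune_head=False` it mimics prompt-only tuning of a headless model such as CLIP.

```python
from typing import Dict, List


def set_trainable(param_names: List[str], tune_head: bool) -> Dict[str, bool]:
    """Map each parameter name to whether it should be trained.

    Hypothetical convention: prompt parameters contain "prompt" and head
    parameters start with "head.". Everything else is treated as the frozen
    pre-trained backbone.
    """
    flags = {}
    for name in param_names:
        is_prompt = "prompt" in name
        is_head = name.startswith("head.")
        # Prompts are always trained; the head is trained only if requested.
        flags[name] = is_prompt or (tune_head and is_head)
    return flags


# VPT-style vision-only model: backbone frozen, prompts and head trained.
vpt_flags = set_trainable(
    ["backbone.blocks.0.attn.weight", "prompt_embeddings", "head.weight", "head.bias"],
    tune_head=True,
)

# Headless CLIP-style model: only the prompt parameters are trained.
clip_flags = set_trainable(
    ["image_encoder.conv1.weight", "text_encoder.ln_final.weight", "prompt_learner.ctx"],
    tune_head=False,
)
```

In PyTorch, such a map could be applied with `for name, p in model.named_parameters(): p.requires_grad_(flags[name])`. Note that for CLIP, setting `tune_head=True` changes nothing unless a head module is actually added to the model, because there is no head to unfreeze.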

Kindly let us know if this resolves your query.

Thank you and kind regards.

Zhangwenyao1 commented 1 year ago

Thanks for your reply. By the way, why don't you report results for few-shot experiments?

muzairkhattak commented 1 year ago

Our work is mainly focused on improving the generalization of vision-language models, and our main comparison is with CoCoOp, which also provides results only on generalization benchmarks.

But feel free to also try MaPLe in few-shot experiments. I am hopeful it would perform well there too, as MaPLe adapts both the vision and language branches jointly, in contrast to all previous methods.

Kindly let me know in case you require any additional information.

Thank you.

muzairkhattak commented 1 year ago

I am closing this issue as I believe all your queries are resolved.

Feel free to reopen this issue or open a new one in case you need any further help.

Thanks!