For the learnable prompt

muzairkhattak / multimodal-prompt-learning

[CVPR 2023] Official repository of paper titled "MaPLe: Multi-modal Prompt Learning".

https://muzairkhattak.github.io/multimodal-prompt-learning/

MIT License

635 stars, 48 forks

For the learnable prompt #59

Closed: ChelsieLei closed this 6 months ago

ChelsieLei commented 6 months ago

Hi, thanks for your good work.

I have a question about the code. I see that learnable prompts are added for the first 9 layers. For the following layers (10-24), if the learnable tokens are not replaced with new learnable prompts, those layers would process the initial learnable tokens used in the 1st layer.

That does not seem reasonable to me, so my understanding may be wrong. How are layers 10-24 processed?

Thanks a lot! Chelsie

muzairkhattak commented 6 months ago

Hi @ChelsieLei,

Thank you for showing interest in MaPLe!

Regarding your query:

For layers beyond layer 9, we use the prompt tokens output by the preceding layer. Layer 10 uses the processed prompt tokens from layer 9 (the last layer that receives new learnable prompts), layer 11 uses the processed prompt tokens from layer 10, and so on.

In summary, each layer after layer 9 uses the prompt tokens from its previous layer. The model does not reuse the prompt tokens from the first layer.

I hope that is clarified now. Let me know if you have any follow-up questions.

Thank you!

ChelsieLei commented 6 months ago

Dear Author,

Thanks for your reply. I get the point now. It really helps my understanding. Thanks a lot!

X-funbean commented 5 months ago

Hi @muzairkhattak,

Regarding the learnable prompts after the $J$-th layer: from the corresponding code, it seems that they are no longer replaced or handled separately. So my understanding is that the information in the last learnable prompts is only broadcast through $x$ across the following layers. Is my understanding correct? Please help me out! Thanks a lot!

https://github.com/muzairkhattak/multimodal-prompt-learning/blob/69bce21ae8eda80ad6187534b2dce09cf6c59e17/clip/model.py#L287-L331
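
To make the behaviour discussed in this thread concrete, here is a toy sketch in plain Python. It is not the repository's code. It assumes that fresh learnable prompts replace the prompt positions in layers 1 through the prompt depth (9 here), that the prompt tokens sit at the end of the sequence, and that later layers simply process whatever prompt tokens are already in `x`. The names `toy_layer`, `forward`, `prompt_depth`, and `n_ctx` are hypothetical, and the 12-layer depth is an arbitrary choice for illustration.

```python
def toy_layer(tokens):
    """Stand-in for one transformer block: every token in the sequence is processed."""
    return [(origin, times_processed + 1) for origin, times_processed in tokens]


def forward(num_layers, prompt_depth, n_ctx, n_input):
    """Return, for each layer, the prompt tokens as they leave that layer."""
    # Assumption: one fresh set of learnable prompts per layer, layers 1..prompt_depth.
    learned = {
        layer: [(f"prompt@layer{layer}", 0)] * n_ctx
        for layer in range(1, prompt_depth + 1)
    }
    # Assumption: prompt tokens occupy the last n_ctx positions of the sequence.
    x = [("input", 0)] * n_input + learned[1]
    trace = []
    for layer in range(1, num_layers + 1):
        if 1 < layer <= prompt_depth:
            # Swap in this layer's fresh prompts, discarding the processed ones.
            x = x[:-n_ctx] + learned[layer]
        # Beyond prompt_depth nothing is swapped: the prompts already in x flow on.
        x = toy_layer(x)
        trace.append(x[-n_ctx:])
    return trace


trace = forward(num_layers=12, prompt_depth=9, n_ctx=2, n_input=3)
for layer in (1, 9, 10, 12):
    print(layer, trace[layer - 1][0])
# 1 ('prompt@layer1', 1)
# 9 ('prompt@layer9', 1)
# 10 ('prompt@layer9', 2)
# 12 ('prompt@layer9', 4)
```

Under these assumptions the two descriptions in the thread agree: after layer 9 no new prompts are injected, but the layer-9 prompt tokens remain part of `x` and are transformed by each later layer, so the first-layer prompts are never reused.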