muzairkhattak / multimodal-prompt-learning

[CVPR 2023] Official repository of paper titled "MaPLe: Multi-modal Prompt Learning".
https://muzairkhattak.github.io/multimodal-prompt-learning/

Mismatch between N_CTX and the length of CTX_INIT in MultiModalPromptLearner #32

Closed: Chefzz closed this issue 11 months ago

Chefzz commented 12 months ago

According to the cfg file, CTX_INIT is "a photo of a", which has 4 words, but N_CTX is 2. Below is the code that extracts SOS and the CLS/EOS tokens using N_CTX. However, N_CTX is 2, so it doesn't match the index of CLS, which should be 5 (1 + 4, the length of CTX_INIT).

        prompts = [prompt_prefix + " " + name + "." for name in classnames]
        tokenized_prompts = torch.cat([clip.tokenize(p) for p in prompts])  # (n_cls, n_tkn)

        with torch.no_grad():
            embedding = clip_model.token_embedding(tokenized_prompts).type(dtype)

        self.register_buffer("token_prefix", embedding[:, :1, :])  # SOS
        self.register_buffer("token_suffix", embedding[:, 1 + n_ctx:, :])  # CLS, EOS

In CoCoOp, n_ctx for this part is set adaptively to the length of CTX_INIT. Here, N_CTX is used both in the initialization of the compound prompts and in the extraction of CLS and EOS. What does N_CTX refer to: the length of the prefix or the length of the compound prompts?
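For concreteness, a minimal sketch of the token layout that this slicing operates on (assuming CLIP's default tokenizer; the class name "cat" is only a hypothetical example):

    import clip  # https://github.com/openai/CLIP

    # Hypothetical class "cat" with the default template.
    tokens = clip.tokenize("a photo of a cat.")  # shape (1, 77)

    # Token layout produced by the tokenizer:
    #   index 0     : SOS
    #   indices 1-4 : "a" "photo" "of" "a"  (the 4 CTX_INIT words)
    #   index 5     : "cat"                 (CLS)
    #   index 6     : "."  followed by EOS and padding
    #
    # With n_ctx = 2, embedding[:, 1 + n_ctx:, :] starts at index 3, so the
    # suffix still contains the embeddings of "of a cat . EOS", not just
    # "cat . EOS".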

muzairkhattak commented 11 months ago

Hi @Chefzz,

Thank you for showing interest in our work!

Regarding your queries, kindly note that n_ctx is the length of the compound prompts. In our main experiments, we use 2 compound prompts per layer, so n_ctx = 2.

The original prompt template is "a photo of a CLS", so with two compound prompts the template becomes "(learnable token) (learnable token) of a CLS", where each learnable token is a single compound prompt.
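A minimal sketch of how these pieces are concatenated at the input layer (illustrative shapes and random initialization, not the repository's exact code):

    import torch

    n_cls, seq_len, ctx_dim, n_ctx = 10, 77, 512, 2  # illustrative sizes

    # Stand-in for the frozen clip_model.token_embedding output.
    embedding = torch.randn(n_cls, seq_len, ctx_dim)

    token_prefix = embedding[:, :1, :]          # SOS
    token_suffix = embedding[:, 1 + n_ctx:, :]  # "of", "a", CLS, ".", EOS, padding

    # The first n_ctx embedded words ("a", "photo") are replaced by
    # learnable vectors (the compound prompts of the first layer).
    ctx = torch.nn.Parameter(torch.empty(n_ctx, ctx_dim).normal_(std=0.02))

    prompts = torch.cat(
        [token_prefix, ctx.unsqueeze(0).expand(n_cls, -1, -1), token_suffix],
        dim=1,
    )
    assert prompts.shape == (n_cls, seq_len, ctx_dim)

So only the first two words of "a photo of a" are learned; "of a" stays frozen inside the suffix.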

For further details on the implementation of prompt tokens, please refer to Appendix B of our paper at this link.

I hope your query is resolved now! Kindly let us know if there are any additional questions. Thank you!

Chefzz commented 11 months ago

Thank you very much!