liuxiaoqun opened 8 months ago
This is my question as well.
I had the same query. This is the answer I found in the CLIP paper by OpenAI:
"Masked self-attention was used in the text encoder to preserve the ability to initialize with a pre-trained language model or add language modeling as an auxiliary objective, though exploration of this is left as future work."
The prompt is already a complete sentence; we don't need to predict the next token of the prompt, so is it actually a problem for each token to see the tokens to its right? `x = self.attention(x, causal_mask=True)`
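For reference, here is a minimal sketch of what a causal mask like the one behind `causal_mask=True` typically looks like in PyTorch. The helper name `build_causal_mask` and the layer sizes are illustrative assumptions, not the repository's actual API: the idea is an additive mask with `-inf` strictly above the diagonal, so each token only attends to itself and earlier tokens.

```python
import torch
import torch.nn as nn

def build_causal_mask(context_length: int) -> torch.Tensor:
    # Additive attention mask: -inf strictly above the diagonal,
    # 0 on and below it, so token i cannot attend to tokens > i.
    mask = torch.full((context_length, context_length), float("-inf"))
    mask.triu_(1)
    return mask

# Example: apply the mask in a standard multi-head attention layer.
context_length, width, heads = 77, 512, 8
attn = nn.MultiheadAttention(width, heads, batch_first=True)
x = torch.randn(2, context_length, width)        # (batch, tokens, width)
causal_mask = build_causal_mask(context_length)
out, _ = attn(x, x, x, attn_mask=causal_mask, need_weights=False)
```

With this mask each token's representation depends only on the tokens before it, which is exactly what the quoted passage refers to: the text encoder keeps the structure of a language model, so it could be initialized from a pre-trained LM or trained with a language modeling objective, even though that is not strictly needed just to encode a fixed prompt.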