hkproj / pytorch-stable-diffusion

Stable Diffusion implemented from scratch in PyTorch
https://www.youtube.com/watch?v=ZBKpAp_6TGI
MIT License

Why do we need to set causal_mask=True in CLIP? #13

Open liuxiaoqun opened 8 months ago

liuxiaoqun commented 8 months ago

The prompt is a sentence, and we don't need to predict the next token of the prompt, so is there any problem with letting each token see the tokens to its right? Why do we have `x = self.attention(x, causal_mask=True)`?
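For context, here is a minimal sketch of what `causal_mask=True` typically does inside such an attention layer (the function and variable names here are illustrative, not the repo's exact code): positions above the diagonal of the score matrix are set to -inf before the softmax, so token i can only attend to tokens 0..i.

```python
import torch

def attention_weights(q, k, causal_mask=True):
    # q, k: (batch, seq_len, d_head)
    d_head = q.shape[-1]
    scores = q @ k.transpose(-1, -2) / (d_head ** 0.5)  # (batch, seq_len, seq_len)
    if causal_mask:
        # Entries above the diagonal correspond to tokens to the right of
        # each position; -inf makes their softmax weight exactly zero.
        mask = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1)
```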

RohollahHS commented 8 months ago

This is my question as well.

ninaddaithankar commented 4 months ago

I had the same query. This is the answer I found in the CLIP paper by OpenAI:

"Masked self-attention was used in the text encoder to preserve the ability to initialize with a pre-trained language model or add language modeling as an auxiliary objective, though exploration of this is left as future work."

Paper: https://arxiv.org/pdf/2103.00020
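One consequence worth noting (my reading, not stated in this thread): with a causal mask, the last token still attends to every earlier token, so a text representation pooled from the final (EOT) position, as CLIP does, still sees the whole prompt. A quick check:

```python
import torch

seq_len = 5
scores = torch.zeros(seq_len, seq_len)  # uniform scores, for illustration only
mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
weights = torch.softmax(scores.masked_fill(mask, float("-inf")), dim=-1)
print(weights[0])   # first row: all weight on token 0 -> sees only itself
print(weights[-1])  # last row: spread over all 5 tokens -> sees the full prompt
```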