lucidrains / DALLE2-pytorch

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch
MIT License
11.03k stars · 1.07k forks

Regarding L2 norm clamping in Diffusion Prior #68

Closed xiankgx closed 2 years ago

xiankgx commented 2 years ago

Why do we apply the L2 norm clamping only during sampling and not during training? Shouldn't the two match? Please enlighten me.

https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L843-L844

https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L859-L860

https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L885-L900
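For context, here is a minimal sketch of what the sampling-time clamp in question does (function names and the scale are hypothetical stand-ins, not the repo's actual code): after each step the predicted image embedding is re-projected onto the hypersphere at a fixed scale, whereas during training no such projection is applied to the network's prediction.

```python
import torch

def l2norm(t: torch.Tensor) -> torch.Tensor:
    # project each row onto the unit hypersphere
    return t / t.norm(dim=-1, keepdim=True).clamp(min=1e-12)

# hypothetical sampling-time clamp: after the prior predicts x_start
# (the denoised image embedding), re-normalize it so its L2 norm matches
# the scale that CLIP image embeddings are assumed to live at
def clamp_pred_embed(x_start: torch.Tensor, image_embed_scale: float) -> torch.Tensor:
    return l2norm(x_start) * image_embed_scale

pred = torch.randn(2, 512)  # stand-in for the prior's x_start prediction
clamped = clamp_pred_embed(pred, image_embed_scale=512 ** 0.5)
```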

xiankgx commented 2 years ago

Also, here we multiply by a scale without first applying l2norm:

https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L986

which is fine if we use XClip, because we apply l2norm here:

https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L180

But we are not applying l2norm when using OpenAI CLIP:

https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L274-L275
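To illustrate the point (a sketch with stand-in tensors and a stand-in scale, not the repo's actual code): multiplying an un-normalized embedding by the scale leaves the resulting norms dependent on the raw embedding, while applying l2norm first pins every norm to exactly the scale.

```python
import torch

def l2norm(t: torch.Tensor) -> torch.Tensor:
    return t / t.norm(dim=-1, keepdim=True).clamp(min=1e-12)

image_embed = torch.randn(4, 512)  # stand-in for a CLIP image embedding
scale = 4.0                        # stand-in for the scaling factor in question

# without l2norm first, the resulting norms vary with the raw embedding
scaled_raw = image_embed * scale

# with l2norm first, every embedding ends up with norm exactly `scale`
scaled_unit = l2norm(image_embed) * scale
```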

lucidrains commented 2 years ago

@xiankgx good idea! i've added it here https://github.com/lucidrains/DALLE2-pytorch/commit/14e63a3f67674435a1a15b45e170c6a1146484d3 although i think the whole l2norm clamping thing is not proven out yet

lucidrains commented 2 years ago

> Also, here we multiply with a scale without first doing l2norm.
>
> https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L986
>
> which is ok if we use XClip because we are doing l2norm here.
>
> https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L180
>
> But, we are not doing l2norm when using OpenAI CLIP.
>
> https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L213

ohh, this isn't OpenAIClip, it is actually from CoCa (https://arxiv.org/abs/2205.01917), which debuted yesterday. i think it is a better version of CLIP

however, it is unclear from the CoCa paper whether they apply l2norm for the cosine-similarity contrastive learning

in the paper, they seem to use layernorms on both the image and text CLS tokens, but i'm not sure the l2norm is present
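the distinction matters because the two normalizations behave differently; a quick torch sketch (illustrative only): layernorm gives zero mean and unit variance per vector, so its L2 norm comes out near sqrt(d), while l2norm gives unit L2 norm, which is what a cosine-similarity loss assumes.

```python
import torch
import torch.nn.functional as F

d = 64
x = torch.randn(3, d)

# layernorm: zero mean, unit variance per vector -> L2 norm ~= sqrt(d), not 1
ln_out = F.layer_norm(x, normalized_shape=(d,))

# l2norm: unit L2 norm per vector, as cosine similarity expects
l2_out = F.normalize(x, dim=-1)
```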

xiankgx commented 2 years ago

Sorry, wrong line quote.

xiankgx commented 2 years ago

Lol, don't take my word for it, I'm a newbie in diffusion models.

lucidrains commented 2 years ago

> newbie

@xiankgx same, i think we all are, except for a few researchers around the world and maybe @crowsonkb lol

you are right! https://github.com/openai/CLIP/blob/main/clip/model.py#L364 they normalize it outside of the encoding functions, let me fix it now :pray:
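in other words (a sketch, with a random tensor standing in for the actual `encode_image` output): OpenAI CLIP's `encode_image` / `encode_text` return un-normalized embeddings, and the division by the norm happens later in `CLIP.forward`, so any adapter that calls the encode functions directly has to reproduce that normalization itself.

```python
import torch

# stand-in for clip_model.encode_image(images), which returns
# un-normalized features in the OpenAI CLIP implementation
image_features = torch.randn(3, 512)

# the normalization that CLIP.forward would otherwise apply
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
```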

xiankgx commented 2 years ago

Maybe we can ask crowsonkb for advice.

lucidrains commented 2 years ago

https://github.com/lucidrains/DALLE2-pytorch/releases/tag/0.1.4