lucidrains / CoCa-pytorch

Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch
MIT License

Contrastive loss should be applied to L2-normed embeddings instead of layer normed? #10

Closed fedshyvana closed 1 year ago

fedshyvana commented 1 year ago

Hi @lucidrains, thank you for the implementation. I just wanted to confirm something with you: based on your code, the image embedding and the text embedding are each normalized with a learnable LayerNorm before the contrastive loss is applied. But as I understand it, a contrastive loss typically maximizes relative cosine similarity, so shouldn't the embeddings be L2-normalized rather than layer-normed? Thank you.
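To make the distinction concrete, here is a minimal sketch in dependency-free Python (plain lists, not the repository's code). The helper names `l2_normalize`, `layer_norm`, `dot`, and `cosine` and the example vectors are hypothetical, and the layer norm is shown without its learnable scale and shift, which is an assumption made to keep the comparison simple. In PyTorch the L2 step would usually be `torch.nn.functional.normalize(x, dim=-1)`.

```python
import math

def l2_normalize(v, eps=1e-12):
    # Scale the vector to unit Euclidean length.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]

def layer_norm(v, eps=1e-5):
    # Zero mean, unit variance across features.
    # Assumption: the learnable gain and bias are left out (gain=1, bias=0).
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical image and text embeddings, for illustration only.
img = [0.5, -1.0, 2.0, 3.0]
txt = [1.0, 0.0, 1.5, 2.5]

# After L2 normalization, the dot product is exactly the cosine similarity.
sim_l2 = dot(l2_normalize(img), l2_normalize(txt))

# After layer norm, each vector has length about sqrt(dim), not 1, so the
# dot product is not a cosine similarity and is not bounded by [-1, 1].
sim_ln = dot(layer_norm(img), layer_norm(txt))

print(sim_l2, cosine(img, txt), sim_ln)
```

With L2 normalization the logits fed to the contrastive loss are cosine similarities scaled only by the temperature. With layer norm their scale also depends on the embedding dimension and, in the learnable case, on the gain and bias.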

lucidrains commented 1 year ago

@fedshyvana Hi Max! Indeed, I made an error :pray: and it should be fixed here

Ugh, where would I be without smart PhD students reviewing my code haha

fedshyvana commented 1 year ago

thank you!