patrickjohncyh / fashion-clip

FashionCLIP is a CLIP-like model fine-tuned for the fashion domain.
MIT License
327 stars, 36 forks

Cosine similarity implementation bug #31

Open mskrabic opened 6 months ago

mskrabic commented 6 months ago

I'm experimenting with FashionCLIP and noticed that my zero-shot classification scores were lower with the built-in zero_shot_classification(images, text_labels) method than when I computed the embeddings, then the similarities, and finally the predictions step by step.

I found that the _cosine_similarity(key_vectors, space_vectors, normalize) method normalizes only key_vectors (the image embeddings). The result is therefore not a true cosine similarity, which requires both sets of vectors to be normalized, and this degrades performance.
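
To illustrate the difference, here is a minimal NumPy sketch. It is not the library's code: the function names one_sided_similarity and cosine_similarity and the random embeddings are made up for this example, and it assumes embeddings are 2-D arrays of shape (n, d). It only shows that normalizing one side leaves each score scaled by the norm of the corresponding text embedding.

```python
import numpy as np


def one_sided_similarity(key_vectors, space_vectors):
    # Illustrative reproduction of the reported behavior (assumption):
    # only key_vectors are L2-normalized before the dot product.
    key_unit = key_vectors / np.linalg.norm(key_vectors, axis=-1, keepdims=True)
    return key_unit @ space_vectors.T


def cosine_similarity(key_vectors, space_vectors):
    # True cosine similarity: both sides are L2-normalized.
    key_unit = key_vectors / np.linalg.norm(key_vectors, axis=-1, keepdims=True)
    space_unit = space_vectors / np.linalg.norm(space_vectors, axis=-1, keepdims=True)
    return key_unit @ space_unit.T


# Made-up stand-ins for image and text embeddings.
rng = np.random.default_rng(0)
image_embeddings = rng.normal(size=(2, 8))
# Give the three "text" rows very different norms to make the effect visible.
text_embeddings = rng.normal(size=(3, 8)) * np.array([[1.0], [5.0], [0.2]])

one_sided = one_sided_similarity(image_embeddings, text_embeddings)
true_cos = cosine_similarity(image_embeddings, text_embeddings)

# Each column of the one-sided result is the cosine similarity multiplied
# by the norm of the corresponding text embedding.
text_norms = np.linalg.norm(text_embeddings, axis=-1)
print(np.allclose(one_sided, true_cos * text_norms[None, :]))
```

Because the one-sided scores depend on the length of each text embedding, labels whose embeddings have larger norms get inflated scores, which can change the ranking between labels. With both sides normalized, the scores stay within [-1, 1] and do not change when the text embeddings are rescaled.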