Cosine similarity implementation bug

I'm experimenting with Fashion CLIP and noticed that my zero-shot classification scores were lower with the built-in zero_shot_classification(images, text_labels) method than when I did the same thing step by step: computing the embeddings first, then the similarities, and finally the predictions.

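What I've found is that the _cosine_similarity(key_vectors, space_vectors, normalize) method normalizes only the key_vectors (the image embeddings). Cosine similarity requires both sets of vectors to be normalized, so what the method actually returns is the cosine similarity scaled by the norm of each space vector (the text embeddings). Since those norms differ from label to label, the scores are skewed, and this degrades classification performance.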
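To illustrate the difference, here is a minimal NumPy sketch. It is not the library's code: the helper names l2_normalize, cosine_similarity and one_sided_similarity are hypothetical, and the random arrays only stand in for real image and text embeddings.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit L2 norm (eps guards against division by zero).
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)

def cosine_similarity(key_vectors, space_vectors):
    # Proper cosine similarity: both sides are normalized before the dot product.
    return l2_normalize(key_vectors) @ l2_normalize(space_vectors).T

def one_sided_similarity(key_vectors, space_vectors):
    # Hypothetical reproduction of the reported behavior:
    # only key_vectors are normalized, space_vectors are used as-is.
    return l2_normalize(key_vectors) @ space_vectors.T

# Stand-in data (assumption): 4 "image" and 3 "text" embeddings of dimension 8,
# with the text embeddings deliberately given very different norms.
rng = np.random.default_rng(0)
image_embeddings = rng.normal(size=(4, 8))
text_embeddings = rng.normal(size=(3, 8)) * np.array([[1.0], [5.0], [0.2]])

full = cosine_similarity(image_embeddings, text_embeddings)
partial = one_sided_similarity(image_embeddings, text_embeddings)

# Each column of `partial` is the matching column of `full`
# multiplied by the norm of that text embedding.
text_norms = np.linalg.norm(text_embeddings, axis=1)
print(np.allclose(partial, full * text_norms))
```

With one-sided normalization, a label whose text embedding has a larger norm gets inflated scores regardless of how well it matches the image, which is one way the ranking of labels can change.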

patrickjohncyh / fashion-clip

Cosine similarity implementation bug #31