Key-lei opened this issue 10 months ago
Hello, both text_features and text_embedding have already been normalized beforehand, so the dot product of the two vectors is equal to their cosine similarity (cos_sim).
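For anyone who wants to check this numerically, here is a minimal standalone PyTorch sketch (made-up tensors, not the repository's actual code):

```python
import torch
import torch.nn.functional as F

# Two arbitrary vectors standing in for text_features and text_embedding
# (random made-up data, not the repository's real tensors).
a = torch.randn(512)
b = torch.randn(512)

# L2-normalize both vectors, as described in the reply above.
a_norm = F.normalize(a, dim=0)
b_norm = F.normalize(b, dim=0)

# For unit-norm vectors, the dot product equals the cosine similarity.
dot = torch.dot(a_norm, b_norm)
cos = F.cosine_similarity(a_norm.unsqueeze(0), b_norm.unsqueeze(0)).squeeze()

print(torch.allclose(dot, cos, atol=1e-6))  # True
```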
Thank you for your answer, very interesting work!🎉🎉🎉
I'm sorry to bother you again, but I still can't understand the cosine similarity calculation. logit_scale is a floating-point number:
```python
if not self.multi_scale:
    pred_ml_scores = self.logit_scale * self.text_embedding(text_features)
else:
    pred_ml_scores = self.logit_scale * self.get_multi_level_scores(text_features)
mlr_loss = self.get_rank_loss(pred_ml_scores, batched_inputs)
```
text_features comes from `img_features, text_features = self.extract_global_feature(features)`, and self.text_embedding here is only a linear-layer mapping.
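If (and this is only my guess, not something I could confirm from the code) the rows of that linear layer's weight were the normalized CLIP text embeddings, then the mapping itself would already be computing the dot products, i.e. the cosine similarities. A hypothetical sketch of that pattern, just to show what I mean (all names and values below are made up):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical illustration, not the repository's actual module: if the rows of
# the linear layer's weight were L2-normalized text embeddings, applying the
# layer to L2-normalized features would directly produce cosine similarities.
num_classes, dim = 80, 512
clip_text_embedding = F.normalize(torch.randn(num_classes, dim), dim=-1)  # made-up stand-in

text_embedding = nn.Linear(dim, num_classes, bias=False)
with torch.no_grad():
    text_embedding.weight.copy_(clip_text_embedding)  # weight rows = text embeddings

text_features = F.normalize(torch.randn(4, dim), dim=-1)  # e.g. a small batch of features
logit_scale = 100.0  # a floating-point temperature, as in the question

# Each entry is logit_scale * cos_sim(text_features[i], clip_text_embedding[j]).
pred_ml_scores = logit_scale * text_embedding(text_features)
print(pred_ml_scores.shape)  # torch.Size([4, 80])
```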
The formula in the paper is expressed as the cosine similarity between the learned text embedding and the CLIP text embedding, but I can't find clip_text_embedding anywhere. Can you help me find where clip_text_embedding is used in the code?
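For reference, the formula I mean is presumably just the standard cosine similarity between the two embeddings (my own notation below, since I don't have the paper's exact symbols):

```latex
% Standard cosine similarity (notation assumed, not the paper's exact symbols):
% t = learned text embedding, c = CLIP text embedding.
\mathrm{cos\_sim}(t, c) = \frac{t \cdot c}{\lVert t \rVert \, \lVert c \rVert}
```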
Thanks for this interesting work.
This paper uses cos_sim to compute the similarity between the Learned Text Embeddings and the CLIP Text Embeddings, but I can't find where it is used in the code. There doesn't seem to be any such calculation going on here.