Formula for calculating USE cosine similarities: dividing by π

jind11 / TextFooler

A Model for Natural Language Attack on Text Classification and Inference

MIT License

485 stars 79 forks source link

Formula for calculating USE cosine similarities: dividing by π #6

Closed jxmorris12 closed 4 years ago

jxmorris12 commented 4 years ago

Hi,

I see you are actually using the scaled angular distance between the two embeddings instead of the raw cosine similarity score.

https://github.com/jind11/TextFooler/blob/master/attack_classification.py#L32

After the call to tf.acos, do you not need to divide by π to scale the value between 0 and 1? That is the practice recommended in the Universal Sentence Encoder paper, section 5. Did you forget to divide by pi or am I missing something?

jind11 commented 4 years ago

Hi, tf.acos already considers this pi thing. Actually this code snippet is from the USE official example.

jxmorris12 commented 4 years ago

Hi again,

I'm not getting the same results.

>>> tf.acos(-1.0).numpy()
3.1415927

Looks like it definitely needs to be divided by pi to fit in the range [0,1]. Can you confirm your tensorflow version behaves differently?

jind11 commented 4 years ago

hi, I am sorry for the late response. I was using the tensorflow 1.4, but after double checking, I also found that tf.acos(1) = 0, tf.acos(0)=1.57, and tf.acos(-1)=3.14, so the final cos_sim value is not constraint between -1 and 1. However, the relationship between self.sim_scores and clip_cosine_similarities is still positive so it is a matter of what threshold I should use. I am thinking directly using clip_cosine_similarities as the similarity score without using the tf.acos, which makes sense in my intuition. How do you think? Thank you for pointing this out!

jxmorris12 commented 4 years ago

Hi. I think that either way -- either leaving the acos and dividing by pi, or just using the raw similarity -- makes sense to me. It shouldn't affect the ordering of examples, it just affects the threshold.