plasticityai / magnitude

A fast, efficient universal vector embedding utility package.
MIT License

Most_similar doesn't work for un-normalized vectors #57

Open Lynx1820 opened 5 years ago

Lynx1820 commented 5 years ago

Hi! Below is the code where most_similar doesn't work as I expect. Code Snippet:

from pymagnitude import Magnitude

word_en = Magnitude("english_word_emb.magnitude", normalized=False)
word = word_en.query("cat")  # un-normalized vector for "cat"
word_en.most_similar(word)
[('guerrillas', 4.485707), ('japaneses', 3.4920607), ('prosecutors', 2.6029253), ('person', 0.4240542), ('robert', 0.4213551), ('anton', 0.4195432), ('pattinson', 0.418786), ('dave', 0.41877228), ('ricardo', 0.41841233), ('blair', 0.4181792)]

Since I know "cat" is a key in my magnitude file, I expect that the most similar word vector will be the vector for "cat".
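To illustrate that expectation, here is a minimal sketch on toy data in plain Python. It does not use Magnitude and makes no claim about how most_similar is implemented. The vectors and the helper names (`dot`, `cosine`, `embeddings`) are made up for illustration. It only shows that ranking un-normalized vectors by raw dot product can place a large-norm vector above the query itself and give scores above 1, which would look like the output above, while cosine similarity ranks the query's own key first.

```python
import math

def dot(a, b):
    # Plain dot product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Dot product divided by the product of the two vector norms.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy, un-normalized embeddings (hypothetical values, not from a real file).
embeddings = {
    "cat": [1.0, 2.0, 0.5],
    "dog": [0.9, 1.8, 0.7],
    "guerrillas": [4.0, 9.0, 1.0],  # similar direction, much larger norm
}

query = embeddings["cat"]

# Rank by raw dot product: the large-norm vector dominates.
top_by_dot = max(embeddings, key=lambda k: dot(embeddings[k], query))

# Rank by cosine similarity: the query's own key comes first.
top_by_cosine = max(embeddings, key=lambda k: cosine(embeddings[k], query))

print(top_by_dot, top_by_cosine)
```

On this toy data the dot-product ranking picks "guerrillas" with a score of 22.5, whereas the cosine ranking picks "cat". If something similar happens inside most_similar when normalized = False, it would explain why "cat" is missing from the top results, but that is only a guess.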

Thank you in advance!