turbomaze / word2vecjson

This project represents the 300-dimensional word vectors from word2vec as JSON.

example #3

Open nuthinking opened 6 years ago

nuthinking commented 6 years ago

Is it just me, or does the cited example not work? (screenshot: 2018-06-08 at 22.51.21)

turbomaze commented 6 years ago

Ah, good catch. If you rearrange the expression to king + (woman - man), the fact that the most similar vector is still king suggests that the vectors for woman and man are close together, so adding their difference barely moves the result away from king. This is pretty common, so I tend to filter a, b, and c out of the results list for a + (b - c). The fact that queen/princess/empress are the most similar terms that don't also appear in the expression indicates that the embeddings are "working".
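
A minimal sketch of that filtering step, in case it helps to see it spelled out. Everything here is an assumption made for illustration: the 3-dimensional `toyVectors` are invented (the real vectors have 300 dimensions), `cosineSimilarity` and `analogy` are hypothetical helper names rather than this project's API, and cosine similarity is assumed as the similarity measure.

```javascript
// Toy 3-dimensional vectors, invented for this illustration only.
// They are NOT taken from word2vec; real vectors have 300 dimensions.
const toyVectors = {
  king:     [1.0,  0.1,  0.3],
  queen:    [0.8, -0.3,  0.5],
  princess: [0.7, -0.3,  0.6],
  man:      [0.1,  0.1,  0.9],
  woman:    [0.1, -0.1,  0.9],
  apple:    [0.1,  0.0, -0.9],
};

// Cosine similarity between two equal-length vectors (assumed metric).
function cosineSimilarity(u, v) {
  let dot = 0;
  let normU = 0;
  let normV = 0;
  for (let i = 0; i < u.length; i++) {
    dot += u[i] * v[i];
    normU += u[i] * u[i];
    normV += v[i] * v[i];
  }
  return dot / (Math.sqrt(normU) * Math.sqrt(normV));
}

// Hypothetical helper: rank every word by similarity to a + (b - c).
// When excludeInputs is true, a, b, and c are dropped from the results.
function analogy(vectors, a, b, c, excludeInputs) {
  const target = vectors[a].map((x, i) => x + vectors[b][i] - vectors[c][i]);
  const inputs = new Set([a, b, c]);
  return Object.keys(vectors)
    .filter((word) => !(excludeInputs && inputs.has(word)))
    .map((word) => ({ word, similarity: cosineSimilarity(target, vectors[word]) }))
    .sort((x, y) => y.similarity - x.similarity);
}

const unfiltered = analogy(toyVectors, 'king', 'woman', 'man', false);
const filtered = analogy(toyVectors, 'king', 'woman', 'man', true);

console.log(unfiltered[0].word); // king
console.log(filtered[0].word);   // queen
```

With these toy numbers, woman and man differ only slightly, so the unfiltered top hit stays at king, while excluding the three input words lets queen surface first.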

With that being said, I agree that this is a poor first example to list. Can you think of any better examples? Preferably something more interesting than the standard france+(berlin-germany).

kostasx commented 3 years ago

woman + ( father - man ) === mother