opinionated / pipeline

Use word2vec similarity #26

Closed by ConnorFoody 7 years ago

ConnorFoody commented 8 years ago

word2vec maps each word to a vector in a continuous space. Comparing two vectors in this space, typically with cosine similarity, gives us a notion of semantic similarity between the words.
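To make the comparison concrete, here is a rough sketch of cosine similarity over word vectors. The 3-dimensional vectors in `toy_vectors` are made up for illustration; real word2vec vectors usually have a few hundred dimensions and would come from a trained model. The names `cosine_similarity` and `toy_vectors` are hypothetical, not anything in this repo.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A zero vector has no direction, so treat it as "not similar".
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up vectors for illustration only (assumption, not real model output).
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.82, 0.15],
    "banana": [0.1, 0.05, 0.95],
}

king_queen = cosine_similarity(toy_vectors["king"], toy_vectors["queen"])
king_banana = cosine_similarity(toy_vectors["king"], toy_vectors["banana"])
print(king_queen > king_banana)
```

With vectors like these, words that point in similar directions score close to 1 and unrelated words score lower, which is the property we would rely on in place of exact string matching.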

There are some pretty large pre-trained models we can use. We can even train our own model if we want.

This doesn't entirely remove the exact-matching problems, but it may alleviate them by effectively enlarging the vocabulary and giving us a reasonable measure of semantic similarity.

@amanz360 @MatthewMawby @rmarathay @sirmarcis Any of you want to dig into this? We may want to try to cluster terms or get a document or subdocument vector.
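One simple way to get a document or subdocument vector is to average the word vectors of its tokens, skipping any token the model does not know. This is only a sketch of that baseline, not a decision about what we should use; `document_vector` and the 2-dimensional `toy_vectors` are hypothetical names and made-up data.

```python
def document_vector(tokens, vectors):
    """Average the vectors of all known tokens; return None if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    # Component-wise mean over the known word vectors.
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Made-up vectors for illustration only (assumption, not real model output).
toy_vectors = {
    "cat": [1.0, 0.0],
    "dog": [0.0, 1.0],
}

doc = document_vector(["cat", "dog", "zebra"], toy_vectors)
print(doc)
```

Here "zebra" is out of vocabulary and is ignored, so the result is the mean of the "cat" and "dog" vectors. Averaging throws away word order, so it is a baseline to compare against rather than a final answer.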

@sirmarcis Could this be useful in finding themes?

ConnorFoody commented 8 years ago

@sirmarcis here is a link to their paper