mkusner / wmd

Word Mover's Distance from Matthew J Kusner's paper "From Word Embeddings to Document Distances"

There are lots of NaNs in the distance matrix for the example dataset #9

Open ilyaraz opened 7 years ago

ilyaraz commented 7 years ago

When I run the example script inside VMware with Ubuntu as the guest OS, I get a distance matrix with around 100K NaN entries. Could this be a problem with the EMD solver?

ilyaraz commented 7 years ago

OK, I debugged a bit and figured out that this happens whenever a tweet consists entirely of stop words. In that case the bag of words is empty, and the EMD solver does not handle that case.
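
For anyone hitting the same thing, here is a minimal sketch of one possible workaround: find the documents whose bag of words is empty after stop-word removal and leave them out before calling the EMD solver. The stop-word list and the helper names (`bag_of_words`, `split_empty`) are made up for illustration and are not taken from this repository; the example script may tokenize and filter differently.

```python
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "is", "it", "to", "and", "of", "i", "you"}


def bag_of_words(text, stop_words=STOP_WORDS):
    """Return normalized word frequencies after dropping stop words.

    Returns an empty dict when every token is a stop word, instead of
    dividing by a total count of zero.
    """
    tokens = [t for t in text.lower().split() if t not in stop_words]
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def split_empty(docs):
    """Split document indices into (kept, dropped).

    'dropped' holds the indices of documents whose bag of words is empty;
    these are the ones to skip before computing pairwise distances.
    """
    kept, dropped = [], []
    for i, doc in enumerate(docs):
        if bag_of_words(doc):
            kept.append(i)
        else:
            dropped.append(i)
    return kept, dropped


docs = ["the cat sat", "it is the", "to be and not to be"]
kept, dropped = split_empty(docs)
print("kept:", kept)
print("dropped:", dropped)
```

Only the documents in `kept` would then be passed to the distance computation. Whether to drop such tweets or to assign them some fixed distance is a modeling choice, not something this sketch decides.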