piskvorky / gensim

Topic Modelling for Humans
https://radimrehurek.com/gensim
GNU Lesser General Public License v2.1

Word Mover's Distance slow for fasttext #1292

Open arashsa opened 7 years ago

arashsa commented 7 years ago

I am testing the fasttext wrapper:

from gensim.models.wrappers import FastText
no_model_ft = FastText.load_fasttext_format('wiki.en/wiki.en')

Then I am trying the Word Mover's Distance function. In comparison with the doc2vec format, it is very slow. Is there a way to make this function faster?
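
For reference, the call being timed looks roughly like this (the sentences are made up; depending on the gensim version, wmdistance may live on the model or on model.wv, and it requires the pyemd package):

sent_a = "the cat sits on the mat".split()
sent_b = "a kitten rests on the rug".split()
# WMD between two tokenized documents (lower = more similar)
distance = no_model_ft.wmdistance(sent_a, sent_b)
print(distance)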

gojomo commented 7 years ago

Word Mover's Distance inherently requires a lot of computation. There might still be possible optimizations in the library gensim uses (PyEMD) or in how gensim uses it – but there are no easy switches to throw for extra speed.

My only suggestions are to ensure you're caching reusable values in your application, and to figure out ways to calculate pairwise WMD distances only among likely candidates.
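
One way to read that advice is a two-pass setup: a cheap centroid-cosine pass to shortlist candidates, then the expensive WMD only on the shortlist. A minimal sketch in plain numpy (hypothetical helper names, not gensim code):

import numpy as np

def centroid(model, tokens):
    # cheap document representation: mean of the word vectors
    return np.mean([model[w] for w in tokens], axis=0)

def wmd_against_shortlist(model, query, docs, shortlist=20):
    q = centroid(model, query)
    # cheap pass: rank every candidate by centroid cosine similarity
    scored = []
    for i, doc in enumerate(docs):
        c = centroid(model, doc)
        scored.append((np.dot(q, c) / (np.linalg.norm(q) * np.linalg.norm(c)), i))
    scored.sort(reverse=True)
    # expensive pass: WMD only for the most promising candidates
    return sorted((model.wmdistance(query, docs[i]), i) for _, i in scored[:shortlist])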

arashsa commented 7 years ago

@gojomo

Yes, I am aware of the computational complexity. What I don't understand is why it is slower for fasttext than for word2vec. I assume it might have something to do with fasttext's ngrams, and that the word mover algorithm is computed for each ngram. However, that still should not account for the overall slowdown compared to the word2vec class. These are only assumptions.

Do both classes share the same underlying implementation?

gojomo commented 7 years ago

How much slower are you seeing with FastText vs non-FastText word-vector sources?

Yes, native FastText word-vectors need to be composed from the full-word token and all of its character ngrams – so each access will be a bit slower. Also, looking at the gensim WMD code (https://github.com/RaRe-Technologies/gensim/blob/7127eeef8304852da589465b338e2bba14e881f1/gensim/models/keyedvectors.py#L414), I see that each word-vector is re-accessed many times during the calculation – if that's a notable drag on FastText models, it could be optimized to use a local cache of the vectors.
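
A rough illustration of that local-cache idea (a hypothetical helper, not the actual keyedvectors.py code): look each vector up once, then build the pairwise distance matrix from the cached arrays.

import numpy as np

def pairwise_word_distances(model, vocab_words):
    # look up each (possibly ngram-composed) vector exactly once
    cache = {w: model[w] for w in vocab_words}
    # then reuse the cached arrays for every pairwise distance
    n = len(vocab_words)
    distances = np.zeros((n, n))
    for i, wi in enumerate(vocab_words):
        for j, wj in enumerate(vocab_words):
            distances[i, j] = np.linalg.norm(cache[wi] - cache[wj])
    return distances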

If you try this optimization, please share the change & let us know what speedup you see.

arashsa commented 7 years ago

@gojomo Unfortunately I did not time the functions. There was a substantial slowdown, though. I will time them later and report back.

I will also examine the code and see whether a speedup is possible.

tmylk commented 7 years ago

RWMD should be much faster than WMD while giving similar results; it is used, for example, in the R text2vec package. Previous attempts at RWMD in gensim have not been performant, but it's on the roadmap.
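
For context, the relaxed bound from Kusner et al. (2015) drops the flow constraints so that every word simply "travels" to its nearest counterpart in the other document; a bare-bones sketch in plain numpy (illustrative only, not the text2vec or gensim implementation):

import numpy as np
from collections import Counter

def nbow(tokens):
    # normalized bag-of-words weights
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def relaxed_wmd(model, doc1, doc2):
    w1, w2 = nbow(doc1), nbow(doc2)
    v1 = {w: model[w] for w in w1}
    v2 = {w: model[w] for w in w2}

    def one_side(weights, vectors, targets):
        # each source word moves all of its weight to its nearest target word
        return sum(weight * min(np.linalg.norm(vectors[w] - t) for t in targets.values())
                   for w, weight in weights.items())

    # RWMD is the larger of the two one-sided relaxations; it lower-bounds true WMD
    return max(one_side(w1, v1, v2), one_side(w2, v2, v1))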

marco-c commented 7 years ago

Yes, I think this is unrelated to fasttext. WMD is slow no matter the embedding algorithm. There's an open PR about using RWMD and WCD, but it looks like it stalled: https://github.com/RaRe-Technologies/gensim/pull/800.
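
WCD (Word Centroid Distance), from the same Kusner et al. paper, is an even cheaper lower bound on WMD: just the distance between the documents' mean token vectors, typically used to prune candidates before computing RWMD or full WMD. A sketch, again illustrative only:

import numpy as np

def word_centroid_distance(model, doc1, doc2):
    # Euclidean distance between the mean token vectors of the two documents
    c1 = np.mean([model[w] for w in doc1], axis=0)
    c2 = np.mean([model[w] for w in doc2], axis=0)
    return np.linalg.norm(c1 - c2)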

tmylk commented 7 years ago

Confirming that is the current status. Closing as resolved.

gojomo commented 7 years ago

As implied by the original reporter, and noted in my comment, the access pattern used by gensim's WMD code is particularly costly for fasttext vectors that need to be repeatedly re-composed from character ngrams.

While the reporter hasn't quantified the magnitude of the fasttext-specific slowdown, measuring it shouldn't be hard AND the fix is easy. (This would be a good student/first-timer test & fix.)

Bachstelze commented 6 years ago

If the speed performance of WMD is a big issue, then you can try this out: https://github.com/src-d/wmd-relax. I didn't compare the speed results, but I would be interested if someone does for general settings.

menshikh-iv commented 6 years ago

Also, you can try soft-cosine similarity; it is a pretty fast and nice alternative to WMD. See the benchmark results.
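
For readers unfamiliar with it: soft cosine replaces the identity term-similarity matrix of ordinary cosine with a matrix S of word-vector similarities, so related-but-different words still contribute. A bare-bones numpy sketch of the formula (not the gensim API):

import numpy as np

def soft_cosine(a, b, S):
    # a, b: bag-of-words count vectors over a shared vocabulary
    # S: term-similarity matrix, e.g. S[i, j] = max(0, cosine(vec_i, vec_j))
    return (a @ S @ b) / (np.sqrt(a @ S @ a) * np.sqrt(b @ S @ b))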

gojomo commented 6 years ago

@Bachstelze Thanks for the pointer to wmd-relax!

Do you happen to know if it has options to return the "flows" chosen to minimize the WMD value, or "remainder" (if ever doing a partial-match of unequal 'weights')? I have a hunch WMD might work well to check when one text conveys a superset of another (shorter) text's meaning – and then also find third texts that best add exactly what's missing to the shorter text.