Open lucas-ubm opened 3 years ago
Yes, this is absolutely correct. However, the current implementation is highly inefficient in terms of similarity search (brute force). I had plans to include approximate nearest neighbor search, but haven't found the time to implement it.
In `sentencevectors.py`, `most_similar()` can return the `topn` most similar words. However, it would be useful to be able to specify a similarity threshold above which the sentences are returned. For this, `topn` could take a fractional value: if `topn` is strictly smaller than 1, it is treated as a threshold; otherwise it works the same way as it does now.
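To make the proposal concrete, here is a minimal sketch of the suggested `topn` semantics on top of a brute-force cosine search. It is not the library's code: `cosine` and `most_similar_sketch` are hypothetical names, and plain Python lists stand in for the stored sentence vectors.

```python
import math
from typing import List, Sequence, Tuple, Union


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar_sketch(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    topn: Union[int, float] = 10,
) -> List[Tuple[int, float]]:
    """Hypothetical helper: return (index, similarity) pairs, best first.

    topn >= 1 keeps the current meaning (number of results to return);
    topn < 1 is interpreted as a similarity threshold, as proposed above.
    """
    # Brute force: score every stored vector against the query.
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)

    if topn < 1:
        # Threshold mode: keep everything strictly above the threshold.
        return [(i, sim) for i, sim in scored if sim > topn]
    # Count mode: keep the first topn results.
    return scored[: int(topn)]
```

For example, `most_similar_sketch(q, vecs, topn=5)` would return the five best matches, while `most_similar_sketch(q, vecs, topn=0.8)` would return every match with similarity above 0.8. One consequence of overloading a single parameter this way is that a threshold of exactly 1 cannot be expressed, since `topn=1` still means "one result".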