This brute-force solution will not work if semantle's reported distance will have very low accuracy.
An intelligent way is to search for words that represent different clusters of words. Then each cluster you split into subclusters that have their own word representatives. And so on.
It's much harder to implement, of course. The question about how we pick clusters is very hard itself.
This brute-force solution will not work if semantle's reported distance will have very low accuracy.
An intelligent way is to search for words that represent different clusters of words. Then each cluster you split into subclusters that have their own word representatives. And so on.
It's much harder to implement, of course. The question about how we pick clusters is very hard itself.