dirkgr opened this issue 9 years ago
Hello!
Ah, hmm, we haven't actually used word2phrase at Medallia, but it seems like extending the current implementation would not be too difficult. We just need to extend the vocabulary to include bigrams. This needs to be done in two places:
The original word2phrase does something simpler. It just preprocesses the input, joining pairs of adjacent tokens with an underscore when their score is high. The score looks vaguely like PMI. To get longer n-grams, you run it two or more times.
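For illustration, here is a minimal Python sketch of one such preprocessing pass. It assumes the scoring formula of the reference word2phrase.c, score(a, b) = (count(a b) - min_count) / count(a) / count(b) * train_words, with both tokens required to appear at least min_count times. Ignoring the min_count discount, this is count(a b) * N / (count(a) * count(b)), the ratio inside PMI, which is why the score resembles PMI. The function name `word2phrase_pass` is hypothetical, and the defaults (min_count=5, threshold=100) are assumed to match the reference tool. This is not Medallia's code.

```python
from collections import Counter


def word2phrase_pass(sentences, min_count=5, threshold=100.0):
    """One phrase-merging pass over tokenized sentences (lists of str).

    Sketch only: the scoring formula is assumed to follow word2phrase.c.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sent in sentences:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
    train_words = sum(unigrams.values())

    def score(a, b):
        pa, pb, pab = unigrams[a], unigrams[b], bigrams[(a, b)]
        if pa < min_count or pb < min_count:
            return 0.0  # rare tokens are never merged
        return (pab - min_count) / pa / pb * train_words

    out = []
    for sent in sentences:
        merged, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and score(sent[i], sent[i + 1]) > threshold:
                # Join the pair and skip past it, so a token is merged at most once per pass.
                merged.append(sent[i] + "_" + sent[i + 1])
                i += 2
            else:
                merged.append(sent[i])
                i += 1
        out.append(merged)
    return out
```

Running `word2phrase_pass` again on its own output lets an already-merged token such as `new_york` pair with a neighbor, which is how repeated runs yield trigrams and longer phrases.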
Do you have any intention of porting word2phrase as well?