Closed koreyou closed 7 years ago
I figured out that it was due to a formatting difference between the pretrained word2vec file and files produced by the word2vec implementation.
The word2vec implementation, which is what our test is based on, produces a binary file with a carriage return after every vocabulary entry. The pretrained word2vec data has no carriage returns.
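For illustration, here is a minimal sketch of a reader that accepts both layouts. It is not the code in this repository: `read_word2vec_binary` is a hypothetical name, and the sketch assumes that the "carriage return" is a single `\n` byte written after each vector, that the header is a text line of the form `<vocab_size> <dim>`, and that vectors are little-endian float32.

```python
import io
import struct


def read_word2vec_binary(stream):
    """Read word2vec binary data from a binary file-like object.

    Hypothetical helper: tolerates an optional "\n" after each vector
    (assumed separator) by skipping it when reading the next word.
    """
    # Header is a text line: "<vocab_size> <dim>\n"
    vocab_size, dim = map(int, stream.readline().split())
    vectors = {}
    for _ in range(vocab_size):
        chars = []
        while True:
            ch = stream.read(1)
            if ch == b"":
                raise EOFError("unexpected end of data while reading a word")
            if ch == b" ":
                break  # a space terminates the word
            if ch != b"\n":
                # Skip the line break left over from the previous entry, if any
                chars.append(ch)
        word = b"".join(chars).decode("utf-8", errors="replace")
        raw = stream.read(4 * dim)
        if len(raw) != 4 * dim:
            raise EOFError("unexpected end of data while reading a vector")
        # Assumption: little-endian float32 values
        vectors[word] = struct.unpack("<%df" % dim, raw)
    return vectors


# Usage: the same two entries, with and without the trailing line break
vec = struct.pack("<2f", 1.0, 2.0)
with_break = b"2 2\n" + b"cat " + vec + b"\n" + b"dog " + vec + b"\n"
without_break = b"2 2\n" + b"cat " + vec + b"dog " + vec
assert read_word2vec_binary(io.BytesIO(with_break)) == read_word2vec_binary(
    io.BytesIO(without_break)
)
```

Because the vector is read as exactly `4 * dim` bytes, a `\n` byte value inside the float data is never mistaken for a separator; only bytes in the word position are inspected.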
Even after #1, there still seems to be an issue reading GoogleNews-vectors-negative300.bin: the last character of every vocabulary word is missing.