koreyou / word_embedding_loader

Loaders and savers for different implementations of word embeddings
MIT License

Vocabulary malformed trying to read GoogleNews-vectors-negative300.bin #4

Closed koreyou closed 7 years ago

koreyou commented 7 years ago

Even after #1, there still seems to be an issue reading GoogleNews-vectors-negative300.bin: the last character of every vocabulary entry is missing.

koreyou commented 7 years ago

I figured out that this is due to a formatting difference between the pretrained word2vec data and the files produced by the word2vec implementation.

The word2vec implementation, which is what our test is based on, produces a binary file with a line break (a newline character) after every vocabulary entry. The pretrained word2vec data has no such line breaks.
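For illustration, here is a minimal sketch of a reader that tolerates both layouts. It is not this repository's actual loader. It assumes the usual word2vec binary layout (a text header `vocab_size dim`, then for each entry the word, a single space, and `dim` little-endian float32 values), and the function names `read_word2vec_binary` and `make_binary` are hypothetical. The idea is to read the word byte by byte up to the space and to ignore a line break left over from the previous entry, instead of unconditionally skipping one byte after each vector.

```python
import io
import struct


def read_word2vec_binary(stream):
    """Read word2vec binary data from a binary stream into {word: tuple_of_floats}.

    Hypothetical helper: works whether or not each vector is followed by b"\n".
    """
    header = stream.readline()
    vocab_size, dim = (int(x) for x in header.split())
    vectors = {}
    for _ in range(vocab_size):
        chars = []
        while True:
            ch = stream.read(1)
            if ch == b"":
                raise EOFError("unexpected end of data while reading a word")
            if ch == b" ":
                break  # the space separates the word from its vector
            if ch != b"\n":  # ignore a line break left over from the previous entry
                chars.append(ch)
        word = b"".join(chars).decode("utf-8", errors="replace")
        raw = stream.read(4 * dim)
        if len(raw) != 4 * dim:
            raise EOFError("unexpected end of data while reading a vector")
        vectors[word] = struct.unpack("<%df" % dim, raw)
    return vectors


def make_binary(entries, trailing_newline):
    """Build a tiny in-memory file in either layout (for demonstration only)."""
    dim = len(entries[0][1])
    out = io.BytesIO()
    out.write(("%d %d\n" % (len(entries), dim)).encode("ascii"))
    for word, vec in entries:
        out.write(word.encode("utf-8") + b" ")
        out.write(struct.pack("<%df" % dim, *vec))
        if trailing_newline:
            out.write(b"\n")
    return out.getvalue()


entries = [("cat", (1.0, 2.0)), ("dog", (0.5, -1.0))]
with_breaks = read_word2vec_binary(io.BytesIO(make_binary(entries, True)))
without_breaks = read_word2vec_binary(io.BytesIO(make_binary(entries, False)))
```

Both calls should yield the same vocabulary, because the reader never assumes a fixed number of separator bytes between entries.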