koreyou / word_embedding_loader

Loaders and savers for different implementations of word embedding
MIT License

Automatically determine encoding from the file #2

Closed koreyou closed 7 years ago

koreyou commented 7 years ago

word2vec files do not follow any rule for text encoding. For example, the word2vec file distributed by Google uses latin-1 (as of #1), but others might want to use Unicode for other languages.

We should be able to determine the encoding automatically from the file.
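
One possible way to do this, sketched here only as an illustration and not as anything implemented in word_embedding_loader, is to try a list of candidate encodings in order and keep the first one that decodes the bytes without error. The function name `guess_encoding` and the candidate list are assumptions made for this sketch. Note that latin-1 can decode any byte sequence, so it only makes sense as the last fallback.

```python
from typing import Sequence


def guess_encoding(data: bytes,
                   candidates: Sequence[str] = ("utf-8", "latin-1")) -> str:
    """Return the first candidate encoding that decodes `data` strictly.

    Hypothetical helper: the name and the default candidates are
    assumptions for illustration, not part of word_embedding_loader.
    """
    for name in candidates:
        try:
            # Strict decoding raises UnicodeDecodeError on invalid bytes.
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    raise ValueError("none of the candidate encodings could decode the data")


# Made-up vocabulary bytes, not taken from any real word2vec file.
utf8_sample = "café 日本語".encode("utf-8")
latin1_sample = "café".encode("latin-1")

print(guess_encoding(utf8_sample))
print(guess_encoding(latin1_sample))
```

This kind of guess is only a heuristic: bytes that happen to be valid UTF-8 are reported as UTF-8 even if the author intended another encoding, and the latin-1 fallback says nothing about whether the decoded text is meaningful.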

koreyou commented 7 years ago

This was abandoned because we have decided not to decode the byte stream within WordEmbeddingLoader.