jhlau / doc2vec

Python scripts for training/testing paragraph vectors
Apache License 2.0

UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: invalid start byte #16

Closed YanfaAdiPutra closed 6 years ago

YanfaAdiPutra commented 6 years ago

I tried to run your train_model.py with my own pretrained word2vec model and got this error. I then tried the pretrained word2vec skip-gram models for AP News and English Wikipedia, but the error keeps occurring. Is there an additional step I need to take before running train_model.py?

P.S. I am already using your forked gensim as well.

vikaschib7 commented 6 years ago

use encoding='utf-8'
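
A note on the error message itself: byte 0x80 at position 0 is how a Python pickle (protocol 2 or later) begins, and 0x80 is never a valid first byte in UTF-8. One possible cause, not confirmed anywhere in this thread, is that the pretrained file is a pickled model being read as word2vec text; if so, changing the encoding alone may not help. The sketch below uses only the standard library to check the first bytes of a file; the helper name `looks_like_pickle` is hypothetical and is not part of gensim or this repository.

```python
import pickle


def looks_like_pickle(first_bytes: bytes) -> bool:
    """Return True if the data starts with the pickle protocol 2+ header.

    Hypothetical helper for illustration only. Pickle protocols 2 to 5
    begin with the PROTO opcode 0x80 followed by the protocol number.
    """
    return (
        len(first_bytes) >= 2
        and first_bytes[0] == 0x80
        and 2 <= first_bytes[1] <= 5
    )


# Stand-in data: a small pickled object and a word2vec-style text header.
pickled = pickle.dumps({"word": [0.1, 0.2]}, protocol=2)
text_format = "1 2\nword 0.1 0.2\n".encode("utf-8")

# Decoding pickled bytes as UTF-8 fails on the very first byte,
# which matches the "byte 0x80 in position 0" part of the message.
try:
    pickled.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError as err:
    decode_failed = True
    failing_position = err.start

print(looks_like_pickle(pickled))      # pickled data
print(looks_like_pickle(text_format))  # plain text data
```

To apply the same check to a real file, read its first two bytes in binary mode, for example `open(path, "rb").read(2)`, and pass them to the helper. If the check returns True, the file would need to be loaded with whatever method saved it rather than parsed as text.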