jhlau / doc2vec

Python scripts for training/testing paragraph vectors
Apache License 2.0

'ascii' codec can't decode byte 0xf7 in position 0: ordinal not in range(128) #21

Open TobiasEl opened 6 years ago

TobiasEl commented 6 years ago

Hi, I'm using the forked version of gensim, but when I load the model I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xf7 in position 0: ordinal not in range(128)

I tried encoding the model path like this:

model = g.Doc2Vec.load(model_path.encode('utf-8'))

But then I get this error:

File "C:\Users\fanta\Desktop\gensim-develop\gensim\utils.py", line 311, in _adapt_by_suffix
    if fname.endswith('.gz') or fname.endswith('.bz2'):
TypeError: endswith first arg must be bytes or a tuple of bytes, not str

What must I do to solve this error? Thanks.
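
A note on the second traceback: in Python 3, calling `.encode('utf-8')` on the path turns it into a `bytes` object, and `bytes.endswith` does not accept a `str` suffix such as `'.gz'`. So the second error probably comes from the encoded path itself, not from the model file. A minimal sketch, assuming Python 3 (the path `model.bin` is a made-up placeholder, not the real model file):

```python
# Hypothetical path, used only for illustration.
model_path = "model.bin"

# A str path can be checked against str suffixes without trouble.
str_check = model_path.endswith(".gz") or model_path.endswith(".bz2")

# Encoding the path produces bytes; comparing bytes against a str suffix
# raises TypeError in Python 3, as in the traceback above.
byte_path = model_path.encode("utf-8")
try:
    byte_path.endswith(".gz")
    bytes_error = None
except TypeError as exc:
    bytes_error = exc
```

Passing the path as a plain `str` avoids this `TypeError`, although the original `UnicodeDecodeError` still has to be dealt with separately.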

samrudh commented 5 years ago

Same issue here.

cstenkamp commented 3 years ago

I had the same problem. Anything involving bytes vs. strings is usually a Python 2/3 compatibility issue. Are you using Python 3? The error occurs when unpickling, so it makes sense to look at how pickled objects changed between Python 2 and 3; see for example https://blog.modest-destiny.com/posts/python-2-and-3-compatible-pickle-save-and-load/, which describes the exact same error. Unfortunately, their fix didn't work for me, and neither did any encoding, so I reluctantly switched to Python 2, where it works :)

parshin76 commented 2 years ago

This issue seems to come from Python 2/3 compatibility. I was facing the same error and solved it by replacing return _pickle.loads(f.read()) with return _pickle.loads(f.read(), encoding='latin1') in gensim/utils.py of the forked gensim: https://github.com/jhlau/gensim
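
To illustrate why the `encoding='latin1'` argument helps, here is a small sketch that uses only the standard library `pickle` module in Python 3. The byte string `PY2_PICKLE` is a hand-written stand-in for data pickled by Python 2, and `load_py2_pickle` is a hypothetical helper. Neither is part of gensim, and the real model file will contain far more than a single string.

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 byte string '\xf7'
# (an assumption standing in for a real Python 2 model file).
PY2_PICKLE = b"\x80\x02U\x01\xf7q\x00."


def load_py2_pickle(data, encoding="latin1"):
    """Hypothetical helper: unpickle Python 2 data under Python 3.

    latin1 maps every byte 0-255 to a code point, so decoding of
    Python 2 byte strings cannot fail.
    """
    return pickle.loads(data, encoding=encoding)


# With the default encoding ('ASCII'), the non-ASCII byte cannot be decoded.
try:
    pickle.loads(PY2_PICKLE)
    default_error = None
except UnicodeDecodeError as exc:
    default_error = exc

# With latin1, the same data loads as a one-character str.
value = load_py2_pickle(PY2_PICKLE)
```

Here `default_error` holds the same kind of `UnicodeDecodeError` reported in this issue, while `value` loads without error. Whether this is enough for a given saved model depends on what else was pickled, as the earlier comment suggests.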