jhlau / doc2vec

Python scripts for training/testing paragraph vectors
Apache License 2.0

legacy python? #7

Closed by amueller 7 years ago

amueller commented 7 years ago

Hey. I've been trying to use your model pre-trained on the AP corpus, but I get an error when unpickling it:

> python infer_test.py     
/home/andy/anaconda3/lib/python3.5/site-packages/gensim/utils.py:1015: UserWarning: Pattern library is not installed, lemmatization won't be available.
  warnings.warn("Pattern library is not installed, lemmatization won't be available.")
Traceback (most recent call last):
  File "infer_test.py", line 15, in <module>
    m = g.Doc2Vec.load(model)
  File "/home/andy/anaconda3/lib/python3.5/site-packages/gensim/models/word2vec.py", line 1762, in load
    model = super(Word2Vec, cls).load(*args, **kwargs)
  File "/home/andy/anaconda3/lib/python3.5/site-packages/gensim/utils.py", line 248, in load
    obj = unpickle(fname)
  File "/home/andy/anaconda3/lib/python3.5/site-packages/gensim/utils.py", line 912, in unpickle
    return _pickle.loads(f.read())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfb in position 1: ordinal not in range(128)

Given your use of the codecs module, I guess you're using Python 2? Any chance you could build a Python 3 version?

jhlau commented 7 years ago

Correct, it is for python2. Unfortunately I don't do python3 so there won't be support for it...

amueller commented 7 years ago

You do realize that python2 is approaching end of life? Maybe at least add a warning?

jhlau commented 7 years ago

Fair point; I'll add python2 as a requirement on the page.

amueller commented 7 years ago

thanks :)
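
Note for readers who hit the same traceback under Python 3: the UnicodeDecodeError comes from Python 3's pickle trying to decode Python 2 byte strings as ASCII. A common workaround for this general problem is to pass an explicit encoding to pickle. The sketch below only demonstrates that mechanism on a tiny hand-written Python 2 pickle; the PY2_PICKLE bytes and the load_py2_pickle helper are illustrative assumptions, not part of this repo or of gensim, and nothing in the thread confirms that the pre-trained model loads correctly this way.

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 byte string '\xfb'
# (illustrative stand-in for a pickle produced under Python 2).
PY2_PICKLE = b"\x80\x02U\x01\xfbq\x00."


def load_py2_pickle(data, encoding="latin1"):
    """Unpickle Python 2 data under Python 3 with an explicit string encoding.

    'latin1' maps every byte to a code point, so decoding cannot fail;
    'bytes' leaves Python 2 str objects as Python 3 bytes instead.
    """
    return pickle.loads(data, encoding=encoding)


# The default encoding is ASCII, which reproduces the kind of error in the traceback.
try:
    pickle.loads(PY2_PICKLE)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

as_text = load_py2_pickle(PY2_PICKLE)                     # decoded as latin1 text
as_bytes = load_py2_pickle(PY2_PICKLE, encoding="bytes")  # kept as raw bytes
```

Even when unpickling succeeds, a model saved by an old library version under Python 2 may still be incompatible with the installed library, so running the scripts under Python 2 as the author requires remains the supported route.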