jhlau / doc2vec

Python scripts for training/testing paragraph vectors
Apache License 2.0

'Doc2Vec' object has no attribute 'batch_words' #20

Closed TobiasEl closed 6 years ago

TobiasEl commented 6 years ago

Hi. I'm loading the pretrained Doc2Vec model trained on English Wikipedia (the .bin file and the syn0 and syn1 files are in the same folder). But when I load the model I get the error: 'Doc2Vec' object has no attribute 'batch_words'. The problem seems to be due to different versions of gensim. Could you help me, please?

Here is the complete error message:

    AttributeError                            Traceback (most recent call last)
    in ()
    ----> 1 model = g.Doc2Vec.load(model_path)
          2 test_docs = [ x.strip().split() for x in codecs.open(test_docs, "r", "utf-8").readlines() ]

    /usr/local/lib/python2.7/dist-packages/gensim/models/doc2vec.pyc in load(cls, *args, **kwargs)
        691             logger.info('Model saved using code from earlier Gensim Version. Re-loading old model in a compatible way.')
        692             from gensim.models.deprecated.doc2vec import load_old_doc2vec
    --> 693             return load_old_doc2vec(*args, **kwargs)
        694
        695     def estimate_memory(self, vocab_size=None, report=None):

    /usr/local/lib/python2.7/dist-packages/gensim/models/deprecated/doc2vec.pyc in load_old_doc2vec(*args, **kwargs)
        107         'iter': old_model.iter,
        108         'sorted_vocab': old_model.sorted_vocab,
    --> 109         'batch_words': old_model.batch_words,
        110         'compute_loss': old_model.__dict__.get('compute_loss', None)
        111     }

    AttributeError: 'Doc2Vec' object has no attribute 'batch_words'

This is the code:

    #parameters
    model_path="myfolder/test/doc2vec.bin"
    test_docs="myfolder/test/test.txt"
    output_file="myfolder/test/test_vectors.txt"

    #inference hyper-parameters
    start_alpha=0.01
    infer_epoch=1000

    #load model
    model = g.Doc2Vec.load(model_path)

jhlau commented 6 years ago

You need to use my forked version of gensim: https://github.com/jhlau/gensim
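
One way to confirm which gensim is actually being imported is to check its version string before loading the model. A later comment in this thread reports that the fork shows up as version 0.12.4. The helper names and the expected value below are illustrative assumptions, not part of this repository.

```python
def matches_expected(version, expected="0.12.4"):
    """Return True if `version` has the same first three release numbers as `expected`."""
    return version.split(".")[:3] == expected.split(".")


def installed_gensim_version():
    """Return the version string of the importable gensim, or None if it is missing."""
    try:
        import gensim  # may be absent, or a newer upstream release
    except ImportError:
        return None
    return gensim.__version__
```

If `installed_gensim_version()` returns something for which `matches_expected` is false, the interpreter is most likely picking up an upstream gensim release rather than the fork.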

bhomass commented 5 years ago

I did use the forked version. conda list shows version 0.12.4. But when I do

model = Doc2Vec.load("/data/doc2vec/apnews_dbow/doc2vec.bin")

for both of the pretrained doc2vec models, I always get UnicodeDecodeError: 'ascii' codec can't decode byte 0xfb in position 1: ordinal not in range(128)

Why is this happening?
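
This message is the one Python 3 raises when it unpickles a byte string that was pickled under Python 2, and gensim saves its models with pickle. Whether that is what happened here is an assumption, since the thread does not say which Python version was in use. The sketch below reproduces the error with the standard library only. The payload is a hand-written stand-in for a Python 2 pickle, not data taken from the pretrained models.

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 byte string '\xfb'
# (illustrative stand-in, not taken from doc2vec.bin).
PY2_PICKLE = b"\x80\x02U\x01\xfbq\x00."


def load_py2_pickle(data, encoding="ASCII"):
    """Unpickle `data`, decoding Python 2 byte strings with `encoding`."""
    return pickle.loads(data, encoding=encoding)


# Python 3 decodes old byte strings as ASCII by default, which fails on 0xfb.
try:
    load_py2_pickle(PY2_PICKLE)
    error = None
except UnicodeDecodeError as exc:
    error = exc

# latin1 maps every byte to a code point, so decoding cannot fail.
as_text = load_py2_pickle(PY2_PICKLE, encoding="latin1")

# "bytes" leaves Python 2 byte strings undecoded.
as_bytes = load_py2_pickle(PY2_PICKLE, encoding="bytes")
```

If the cause is the same here, running the fork under Python 2.7 (the interpreter shown in the traceback at the top of this thread) would avoid the decoding step entirely. That is a guess, not something confirmed in this thread.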

conorosully commented 5 years ago

@bhomass @TobiasEl did you manage to sort this issue out? I am also struggling to load the pretrained embeddings...

I have the same issue as @TobiasEl and I get the same error whether I load the master branch or the develop branch.