jhlau / doc2vec

Python scripts for training/testing paragraph vectors
Apache License 2.0

pretrained_emb argument is not recognized #13

Closed inigo-jauregi closed 7 years ago

inigo-jauregi commented 7 years ago

Hi,

I am trying to run your code on the toy data, but the `pretrained_emb` argument is not recognized. This is the code:

```python
#python example to train doc2vec model (with or without pre-trained word embeddings)

import gensim.models as g
import logging

#doc2vec parameters
vector_size = 300
window_size = 15
min_count = 1
sampling_threshold = 1e-5
negative_size = 5
train_epoch = 100
dm = 0 #0 = dbow; 1 = dmpv
worker_count = 1 #number of parallel processes

#pretrained word embeddings
pretrained_emb = "toy_data/pretrained_word_embeddings.txt" #None if use without pretrained embeddings

#input corpus
train_corpus = "toy_data/train_docs.txt"

#output model
saved_path = "toy_data/model.bin"

#enable logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

#train doc2vec model
docs = g.doc2vec.TaggedLineDocument(train_corpus)

model = g.Doc2Vec(docs, size=vector_size, window=window_size, min_count=min_count, sample=sampling_threshold, workers=worker_count, hs=0, dm=dm, negative=negative_size, pretrained_emb=pretrained_emb,dbow_words=1, dm_concat=1, iter=train_epoch)

#save model
model.save(saved_path)
```

And this is the error:

```
Traceback (most recent call last):
  File "C:/Users/12714818_Admin/Desktop/CMCRC/Boundlss_2017/May-Aug/Context_including/Conversation_clustering/src/train_model.py", line 31, in <module>
    model = g.Doc2Vec(docs, size=vector_size, window=window_size, min_count=min_count, sample=sampling_threshold, workers=worker_count, hs=0, dm=dm, negative=negative_size, pretrained_emb=pretrained_emb,dbow_words=1, dm_concat=1, iter=train_epoch)
  File "C:\ProgramData\Anaconda3\envs\py27\lib\site-packages\gensim\models\doc2vec.py", line 625, in __init__
    **kwargs)
TypeError: __init__() got an unexpected keyword argument 'pretrained_emb'
```

I am using Python 2.7.

jhlau commented 7 years ago

As mentioned in the README:

> Gensim: Best to use my forked version of gensim; the latest gensim has changed its Doc2Vec methods a little and so would not load the pre-trained models.

https://github.com/jhlau/gensim
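
The traceback above ends inside the `site-packages` copy of `gensim/models/doc2vec.py`, where `**kwargs` is forwarded to a parent `__init__` that does not know `pretrained_emb`. A quick way to see which gensim build an environment picks up is to inspect the constructor signatures before training. The following is only a minimal sketch: it assumes Python 3 (`inspect.signature` is not available on Python 2.7, where `inspect.getargspec` would be the rough analogue), and it assumes the fork declares `pretrained_emb` as a named parameter somewhere in the class hierarchy, which has not been verified here. `declares_init_keyword`, `check_installed_gensim`, and the toy classes are hypothetical names used for illustration, not part of gensim.

```python
import inspect


def declares_init_keyword(cls, name):
    """Return True if any __init__ in cls's MRO explicitly names `name`."""
    for klass in cls.__mro__:
        init = klass.__dict__.get("__init__")
        if init is None:
            continue  # this class inherits its initialiser
        try:
            params = inspect.signature(init).parameters
        except (TypeError, ValueError):
            continue  # some built-in initialisers expose no signature
        if name in params:
            return True
    return False


# Toy stand-ins (hypothetical, NOT gensim classes) that mimic the traceback:
# the subclass forwards unknown keywords to its parent through **kwargs.
class Parent(object):
    def __init__(self, size=100):
        self.size = size


class Child(Parent):
    def __init__(self, dm=1, **kwargs):
        super(Child, self).__init__(**kwargs)
        self.dm = dm


# Same shape, but the parent now declares the extra keyword.
class PatchedParent(object):
    def __init__(self, size=100, pretrained_emb=None):
        self.size = size
        self.pretrained_emb = pretrained_emb


class PatchedChild(PatchedParent):
    def __init__(self, dm=1, **kwargs):
        super(PatchedChild, self).__init__(**kwargs)
        self.dm = dm


def check_installed_gensim():
    """Report whether the importable gensim declares `pretrained_emb`.

    Assumption: a build that supports the argument names it explicitly
    in Doc2Vec.__init__ or in one of its parent classes.
    """
    from gensim.models import Doc2Vec  # imported lazily; requires gensim
    return declares_init_keyword(Doc2Vec, "pretrained_emb")
```

With the toy classes, `declares_init_keyword(Child, "pretrained_emb")` is false and `Child(pretrained_emb="x")` raises the same kind of `TypeError` as in the report, while `PatchedChild` accepts the keyword. If `check_installed_gensim()` returns false in the training environment, the interpreter is most likely importing a gensim build other than the fork.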