michaelwiegand82 closed this issue 7 years ago
I don't think the doc2vec code provides a native function for saving a doc2vec model in non-binary format. You can of course manually pull out the weights and save them yourself.
I do not really understand what you mean by "pulling out the weights". What I did now is use the training documents as test documents (since we are doing unsupervised classification here, there should not be a problem with that) and then run the infer_test.py script. Is that what you had in mind?
Ah I see. You can do what you are doing now, but the vectors themselves might actually be a little different when you re-infer them (the inference procedure is basically a pseudo-training step with a randomly initialised document vector).
If all you're looking for are the training document vectors, they are saved in the model and you can get them by doing something like the following:
import gensim.models as g
# docs, vector_size, window_size, etc. are assumed to be defined as in the training script
model = g.Doc2Vec(docs, size=vector_size, window=window_size, min_count=min_count, sample=sampling_threshold, workers=worker_count, hs=0, dm=dm, negative=negative_size, dbow_words=1, dm_concat=1, pretrained_emb=pretrained_emb, iter=train_epoch)
vector = model.docvecs[0]  # vector is the document vector for the first training document
For more information, you can refer to the code: https://github.com/jhlau/gensim/blob/develop/gensim/models/doc2vec.py#L261
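As a rough illustration of "pulling out the weights and saving them yourself", here is a minimal sketch that writes any sequence of vectors to a plain-text file, one vector per line, with a "count dimension" header in the style of the word2vec text format. The helper name save_vectors_as_text, the file name, and the default integer tags are hypothetical and not part of gensim; the usage line in the comment assumes docs is a list and that model.docvecs can be indexed by integer position as shown above.

```python
def save_vectors_as_text(vectors, path, tags=None):
    """Write vectors to a text file: a 'count dim' header, then 'tag v1 v2 ...' per line.

    Hypothetical helper (not a gensim function). `vectors` is any iterable of
    numeric sequences, e.g. numpy arrays or plain lists.
    """
    # Materialise the vectors as lists of floats so we know the count and dimension.
    rows = [[float(x) for x in vec] for vec in vectors]
    if tags is None:
        # Default tag is the position of the document in the input.
        tags = [str(i) for i in range(len(rows))]
    dim = len(rows[0]) if rows else 0
    with open(path, "w", encoding="utf-8") as out:
        out.write(f"{len(rows)} {dim}\n")
        for tag, row in zip(tags, rows):
            out.write(tag + " " + " ".join(f"{x:.6f}" for x in row) + "\n")
    return len(rows)


# Assumed usage with a trained model (requires gensim and the variables above):
# save_vectors_as_text((model.docvecs[i] for i in range(len(docs))), "doc_vectors.txt")
```

This only exports the training document vectors, not the full model, so the resulting file cannot be loaded back for inference.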
Thank you for this helpful information!
How can I save the model in non-binary format? Thank you.