jhlau / doc2vec

Python scripts for training/testing paragraph vectors
Apache License 2.0

Load pretrained doc2vec models in up-to-date gensim versions #15

Closed tarikaltuncu closed 6 years ago

tarikaltuncu commented 6 years ago

Hi,

Thanks for sharing your pre-trained models. They are the only publicly available models afaik.

However, they are not easily loadable in newer gensim versions such as the latest, 2.3.0.

Do you have a working method for this? Otherwise, could you share the parameters that you found best for creating a general pretrained model for the en-wiki corpus?

jhlau commented 6 years ago

Indeed they won't. Gensim has since changed the word2vec interface quite significantly, and there's little I can do about it. You can still load the models using my forked version of gensim, though: https://github.com/jhlau/gensim

If you want to train doc2vec/word2vec models yourself, you can use the recommended configurations documented in the paper (footnotes 8 and 9): http://aclweb.org/anthology/W16-1609
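
For anyone taking the retraining route, here is a minimal sketch of what training could look like with a recent stock gensim. It assumes gensim 4.x parameter names (`vector_size`, `epochs`; older releases used `size` and `iter`). The hyperparameter values, the `PLACEHOLDER_PARAMS` dict, and the `tokenize` and `train_doc2vec` helpers are all hypothetical and only illustrate the shape of the call; they are not the settings from the paper, so substitute the values from footnotes 8 and 9 there.

```python
# Sketch only: values below are placeholders, NOT the paper's recommended settings.
PLACEHOLDER_PARAMS = {
    "vector_size": 100,  # embedding dimensionality (`size` in older gensim)
    "window": 5,         # context window size
    "min_count": 1,      # ignore words rarer than this
    "sample": 1e-5,      # sub-sampling threshold for frequent words
    "negative": 5,       # number of negative samples
    "dm": 0,             # 0 selects DBOW, 1 selects DM (assumed choice here)
    "dbow_words": 1,     # also train word vectors alongside DBOW
    "epochs": 20,        # training passes (`iter` in older gensim)
    "workers": 1,
}


def tokenize(line):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return line.lower().split()


def train_doc2vec(lines, params=PLACEHOLDER_PARAMS):
    """Train a Doc2Vec model on an iterable of raw text lines, one document each."""
    # Imported here so the helpers above stay usable without gensim installed.
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    # Each document is tagged with its position in the input.
    docs = [
        TaggedDocument(words=tokenize(line), tags=[i])
        for i, line in enumerate(lines)
    ]
    model = Doc2Vec(**params)
    model.build_vocab(docs)
    model.train(docs, total_examples=model.corpus_count, epochs=model.epochs)
    return model
```

A model trained this way can be written with `model.save(path)` and read back with `Doc2Vec.load(path)` under the same gensim version, and `model.infer_vector(tokenize(text))` returns a vector for unseen text.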