jhlau / doc2vec

Python scripts for training/testing paragraph vectors
Apache License 2.0

pretrained_word_embeddings #8

Closed: ShuxinLin closed this issue 7 years ago

ShuxinLin commented 7 years ago

Hi,

I want to use pretrained word embeddings with a larger BOW size. How can I get them? I found your answer to a Stack Overflow question saying the .txt file should be in the C word2vec tool's text format. Could you say more about how to get the C word2vec format?

jhlau commented 7 years ago

It's the same format as the GoogleNews word2vec vectors: https://code.google.com/archive/p/word2vec/ (direct link: https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing)

That is:

<number of word types> <embedding size>
<word1> <vector of numbers, delimited by space>
<word2> <vector of numbers, delimited by space>
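The format above is easy to produce yourself. Here is a minimal stdlib-only Python sketch that writes and reads it. `write_word2vec_text` and `read_word2vec_text` are hypothetical helper names (they are not part of this repository), and the toy vectors are made up for illustration only.

```python
import os
import tempfile
from typing import Dict, List


def write_word2vec_text(path: str, vectors: Dict[str, List[float]]) -> None:
    """Write embeddings in the word2vec (C tool) text format.

    Line 1: "<number of word types> <embedding size>"
    Then one line per word: "<word> <v1> <v2> ... <vN>", space-delimited.
    """
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(next(iter(vectors.values())))
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"{len(vectors)} {dim}\n")
        for word, vec in vectors.items():
            # Words are separated from their numbers by spaces, so they must not contain whitespace.
            if not word or any(ch.isspace() for ch in word):
                raise ValueError(f"invalid word token: {word!r}")
            if len(vec) != dim:
                raise ValueError(f"{word!r} has size {len(vec)}, expected {dim}")
            f.write(word + " " + " ".join(f"{x:.6f}" for x in vec) + "\n")


def read_word2vec_text(path: str) -> Dict[str, List[float]]:
    """Parse a word2vec text-format file back into a dict, checking it against the header."""
    with open(path, encoding="utf-8") as f:
        n_words, dim = map(int, f.readline().split())
        vectors: Dict[str, List[float]] = {}
        for line in f:
            # rstrip() also drops the trailing space the original C tool writes after the last number.
            parts = line.rstrip().split(" ")
            if parts == [""]:
                continue  # skip blank lines
            if len(parts) != dim + 1:
                raise ValueError(f"expected {dim} numbers for {parts[0]!r}, got {len(parts) - 1}")
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    if len(vectors) != n_words:
        raise ValueError(f"header says {n_words} words, found {len(vectors)}")
    return vectors


if __name__ == "__main__":
    # Toy vectors for illustration only; real ones would come from your own embedding training run.
    toy = {"king": [0.1, 0.2, 0.3], "queen": [0.4, 0.5, 0.6]}
    path = os.path.join(tempfile.mkdtemp(), "toy_embeddings.txt")
    write_word2vec_text(path, toy)
    with open(path, encoding="utf-8") as f:
        print(f.read(), end="")
    # 2 3
    # king 0.100000 0.200000 0.300000
    # queen 0.400000 0.500000 0.600000
```

If you already use gensim (an assumption, not something this thread requires), `KeyedVectors.save_word2vec_format(path, binary=False)` writes this same text format. The GoogleNews download itself is the binary variant (`.bin.gz`), so it can be converted by loading it with `KeyedVectors.load_word2vec_format(path, binary=True)` and saving it again with `binary=False`.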