jhlau / doc2vec

Python scripts for training/testing paragraph vectors
Apache License 2.0

Pre-Trained Word2Vec Models Question #6

Closed phuquan closed 7 years ago

phuquan commented 7 years ago

Hi Jhlau,

I'm Quan Van Phu, a student at Hanoi University of Science and Technology. Thank you for sharing the pre-trained Word2Vec models, in particular the English Wikipedia skip-gram model (1.4GB). I have two questions about this model; could you please help me with them?

Can you tell me the corpus size, for instance how many billion tokens and how many distinct English words it contains? And can you tell me the model's performance on word benchmarks, e.g. its SimLex-999 and Google Analogy scores?

I hope you can help me answer these questions as soon as possible.
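For reference, here is a minimal sketch of how these two benchmarks could be run against the released vectors with gensim. The file paths are placeholders; I am assuming the download is a saved gensim Word2Vec model (if it is in the plain word2vec text/binary format, `KeyedVectors.load_word2vec_format` would apply instead), that the SimLex-999 file has been reduced to tab-separated word1/word2/score triples, and that a reasonably recent gensim (>= 3.4) is installed:

```python
from gensim.models import Word2Vec, KeyedVectors

# Placeholder path to the released English Wikipedia skip-gram model.
# If the download is a saved gensim model, Word2Vec.load() applies; if it is
# in the plain word2vec text/binary format, use
# KeyedVectors.load_word2vec_format("enwiki_sg.txt", binary=False) instead.
model = Word2Vec.load("enwiki_sg/word2vec.bin")
wv = model.wv

# SimLex-999: Spearman correlation between cosine similarities and human
# ratings. The file is assumed to hold tab-separated word1/word2/score triples.
pearson, spearman, oov_ratio = wv.evaluate_word_pairs("SimLex-999.txt")
print("SimLex-999 Spearman:", spearman[0], "OOV ratio:", oov_ratio)

# Google analogy test set (questions-words.txt from the original word2vec
# release). evaluate_word_analogies requires gensim >= 3.4.
accuracy, sections = wv.evaluate_word_analogies("questions-words.txt")
print("Google analogy accuracy:", accuracy)
```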

jhlau commented 7 years ago

Corpus statistics can be found in the paper: https://arxiv.org/abs/1607.05368

I am not quite sure why you are interested in word similarity performance; doc2vec is about encoding documents, not words.
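To illustrate the distinction, here is a minimal sketch using gensim's Doc2Vec API (the model path is a placeholder and a reasonably recent gensim is assumed):

```python
from gensim.models.doc2vec import Doc2Vec

# Placeholder path to one of the released doc2vec models.
model = Doc2Vec.load("enwiki_dbow/doc2vec.bin")

# doc2vec encodes documents: infer a vector for a whole (tokenised) document.
doc_vec = model.infer_vector(["machine", "learning", "is", "fun"])

# Word vectors trained alongside the document vectors are still accessible,
# and word-level benchmarks such as SimLex-999 evaluate these, not the
# document representations.
word_vec = model.wv["learning"]
```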

phuquan commented 7 years ago

Thank you for answering my questions. I'm using Word2Vec to solve my problem, not doc2vec.