phuquan closed this issue 7 years ago
Corpus statistics can be found in the paper: https://arxiv.org/abs/1607.05368
I am not quite sure why you are interested in word similarity performance - doc2vec is about encoding documents, not words.
Thank you for answering my questions. I'm using Word2Vec to resolve my problem, not doc2vec.
Hi Jhlau,
I'm Quan Van Phu, a student at Hanoi University of Science and Technology. Thank you for sharing the Pre-Trained Word2Vec Models: English Wikipedia Skip-gram (1.4GB). Could you please help me with two questions about this model?
First, how large is the training corpus? For instance: ? billion tokens and ? distinct English words. Second, what is the model's performance: SimLex999 = ? and Google Analogy = ?
I hope you can answer these questions as soon as possible.