idio / wiki2vec

Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps. Questions? https://gitter.im/idio-opensource/Lobby

Question about <WINDOW_SIZE> #33

Closed zhq2009 closed 7 years ago

zhq2009 commented 7 years ago

Hello,

In the command you mentioned, wiki2vec.sh pathToCorpus pathToOutputFile, what does <WINDOW_SIZE> represent?

In terms of training, does it use CBOW, skip-gram, or log-linear?

Thank you

keynmol commented 7 years ago

Hello!

Those parameters are passed directly to gensim's implementation of word2vec. Please refer to the documentation for the meaning of the parameters as well as the default algorithm.

Short answer: 1) <WINDOW_SIZE> is "the maximum distance between the current and predicted word within a sentence." 2) The default algorithm is CBOW.
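To make the quoted definition of the window concrete, here is a minimal sketch in plain Python. It does not call gensim; `context_pairs` is a hypothetical helper written only for illustration, and real word2vec implementations may additionally shrink or subsample the window during training.

```python
def context_pairs(sentence, window):
    """Return (current_word, context_word) pairs at most `window` positions apart."""
    pairs = []
    for i, current in enumerate(sentence):
        # Clamp the window to the sentence boundaries.
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((current, sentence[j]))
    return pairs


sentence = ["the", "quick", "brown", "fox", "jumps"]
pairs_w1 = context_pairs(sentence, window=1)  # immediate neighbours only
pairs_w2 = context_pairs(sentence, window=2)  # up to two positions away
```

With a larger window, words that are further apart in the sentence are treated as context for each other, so `("brown", "the")` appears in `pairs_w2` but not in `pairs_w1`.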

zhq2009 commented 7 years ago

Thank you

When I looked at the code in gensim_word2vec.py, I found:

model = gensim.models.Word2Vec(sentences, min_count=min_count, size=size,
                               window=window, sg=1, workers=workers)

Does sg=1 mean I am currently training with skip-gram?

Thank you very much.

keynmol commented 7 years ago

Yeah, sorry, I misread the documentation. With sg=1 it's skip-gram after all.
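For readers wondering what the sg flag changes, the sketch below contrasts how the two algorithms frame their training examples. It is plain Python rather than gensim code; `training_examples` is a hypothetical helper, and its `sg` argument only mirrors gensim's convention that sg=1 selects skip-gram and any other value selects CBOW.

```python
def training_examples(sentence, window, sg):
    """Build (input, target) examples for skip-gram (sg=1) or CBOW (otherwise)."""
    examples = []
    for i, current in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        context = [sentence[j] for j in range(lo, hi) if j != i]
        if sg == 1:
            # Skip-gram: the current word is the input and each
            # context word is a separate prediction target.
            examples.extend((current, c) for c in context)
        else:
            # CBOW: the whole context is the input and the
            # current word is the single prediction target.
            examples.append((tuple(context), current))
    return examples


sentence = ["a", "b", "c"]
skipgram = training_examples(sentence, window=1, sg=1)
cbow = training_examples(sentence, window=1, sg=0)
```

Here skip-gram produces one example per (word, neighbour) pair, while CBOW produces one example per word position with all neighbours grouped as the input.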