Could you please share the exact code or commands you are trying to run? Otherwise it is difficult to understand and reproduce the error.
On 17 Jun 2018, at 11:10, Abhishek Ranjan notifications@github.com wrote:
I was training this model on 500 words and the 500 most similar words of each word, i.e. 250,000 words in total. I picked those words and their vectors randomly from a pre-trained word2vec file, and I was getting only one sense for each word. So does the training depend on context too?
I just used the command python train.py model/word_embedd.txt. I wasn't getting any error; the code ran fine, but the results weren't satisfactory. word_embedd.txt contained 250,000 randomly chosen words and their vectors, not generated from any corpus but picked randomly from a pre-trained vector space. When I trained it on word embeddings generated from a corpus, it gave very good results. So I wanted to know whether sentences/context are also needed to get good results.
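For reference, the embeddings file was built roughly like this (a minimal sketch, assuming gensim 4's KeyedVectors API; pretrained.bin and the output path are placeholders, not the exact files used):

```python
import random

from gensim.models import KeyedVectors

# Load a pre-trained word2vec model (path and binary flag are placeholders).
kv = KeyedVectors.load_word2vec_format("pretrained.bin", binary=True)

# Pick 500 random seed words and, for each, its 500 most similar words.
seed_words = random.sample(list(kv.key_to_index), 500)
vocab = set(seed_words)
for w in seed_words:
    vocab.update(neighbor for neighbor, _ in kv.most_similar(w, topn=500))

# Write the subset in word2vec text format: a header line "count dim",
# then one line per word: "word v1 v2 ... vd".
with open("model/word_embedd.txt", "w", encoding="utf-8") as out:
    out.write(f"{len(vocab)} {kv.vector_size}\n")
    for w in vocab:
        out.write(w + " " + " ".join(f"{x:.6f}" for x in kv[w]) + "\n")
```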
I think that the problem is that you did not follow the instructions here:
Sorry, they are a bit hidden; maybe I should make them more prominent. In fact, you need to name your embeddings model corpus.word_vectors, where corpus is the name of a (possibly non-existent) corpus file.
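Concretely, something like the following (a sketch only; model/corpus.txt is a placeholder corpus path, and the exact naming convention, e.g. whether .word_vectors is appended to the full corpus path, is described in the instructions above):

```bash
# Rename the pre-computed embeddings so train.py can find them,
# then pass the corpus path (the corpus file itself need not exist).
mv model/word_embedd.txt model/corpus.txt.word_vectors
python train.py model/corpus.txt
```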
I was training this model on 500 words and the 500 most similar words of each word, i.e. 250,000 words in total. I picked those words and their vectors randomly from a pre-trained word2vec file, and I was getting only one sense for each word. So does the training depend on the context of the word too? Because I was getting satisfactory results when training on a corpus.