uhh-lt / sensegram

Making sense embedding out of word embeddings using graph-based word sense induction
http://uhh-lt.github.io/sensegram
212 stars, 50 forks

Junk words found in the sense vectors #20

Status: closed (Abhishek-Rnjn closed this issue 5 years ago)

Abhishek-Rnjn commented 6 years ago

When I trained sensegram on my corpus, there were no words like "afliates" in it, and not even in the word vectors file it produced ("affiliate" was present, though). But when I searched the vocabulary of the sense vectors, I found words like "afliated", "affli", etc. Why does this happen? Does it follow fastText's training method?
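To narrow this down, it can help to list every base word that has a sense vector but no word vector. Below is a minimal sketch. It assumes both files are in word2vec *text* format (a `<count> <dim>` header, then one `<token> <v1> <v2> ...` line per token) and that sense labels look like `word#N`. The file layout, the `#` separator, and the helper names `load_vocab`, `base_word` and `unseen_sense_words` are all hypothetical, not sensegram's own API.

```python
import sys
from pathlib import Path


def load_vocab(path):
    """Return the set of tokens in a word2vec text-format file (assumed format)."""
    vocab = set()
    with Path(path).open(encoding="utf-8") as f:
        next(f, None)  # skip the "<count> <dim>" header line
        for line in f:
            fields = line.split(maxsplit=1)  # the token is the first field
            if fields:
                vocab.add(fields[0])
    return vocab


def base_word(sense_label, sep="#"):
    """Strip a trailing numeric sense id, e.g. 'bank#1' -> 'bank'.

    The 'word#N' label format is an assumption; labels without a numeric
    suffix (e.g. 'c#') are returned unchanged.
    """
    word, found, sense_id = sense_label.rpartition(sep)
    return word if found and sense_id.isdigit() else sense_label


def unseen_sense_words(word_vocab, sense_vocab, sep="#"):
    """Sorted base words that occur in the sense vocabulary but not the word vocabulary."""
    return sorted({base_word(s, sep) for s in sense_vocab} - set(word_vocab))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_vocab.py WORD_VECTORS_FILE SENSE_VECTORS_FILE
    for w in unseen_sense_words(load_vocab(sys.argv[1]), load_vocab(sys.argv[2])):
        print(w)
```

An empty result would suggest that the sense labels are derived from the word vocabulary. A non-empty result, with the exact commands used, is the kind of detail that makes the problem reproducible.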

adnanj171 commented 6 years ago

I am facing the same issue. Please reply.

alexanderpanchenko commented 6 years ago

Hello,

Can you please provide more details about your training settings? Which commands did you execute exactly? Can you provide a link to the input data so that we can reproduce the issue?

> On Thu, Jul 19, 2018 at 1:26 PM, adnanj171 wrote:
> I am facing the same issue.
