hlt-bme-hu / multiwsi

multi-lingual word sense induction

original Huang, MATLAB #19

Closed makrai closed 8 years ago

makrai commented 8 years ago

http://www.socher.org/index.php/Main/ImprovingWordRepresentationsViaGlobalContextAndMultipleWordPrototypes

gaebor commented 8 years ago

@makrai I don't have a properly licensed copy of MATLAB, unfortunately; the math institute's licenses have expired.

makrai commented 8 years ago

would you try it with Octave?

makrai commented 8 years ago

@DavidNemeskey in the paper, please write about both the original Huang et al. code and CMultiVec

DavidNemeskey commented 8 years ago

We had a look at the code, and found that

  1. the training code covers only their pre-GloVe embedding; the k-means part is not included
  2. the test code is basically CRelabelCorpus from CMultiVec
  3. in README.txt from the latter:
The word representations use a dictionary of 100,232 words. 10 prototypes are used for
6,162 of the words, which roughly correspond to the most frequent words. To determine
which prototype to use given context, run run.m in matlab.
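The prototype-selection step the README describes (given a context, pick one of the word's prototypes) is, per Huang et al., essentially a nearest-centroid lookup: average the context word vectors and choose the prototype with the highest cosine similarity. A minimal sketch in Python; the function name and array shapes are my own, not from `run.m`:

```python
import numpy as np

def pick_prototype(prototypes, context_vectors):
    """Return the index of the prototype closest (by cosine similarity)
    to the averaged context vector.

    prototypes      -- (k, d) array: the word's k prototype vectors
    context_vectors -- (n, d) array: embeddings of the n context words
    """
    # Average the context and unit-normalize it.
    context = context_vectors.mean(axis=0)
    context = context / np.linalg.norm(context)
    # Unit-normalize each prototype so the dot product is cosine similarity.
    protos = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return int(np.argmax(protos @ context))
```

For example, with two toy prototypes along the axes, a context near the first axis selects prototype 0.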

I am writing an email to Huang to ask him how they arrived at those words.

makrai commented 8 years ago

Would you please also ask how they clustered the occurrences (to compute the prototypes)?
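For reference, the paper itself says the occurrences are clustered by spherical k-means over the context representations of each occurrence; what is missing from the released code is the implementation. A rough sketch of spherical k-means (cosine-similarity assignment, renormalized mean centroids), assuming the rows of `X` are the per-occurrence context vectors; all names here are my own:

```python
import numpy as np

def spherical_kmeans(X, k, iters=20, seed=0):
    """Cluster the rows of X into k clusters by cosine similarity.

    Returns (centroids, labels). Centroids are kept unit-length, so the
    assignment step is a plain matrix product.
    """
    rng = np.random.default_rng(seed)
    # Unit-normalize every occurrence vector.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Initialize centroids from k distinct data points.
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each point to the most similar centroid.
        labels = np.argmax(X @ centroids.T, axis=1)
        # Recompute each centroid as the renormalized mean of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                c = members.sum(axis=0)
                centroids[j] = c / np.linalg.norm(c)
    return centroids, labels
```

The prototypes of a word would then be the resulting centroids, one per sense cluster.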

makrai commented 8 years ago

to write in the paper

DavidNemeskey commented 8 years ago

No reply from Huang thus far, but I think it is not that important anyway. Once @kornai is done with the paper for today, I'll add those two sentences about this MSE.