hlt-bme-hu / multiwsi

multi-lingual word sense induction

original Huang, MATLAB #19

Closed makrai closed 8 years ago

makrai commented 8 years ago

http://www.socher.org/index.php/Main/ImprovingWordRepresentationsViaGlobalContextAndMultipleWordPrototypes

gaebor commented 8 years ago

@makrai I don't have a properly licensed copy of MATLAB, unfortunately; the math institute's licenses have expired.

makrai commented 8 years ago

would you try it with Octave?

makrai commented 8 years ago

@DavidNemeskey in the paper, please write about both the original Huang et al. code and CMultiVec

DavidNemeskey commented 8 years ago

We had a look at the code, and found that

  1. the training code covers only their pre-GloVe embedding; the k-means part is not included
  2. the test code is basically CRelabelCorpus from CMultiVec
  3. in README.txt from the latter:
The word representations use a dictionary of 100,232 words. 10 prototypes are used for
6,162 of the words, which roughly correspond to the most frequent words. To determine
which prototype to use given context, run run.m in matlab.
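The prototype-selection step the README describes (given a context, pick one of the word's prototypes) is, per Huang et al., essentially a nearest-centroid lookup: average the context word vectors and choose the prototype with the highest cosine similarity. A minimal sketch in Python; the function name and array shapes are my own, not from `run.m`:

```python
import numpy as np

def pick_prototype(prototypes, context_vectors):
    """Return the index of the prototype closest (by cosine similarity)
    to the averaged context vector.

    prototypes      -- (k, d) array: the word's k prototype vectors
    context_vectors -- (n, d) array: embeddings of the n context words
    """
    # Average the context and unit-normalize it.
    context = context_vectors.mean(axis=0)
    context = context / np.linalg.norm(context)
    # Unit-normalize each prototype so the dot product is cosine similarity.
    protos = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return int(np.argmax(protos @ context))
```

For example, with two toy prototypes along the axes, a context near the first axis selects prototype 0.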

I am writing an email to Huang to ask him how they arrived at those words.

makrai commented 8 years ago

Would you please also ask how they clustered the occurrences (to compute the prototypes)?
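For reference, the paper itself says the occurrences are clustered by spherical k-means over the context representations of each occurrence; what is missing from the released code is the implementation. A rough sketch of spherical k-means (cosine-similarity assignment, renormalized mean centroids), assuming the rows of `X` are the per-occurrence context vectors; all names here are my own:

```python
import numpy as np

def spherical_kmeans(X, k, iters=20, seed=0):
    """Cluster the rows of X into k clusters by cosine similarity.

    Returns (centroids, labels). Centroids are kept unit-length, so the
    assignment step is a plain matrix product.
    """
    rng = np.random.default_rng(seed)
    # Unit-normalize every occurrence vector.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Initialize centroids from k distinct data points.
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each point to the most similar centroid.
        labels = np.argmax(X @ centroids.T, axis=1)
        # Recompute each centroid as the renormalized mean of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                c = members.sum(axis=0)
                centroids[j] = c / np.linalg.norm(c)
    return centroids, labels
```

The prototypes of a word would then be the resulting centroids, one per sense cluster.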

makrai commented 8 years ago

to write in the paper

DavidNemeskey commented 8 years ago

No reply from Huang thus far, but I think it is not that important anyway. Once @kornai is done with the paper for today, I'll add those two sentences about this MSE.