nmrksic / eval-multilingual-simlex

Tool for Evaluating Multilingual WS-353 and SimLex-999

wrong score for german SimLex #4

Open NebelAI opened 5 years ago

NebelAI commented 5 years ago

Hey,

the file simlex-german.txt (https://github.com/nmrksic/eval-multilingual-simlex/blob/master/evaluation/simlex-german.txt) contains a wrong score value.

The original German translation from Multilingual SimLex999 (http://www.leviants.com/ira.leviant/MultilingualVSMdata.html) gives a different score for the same word pair.

Line 15 (yours): schlecht furchtbar 0
Line 15 (original): schlecht, furchtbar, 10,0,9,9,2,7,10,9,9,8,8,8,10, 7.62

First of all, schlecht and furchtbar can be considered near-synonyms (English translation: bad and awful), so a similarity of 0 makes no sense. Second, 0 is simply wrong arithmetically: the mean of the 13 annotator ratings is (10+0+9+9+2+7+10+9+9+8+8+8+10) / 13 = 99 / 13 ≈ 7.615, which rounds to 7.62, definitely not 0.
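For anyone who wants to re-check the arithmetic, here is a minimal sketch. The ratings are copied from the original line quoted above; the variable names are just for illustration.

```python
# Annotator ratings for "schlecht, furchtbar", copied from the original file's line 15.
ratings = [10, 0, 9, 9, 2, 7, 10, 9, 9, 8, 8, 8, 10]

# The published score is the arithmetic mean of the individual ratings.
average = sum(ratings) / len(ratings)

print(len(ratings))       # number of annotators
print(round(average, 2))  # mean rounded to two decimals, as in the original file
```

The rounded mean matches the 7.62 in the original file, not the 0 in simlex-german.txt.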

NebelAI commented 5 years ago

I couldn't continue working on my research project until I had checked for more errors. Luckily, this seems to be the only one. I validated this by comparing every score in simlex-german.txt against the original German Multilingual SimLex999 file and looking for differences. Since this is the only mistake, I suggest either ignoring it or fixing this particular line, like I did.

I ran the same check on the German WordSim-353 file, too. No mistakes found.
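In case someone wants to repeat the check, here is a rough sketch of how such a comparison could look. It is not the exact script used for the check above. It assumes the repo file has whitespace-separated `word1 word2 score` lines, that the original file has comma-separated lines ending in the mean score (as in the two lines quoted above), and that neither input contains a header row. The function names and the `wort_a wort_b` row are made up for illustration; only the `schlecht furchtbar` row comes from the real files.

```python
def parse_repo_line(line):
    """Parse 'word1 word2 score' (assumed whitespace-separated)."""
    word1, word2, score = line.split()
    return (word1, word2), float(score)


def parse_original_line(line):
    """Parse 'word1, word2, r1,r2,...,rN, mean' (assumed layout); keep only the mean."""
    fields = [field.strip() for field in line.split(",")]
    return (fields[0], fields[1]), float(fields[-1])


def find_mismatches(repo_lines, original_lines, tolerance=0.01):
    """Return (pair, repo_score, original_score) for every pair whose scores differ."""
    original = dict(parse_original_line(l) for l in original_lines if l.strip())
    mismatches = []
    for line in repo_lines:
        if not line.strip():
            continue  # skip blank lines
        pair, score = parse_repo_line(line)
        expected = original.get(pair)
        if expected is not None and abs(score - expected) > tolerance:
            mismatches.append((pair, score, expected))
    return mismatches


repo_lines = [
    "schlecht furchtbar 0",  # line quoted in this issue
    "wort_a wort_b 5.0",     # invented placeholder row, not real data
]
original_lines = [
    "schlecht, furchtbar, 10,0,9,9,2,7,10,9,9,8,8,8,10, 7.62",  # quoted in this issue
    "wort_a, wort_b, 5,5, 5.0",                                  # invented placeholder row
]

for pair, got, expected in find_mismatches(repo_lines, original_lines):
    print(pair, "repo:", got, "original:", expected)
```

With real data you would read the lines from the two files instead of the inline lists. The small tolerance is there so that differences in rounding are not reported as errors.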