gsi-upm / sematch

semantic similarity framework for knowledge graph
http://gsi-upm.github.io/sematch/

DBpedia entities relatedness don't produce the same results #14

Open ibrahimsharaf opened 7 years ago

ibrahimsharaf commented 7 years ago

Hi all, I am using Python 2.7, numpy 1.13.3, scipy 0.19.1, and sematch 1.0.4. I've been trying to reproduce the README's semantic similarity results for DBpedia entities. I get the same results with similarity, but lower ones with relatedness (the README values are in the trailing comments):

>>> sim.relatedness('http://dbpedia.org/resource/Madrid', 'http://dbpedia.org/resource/Barcelona')  # 0.457984139871
0.2668161233777911
>>> sim.relatedness('http://dbpedia.org/resource/Apple_Inc.', 'http://dbpedia.org/resource/Steve_Jobs')  # 0.465991132787
0.19299297377223823

I also tried some other entities, and the results did not look sensible. Some of them were > 1.0 (is that even possible?), e.g.:

>>> sim.relatedness('http://dbpedia.org/resource/Secure_Shell', 'http://dbpedia.org/resource/Spain')
1.1417165889528
>>> sim.relatedness('http://dbpedia.org/resource/Freeware', 'http://dbpedia.org/resource/Philippines')
1.2145251551211556

naveenk903 commented 7 years ago

I believe the formula used here to compute relatedness, rel(a, b) = (math.log(max([a, b])) - math.log(ab)) / (math.log(self.entity_N) - math.log(min([a, b]))), is actually a distance measure rather than a relatedness measure.

It should be rel(a, b) = 1. - ((math.log(max([a, b])) - math.log(ab)) / (math.log(self.entity_N) - math.log(min([a, b]))))
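
To make the difference concrete, here is a minimal standalone sketch of the two expressions quoted above. It does not call sematch. The function names are hypothetical, and it assumes that a and b are per-entity counts, ab is the count the two entities share, and entity_n stands in for self.entity_N. None of these meanings are confirmed in this thread.

```python
import math


def link_distance(a, b, ab, entity_n):
    """Expression quoted in the thread; behaves like a distance (0 when fully shared).

    Assumed inputs (not confirmed by the thread): a, b are per-entity counts,
    ab is their shared count, entity_n is the total count.
    Requires ab > 0 and min(a, b) < entity_n so both logs and the division are defined.
    """
    if ab <= 0:
        raise ValueError("ab must be positive; log(0) is undefined")
    numerator = math.log(max(a, b)) - math.log(ab)
    denominator = math.log(entity_n) - math.log(min(a, b))
    return numerator / denominator


def link_relatedness(a, b, ab, entity_n):
    """Proposed correction from the thread: 1 minus the distance."""
    return 1.0 - link_distance(a, b, ab, entity_n)


# Fully shared counts: distance is 0, so the corrected relatedness is 1.
print(link_distance(100, 100, 100, 10**6), link_relatedness(100, 100, 100, 10**6))

# Little sharing between two large counts: the quoted expression exceeds 1,
# which matches the kind of values reported earlier in this issue.
print(link_distance(1000, 500, 1, 10**4), link_relatedness(1000, 500, 1, 10**4))
```

Under these assumptions the quoted expression exceeds 1.0 whenever max(a, b) * min(a, b) > entity_n * ab. That would explain the values above 1.0 reported earlier, and it also means the corrected form can go negative unless the result is clamped.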

NiallRooney commented 6 years ago

Is there a reference for this formula in the literature?

miaoulo commented 6 years ago

I encountered the same results. Is there a reason that this does not give the same results as in the readme?