gsi-upm / sematch

semantic similarity framework for knowledge graph
http://gsi-upm.github.io/sematch/

socket.error: [Errno 104] Connection reset by peer when Computing semantic similarity of DBpedia entities #12

Closed YuanKQ closed 7 years ago

YuanKQ commented 7 years ago

Hello, I am an undergraduate student and have been following your research.

When I compute the semantic similarity of DBpedia entities using the following example code:

from sematch.semantic.similarity import EntitySimilarity
sim = EntitySimilarity()
sim.similarity('http://dbpedia.org/resource/Madrid','http://dbpedia.org/resource/Barcelona') 

I get the following error:

socket.error: [Errno 104] Connection reset by peer

The complete traceback is:

Traceback (most recent call last):
  File "", line 1, in
  File "/usr/lib/python2.7/site-packages/sematch/semantic/similarity.py", line 532, in similarity
    concepts_1 = self._features.type(entity1)
  File "/usr/lib/python2.7/site-packages/sematch/semantic/sparql.py", line 181, in type
    return self.resource_query(self.sp_triple(entity, RDF.type, 'o'))
  File "/usr/lib/python2.7/site-packages/sematch/semantic/sparql.py", line 55, in resource_query
    return self.execution_template(variable, self.q_mark(variable), triples, self._tpl, show_query)
  File "/usr/lib/python2.7/site-packages/sematch/semantic/sparql.py", line 47, in execution_template
    return [r[variable]["value"] for r in self.execution(template % (query, triples), show_query)]
  File "/usr/lib/python2.7/site-packages/sematch/semantic/sparql.py", line 38, in execution
    results = self._sparql.query().convert()
  File "/usr/lib/python2.7/site-packages/SPARQLWrapper/Wrapper.py", line 601, in query
    return QueryResult(self._query())
  File "/usr/lib/python2.7/site-packages/SPARQLWrapper/Wrapper.py", line 571, in _query
    response = urlopener(request)
  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 429, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 447, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1228, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1201, in do_open
    r = h.getresponse(buffering=True)
  File "/usr/lib/python2.7/httplib.py", line 1121, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 438, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 394, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/lib/python2.7/socket.py", line 480, in readline
    data = self._sock.recv(self._rbufsize)

But when I look up "http://dbpedia.org/resource/Madrid" in the SPARQL Explorer, it works fine. Could you please tell me why this happens?

YuanKQ commented 7 years ago

I have found the cause: DBpedia has blocked my requests. The error means the following:

"Connection reset by peer" is the TCP/IP equivalent of slamming the phone back on the hook. It's more polite than merely not replying, leaving one hanging. But it's not the FIN-ACK expected of the truly polite TCP/IP converseur.
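When a reset is only transient, retrying after a pause is sometimes enough. The following is a minimal sketch under that assumption, written for Python 3; `call_with_retries` is a hypothetical helper, not part of sematch or SPARQLWrapper, and it will not help if the endpoint has blocked the client outright.

```python
import time


def call_with_retries(func, retries=3, delay=1.0, sleep=time.sleep):
    """Call func(); on a socket-level error, wait and try again.

    The wait doubles after every failed attempt. Once all attempts
    have failed, the last error is re-raised.
    """
    for attempt in range(retries):
        try:
            return func()
        except OSError:
            # In Python 3, socket.error is an alias of OSError, and
            # ConnectionResetError (errno 104) is a subclass of it.
            if attempt == retries - 1:
                raise
            sleep(delay * (2 ** attempt))
```

It could be wrapped around the failing call, for example `call_with_retries(lambda: sim.similarity(entity_1, entity_2))`, where `entity_1` and `entity_2` stand for the two DBpedia resource URLs.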

Fortunately, I have found that the semantic similarity of DBpedia entities is computed from the semantic similarity of their YAGO concepts, which are still accessible.

I solved the problem by crawling the list of URLs like ".../class/yago/..." directly from the web page "http://dbpedia.org/resource/..." and following the code of EntitySimilarity.similarity.

Here is my code.

from collections import Counter

from sematch.semantic.similarity import EntitySimilarity

# concepts_1: the list of URLs like ".../class/yago/..." taken directly from the web page "http://dbpedia.org/resource/concept1_name"
# concepts_2: the list of URLs like ".../class/yago/..." taken directly from the web page "http://dbpedia.org/resource/concept2_name"
def similarity_test(concepts_1, concepts_2):
    sim = EntitySimilarity()
    # Map each YAGO concept to its synset, dropping concepts that have none.
    synsets_1 = [sim._yago.yago2synset(c) for c in concepts_1 if sim._yago.yago2synset(c)]
    synsets_2 = [sim._yago.yago2synset(c) for c in concepts_2 if sim._yago.yago2synset(c)]
    if not synsets_1 or not synsets_2:
        return 0.0
    # Keep the five synsets with the highest information content on each side.
    s1, _ = zip(*Counter({s: sim._yago.synset_ic(s) for s in synsets_1}).most_common(5))
    s2, _ = zip(*Counter({s: sim._yago.synset_ic(s) for s in synsets_2}).most_common(5))
    N1 = len(s1)
    N2 = len(s2)
    # Average each synset's best match against the other side, in both directions.
    score1 = sum([max([sim._yago.similarity(syn1, syn2) for syn2 in s2]) for syn1 in s1]) / N1
    score2 = sum([max([sim._yago.similarity(syn1, syn2) for syn1 in s1]) for syn2 in s2]) / N2
    return (score1 + score2) / 2.0
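
The concepts_1 and concepts_2 lists passed to similarity_test might be collected roughly as follows. This is only a sketch: it assumes the YAGO class links appear in the page HTML as full "http://dbpedia.org/class/yago/..." URLs, and the names `YAGO_CLASS_RE`, `extract_yago_classes`, and `fetch_yago_classes` are hypothetical helpers, not part of sematch.

```python
import re

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2.7

# Assumed link shape: http://dbpedia.org/class/yago/<ClassName>
YAGO_CLASS_RE = re.compile(r'http://dbpedia\.org/class/yago/[A-Za-z0-9_%\-]+')


def extract_yago_classes(html):
    """Return the distinct YAGO class URLs found in html, in first-seen order."""
    seen = set()
    classes = []
    for url in YAGO_CLASS_RE.findall(html):
        if url not in seen:
            seen.add(url)
            classes.append(url)
    return classes


def fetch_yago_classes(resource_url, timeout=30):
    """Download a DBpedia resource page and extract its YAGO class URLs."""
    html = urlopen(resource_url, timeout=timeout).read().decode('utf-8', 'replace')
    return extract_yago_classes(html)
```

With these helpers, `similarity_test(fetch_yago_classes('http://dbpedia.org/resource/Madrid'), fetch_yago_classes('http://dbpedia.org/resource/Barcelona'))` would reproduce the original example without going through the SPARQL endpoint.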

hopple commented 7 years ago

Yes, you can download the features and compute them locally, which is faster. The online computation is only meant for getting started easily and for testing.
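
As a rough illustration of the local approach, the sketch below builds an in-memory lookup from entity to YAGO classes. It assumes the downloaded features are rdf:type triples in N-Triples form, one per line; `load_yago_types` and the two constants are hypothetical names, not part of sematch.

```python
import re
from collections import defaultdict

RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
# Matches "<subject> <predicate> <object> ." where all three terms are IRIs.
TRIPLE_RE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$')


def load_yago_types(lines):
    """Map each entity URI to the list of its YAGO class URIs."""
    types = defaultdict(list)
    for line in lines:
        match = TRIPLE_RE.match(line)
        if not match:
            continue  # skip comments, blank lines, and literal-valued triples
        subject, predicate, obj = match.groups()
        if predicate == RDF_TYPE and '/class/yago/' in obj:
            types[subject].append(obj)
    return dict(types)
```

The lists returned for two entities can then be passed to a function like similarity_test, so no request reaches the DBpedia endpoint at query time.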

SmartMapple commented 5 years ago

I ran into the same problem. It may be caused by the Great Firewall of China: when I connected through a VPN, the process worked. I will try Yuan's code.