linkedhumanities / lode

Linked Open Data Enhancer

Wiki search -- strange behavior #76

Closed mniepert closed 10 years ago

mniepert commented 10 years ago

When I use the Wiki search for the InPhO entity "turing machine" I get a ranking of suggestions that cannot be right.

When using http://lode.informatik.uni-mannheim.de/link/wikiStat to search for turing machine, the first hit is the correct entity "turing machine."

However, there's also something strange about http://lode.informatik.uni-mannheim.de/link/wikiStat

When I search for "turing machine", there should be a ranking of entities with distinct URIs. Currently, the list is apparently ranked according to the most common string and there are URI duplicates in the list.

jakob0910 commented 10 years ago

I added http://lode.informatik.uni-mannheim.de/link/wikiStat only for debugging purposes. This page returns the raw result of a database query containing the search string, and nothing more. Because there is an n:m relation between surface forms (SF) and URIs, it is entirely normal that the URIs in that list are not distinct. The proposals on the linker page of an entity, however, are distinct.
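To illustrate how non-distinct rows from an n:m surface-form/URI table can be collapsed into a ranking of distinct URIs, here is a minimal sketch. The row layout `(surface_form, uri, count)`, the example URIs, and the choice to sum counts per URI are all assumptions for illustration, not the actual LODE schema or scoring.

```python
from collections import defaultdict

# Hypothetical rows as a debug query might return them: (surface_form, uri, count).
rows = [
    ("turing machine", "http://example.org/Turing_machine", 120),
    ("turing machines", "http://example.org/Turing_machine", 30),
    ("universal turing machine", "http://example.org/Universal_Turing_machine", 25),
    ("turing machine", "http://example.org/Turing_test", 2),
]


def rank_distinct_uris(rows):
    """Sum the counts per URI and return (uri, total) pairs, highest total first."""
    totals = defaultdict(int)
    for _surface_form, uri, count in rows:
        totals[uri] += count
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranked = rank_distinct_uris(rows)
for uri, total in ranked:
    print(total, uri)
```

Each URI appears once in `ranked`, even though it occurs under several surface forms in the raw rows.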

The problem with "turing machine" was that we searched for both "turing machine" and "machine". The proposals for "machine" had a much higher weight and pushed the correct results to a subsequent page (use "more proposal"). Now we fall back to substrings only if the original string does not lead to any proposals.
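The fallback described above could look roughly like the following sketch. The `lookup` callable, the toy `index`, and the way substrings are derived (dropping leading words) are assumptions for illustration; the thread does not say how LODE generates its substrings.

```python
def search_terms(query):
    """Yield the full query first, then shorter word-level substrings."""
    words = query.split()
    yield " ".join(words)
    # Fallback candidates: drop leading words one at a time, e.g. "machine".
    for start in range(1, len(words)):
        yield " ".join(words[start:])


def propose(query, lookup):
    """Return (term, proposals) for the first term that yields any proposals."""
    for term in search_terms(query):
        proposals = lookup(term)
        if proposals:
            return term, proposals
    return None, []


# Hypothetical index standing in for the database lookup.
index = {
    "turing machine": ["Turing_machine"],
    "machine": ["Machine", "Machine_learning"],
}


def lookup(term):
    return index.get(term, [])


print(propose("turing machine", lookup))
print(propose("abstract machine", lookup))
```

Because the full string is tried first, proposals for "machine" are only consulted when "turing machine" itself returns nothing, so they can no longer outweigh the exact match.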

mniepert commented 10 years ago

It's a lot better now. Any idea why the search takes so much longer than the SPARQL search?

sztyler commented 10 years ago

We added some MySQL indexes, but they did not improve the execution time. The cause is the query itself: its WHERE clause contains something like "abc LIKE '%word%'", and the leading percent sign prevents the index from being used for the lookup.
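As a rough analogy for why a leading wildcard hurts, the sketch below treats a sorted Python list as a stand-in for an ordered index. It is not MySQL code, and the sample surface forms are made up: a prefix pattern can jump to its first candidate by binary search, while a substring pattern gains nothing from the ordering and has to inspect every key.

```python
from bisect import bisect_left

# A sorted list stands in for an ordered index over the surface-form column.
surface_forms = sorted([
    "machine",
    "machine learning",
    "state machine",
    "turing machine",
    "turing test",
])


def prefix_search(sorted_keys, prefix):
    """Analogous to LIKE 'prefix%': binary-search to the start, read a contiguous run."""
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order guarantees no later key can match
        matches.append(key)
    return matches


def substring_search(keys, word):
    """Analogous to LIKE '%word%': ordering does not help, every key is checked."""
    return [key for key in keys if word in key]


print(prefix_search(surface_forms, "turing"))
print(substring_search(surface_forms, "machine"))
```

The prefix search stops as soon as the run of matching keys ends, whereas the substring search always visits all keys, which is the behavior that makes the query slow on a large table.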