openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
https://vos.openlinksw.com

Problem with sending a SPARQL request generated through an application #579

Open nbassili opened 8 years ago

nbassili commented 8 years ago

Dear all,

I have a problem with a SPARQL query that I generate in an application and send through a remote client to the public SPARQL endpoint of DBpedia.

I am using the SPARQL client library of SWI-Prolog to query the DBpedia endpoint. This library converts my SPARQL query to an HTTP request (see below), which is sent to the endpoint. The same HTTP request can be issued directly by any client (e.g. a browser) to confirm whether the problem appears.

The query is the following:

select ?s1, ?s2 where {
  {
    select ?s1, ?s2 where {
      ?s1 ?x ?o1 .
      ?o1 bif:contains ' ( "Pierre" AND "Marie" AND "Curie" AND "University" AND "UPMC" ) ' option ( score ?sc ) .
      ?s1 rdfs:label ?s2 .
      ?s1 rdf:type dbo:EducationalInstitution .
    }
    order by desc ( ?sc * 3e-1 + sql:rnk_scale ( <LONG::IRI_RANK> ( ?s1 ) ) )
    limit 100
  } .
  FILTER (lang(?s2) = "en")
}
limit 2

If I type it in the query box of http://dbpedia.org/sparql, everything is fine.

However, when I construct the query and send it over a remote client like this:

[http://dbpedia.org/sparql?query=select%20?s1,%20?s2%20where%20{%20{%20select%20?s1,%20?s2%20where%20{%20?s1%20?x%20?o1%20.%20?o1%20bif:contains%20%27%20(%20%22Pierre%22%20AND%20%22Marie%22%20AND%20%22Curie%22%20AND%20%22University%22%20AND%20%22UPMC%22%20)%27%20option%20(%20score%20?sc%20)%20.%20?s1%20rdfs:label%20?s2%20.%20?s1%20rdf:type%20dbo:EducationalInstitution%20.%20}%20order%20by%20desc%20(%20?sc%20*%203e-1%20%2B%20sql:rnk_scale%20(%20%3CLONG::IRI_RANK%3E%20(%20?s1%20)%20)%20)%20limit%20100%20}%20.%20FILTER%20(%20lang(?s2)%20%3D%20%22en%22)%20}%20limit%202&entailment=rdfs&timeout=2000]

there is a problem; the following error page (code 500) appears:

Virtuoso 37000 Error SQ156: Internal Optimized compiler error : col is not supposed to be virtual in sqldf.c:1496. Please report the statement compiled.

SPARQL query: define sql:big-data-const 0 select ?s1, ?s2 where { { select ?s1, ?s2 where { ?s1 ?x ?o1 . ?o1 bif:contains ' ( "Pierre" AND "Marie" AND "Curie" AND "University" AND "UPMC" )' option ( score ?sc ) . ?s1 rdfs:label ?s2 . ?s1 rdf:type dbo:EducationalInstitution . } order by desc ( ?sc * 3e-1 + sql:rnk_scale ( LONG::IRI_RANK ( ?s1 ) ) ) limit 100 } . FILTER ( lang(?s2) = "en") } limit 2

The funny thing is that if I, let's say, reduce the query by removing e.g. one of the keywords, then the query runs fine:

[http://dbpedia.org/sparql?query=select%20?s1,%20?s2%20where%20{%20{%20select%20?s1,%20?s2%20where%20{%20?s1%20?x%20?o1%20.%20?o1%20bif:contains%20%27%20(%20%22Pierre%22%20AND%20%22Marie%22%20AND%20%22Curie%22%20AND%20%22University%22%20)%27%20option%20(%20score%20?sc%20)%20.%20?s1%20rdfs:label%20?s2%20.%20?s1%20rdf:type%20dbo:EducationalInstitution%20.%20}%20order%20by%20desc%20(%20?sc%20*%203e-1%20%2B%20sql:rnk_scale%20(%20%3CLONG::IRI_RANK%3E%20(%20?s1%20)%20)%20)%20limit%20100%20}%20.%20FILTER%20(%20lang(?s2)%20%3D%20%22en%22)%20}%20limit%202&entailment=rdfs&timeout=2000]

The query also runs fine if I omit the 'option' expression from the bif:contains pattern. This appears to be a problem with how Virtuoso handles (parses?) the HTTP request. Has any of you come across such a problem before?

Thank you in advance. Your help will be greatly appreciated.

Best Regards, Nick
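
One way to rule out request encoding as a variable is to let a standard library build the URL, so that characters such as `?`, `{`, `:` and `*` (left raw in the hand-built URLs above) are percent-encoded. The following is a minimal Python sketch, not the SWI-Prolog client's behaviour; the helper name `build_sparql_url` is hypothetical, the query whitespace has been tidied, and nothing in the thread confirms that encoding is the cause of the SQ156 error.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint quoted in the post

# The failing five-keyword query from the post (indentation added only).
QUERY = """select ?s1, ?s2 where { {
  select ?s1, ?s2 where {
    ?s1 ?x ?o1 .
    ?o1 bif:contains ' ( "Pierre" AND "Marie" AND "Curie" AND "University" AND "UPMC" ) ' option ( score ?sc ) .
    ?s1 rdfs:label ?s2 .
    ?s1 rdf:type dbo:EducationalInstitution . }
  order by desc ( ?sc * 3e-1 + sql:rnk_scale ( <LONG::IRI_RANK> ( ?s1 ) ) )
  limit 100 } .
  FILTER (lang(?s2) = "en") }
limit 2"""


def build_sparql_url(endpoint, query, **extra):
    """Hypothetical helper: return a GET URL with all parameters percent-encoded."""
    params = {"query": query, **extra}
    return endpoint + "?" + urlencode(params)


# Same extra parameters as in the URLs from the post.
url = build_sparql_url(ENDPOINT, QUERY, entailment="rdfs", timeout=2000)
print(url)

# Round-trip check: decoding the URL gives back the exact query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```

If the fully encoded URL still produces the same error, the encoding can be set aside and attention turned to the query itself.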

kidehen commented 8 years ago

Here is a variant you can try. Basically, a tweak of the original response by @jervenBolleman, which adds explicit binding of ?sc to the FILTER clause:

http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=select+distinct+%3Fs1%2C+%3Fs2+where+%7B+%7B+select+%3Fs1%2C+%3Fs2+where+%7B+%3Fs1+%3Fx+%3Fo1+.+%3Fo1+bif%3Acontains+%27+%28+%22UPMC%22+%29%27+option+%28+score+%3Fsc+%29+.FILTER%28%0D%0ABOUND%28%3Fsc%29%29.+%3Fs1+rdfs%3Alabel+%3Fs2+.+%3Fs1+rdf%3Atype+dbo%3AEducationalInstitution+.+%7D+order+by+desc+%28+%3Fsc+*+3e-1+%2B+sql%3Arnk_scale+%28+%3CLONG%3A%3AIRI_RANK%3E+%28+%3Fs1+%29+%29+%29+limit+100+%7D+.+FILTER+%28+lang%28%3Fs2%29+%3D+%22en%22%29+%7D+limit+2&format=text%2Fhtml&CXML_redir_for_subjs=121&CXML_redir_for_hrefs=&timeout=2000&debug=on

nbassili commented 8 years ago

Thank you both! Unfortunately, @jervenBolleman's suggestion to filter out unbound ?sc still does not work on the large query (all five keywords):

[http://dbpedia.org/sparql?query=select%20?s1,%20?s2%20where%20{%20{%20select%20?s1,%20?s2%20where%20{%20?s1%20?x%20?o1%20.%20?o1%20bif%3Acontains%20%27%20(%20%22Pierre%22%20AND%20%22Marie%22%20AND%20%22Curie%22%20AND%20%22University%22%20AND%20%22UPMC%22%20)%20%27%20option%20(%20score%20?sc%20)%20.%20FILTER%20(%20BOUND%20(%20?sc%20)%20).%20?s1%20rdfs%3Alabel%20?s2%20.%20?s1%20rdf%3Atype%20dbo%3AEducationalInstitution%20.%20}%20order%20by%20desc%20(%20?sc%20*%203e-1%20%2B%20sql%3Arnk_scale%20(%20%3CLONG%3A%3AIRI_RANK%3E%20(%20?s1%20)%20)%20)%20limit%20100%20}%20.%20FILTER%20(%20lang(?s2)%20%3D%20%22en%22)%20}%20limit%202&entailment=rdfs&timeout=2000]

Even in the case of the single "UPMC" keyword, if I remove the FILTER ( BOUND ( ?sc ) ) from the query, the query still works returning answers. So, the binding of the variable ?sc is not the problem.

Any other ideas? Thnx Nick

nbassili commented 8 years ago

Are there any new ideas / workarounds on the multiple keywords problem? Thnx Nick

nbassili commented 8 years ago

I found one workaround: use BIND to create a variable that holds the total score, then sort on this variable instead of on the full mathematical expression. I also moved the ? rdf:type ? pattern to the beginning of the query (to possibly help the optimizer?). I have kept the BOUND filter for safety, although the query runs fine even without it.

[http://dbpedia.org/sparql?query=select%20?s1,%20?s2%20where%20%7B%0A?s1%20rdf%3Atype%20dbo%3AEducationalInstitution%20.%20%0A?s1%20?x%20?o1%20.%0A?o1%20bif%3Acontains%20%27%20(%20%22Pierre%22%20AND%20%22Marie%22%20AND%20%22Curie%22%20AND%20%22University%22%20AND%20%22UPMC%22%20)%20%27%20option%20(%20score%20?sc%20)%20.%0AFILTER%20(%20bound(?sc)%20)%0ABIND%20(%20?sc%20*%203e-1%20%2B%20sql%3Arnk_scale%20(%20%3CLONG%3A%3AIRI_RANK%3E%20(%20?s1%20)%20)%20AS%20?tsc)%0A?s1%20rdfs%3Alabel%20?s2%20.%0AFILTER%20(lang(?s2)%20%3D%20%22en%22)%0A%7D%0Aorder%20by%20desc%20(%20?tsc%20)%0Alimit%202&entailment=rdfs&timeout=2000]

Thank you for your help!

Best, Nick
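
For readability, the BIND workaround described above can be written out in decoded form. The Python sketch below holds the query as a string (whitespace tidied, content taken from the URL in the post) and rebuilds the request with `urllib.parse.urlencode`; the variable names are illustrative, and the sketch only constructs the URL without contacting the endpoint.

```python
from urllib.parse import urlencode

# Decoded form of the workaround request from the post.
# The score expression is computed once in BIND and bound to ?tsc,
# so ORDER BY refers to a plain variable rather than the expression.
WORKAROUND_QUERY = """\
select ?s1, ?s2 where {
  ?s1 rdf:type dbo:EducationalInstitution .
  ?s1 ?x ?o1 .
  ?o1 bif:contains ' ( "Pierre" AND "Marie" AND "Curie" AND "University" AND "UPMC" ) ' option ( score ?sc ) .
  FILTER ( bound(?sc) )
  BIND ( ?sc * 3e-1 + sql:rnk_scale ( <LONG::IRI_RANK> ( ?s1 ) ) AS ?tsc )
  ?s1 rdfs:label ?s2 .
  FILTER ( lang(?s2) = "en" )
}
order by desc ( ?tsc )
limit 2
"""

# Same endpoint and extra parameters as in the post.
request_url = "http://dbpedia.org/sparql?" + urlencode(
    {"query": WORKAROUND_QUERY, "entailment": "rdfs", "timeout": 2000}
)
print(request_url)
```

Compared with the original query, the nested sub-select is gone and the ranking arithmetic no longer appears in the ORDER BY clause.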

minusdavid commented 7 years ago

Cheers @nbassili. I was driving myself crazy trying to send a query to Virtuoso! While I noticed Virtuoso was using "query" instead of "update", I figured it would adhere to the rest of the SPARQL spec (https://www.w3.org/TR/sparql11-protocol/#update-operation), but it appears not...
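
For reference, the SPARQL 1.1 Protocol distinguishes a query operation (a `query` parameter, via GET or POST) from an update operation (an `update` parameter in a form-encoded POST body). The Python sketch below builds both request shapes without sending them; the endpoint URL and the helper names are placeholders, and the `param` argument is only there so the parameter name can be switched to `query` if, as the comment above suggests, a server expects that instead. Whether Virtuoso accepts either form for updates is not verified here.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # placeholder, not a real endpoint


def query_request(endpoint, sparql):
    """Query operation via GET: the text goes in the 'query' URL parameter."""
    return Request(endpoint + "?" + urlencode({"query": sparql}), method="GET")


def update_request(endpoint, sparql, param="update"):
    """Update operation via form-encoded POST.

    The protocol names the parameter 'update'; pass param="query" to try
    the behaviour reported in the comment above (an unverified assumption).
    """
    body = urlencode({param: sparql}).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(endpoint, data=body, headers=headers, method="POST")


q = query_request(ENDPOINT, "select * where { ?s ?p ?o } limit 1")
u = update_request(ENDPOINT, "insert data { <urn:a> <urn:b> <urn:c> }")

# Inspect the requests instead of sending them.
print(q.get_method(), q.full_url)
print(u.get_method(), u.data)
```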