Open jakubklimek opened 3 years ago
The first thing to be certain of is that your queries are not running into Anytime Query cutoffs (i.e., that they're not being stopped after partial execution) and therefore returning partial results. This is most easily checked by running your queries through one of these command-line tools:

- `isql`, where the Anytime Query time limits are not applied, and you can run your queries as SPARQL-in-SQL (a/k/a SPASQL) by simply prepending the `SPARQL` keyword to each (before the first `PREFIX`)
- `curl`, where you can look for the four additional response headers that get added when the Anytime limits have effect:
  - `X-SQL-State:`
  - `X-SQL-Message:`
  - `X-Exec-Milliseconds:`
  - `X-Exec-DB-Activity:`
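
If scripting is more convenient than running `isql` or `curl` by hand, the two checks above can be sketched in Python. This is only an illustrative sketch using the standard library: the helper names (`to_spasql`, `anytime_headers`, `fetch_headers`) are hypothetical, the trailing `;` assumes the statement will be pasted into an interactive `isql` session, and the `query` form parameter and `Accept` value assume a standard SPARQL Protocol endpoint.

```python
import urllib.error
import urllib.parse
import urllib.request

# The four response headers named above, which are added when Anytime limits take effect.
ANYTIME_HEADERS = (
    "X-SQL-State",
    "X-SQL-Message",
    "X-Exec-Milliseconds",
    "X-Exec-DB-Activity",
)


def to_spasql(query: str) -> str:
    """Prepend the SPARQL keyword (before the first PREFIX) for use in isql.

    The trailing ';' is an assumption about pasting into an interactive session.
    """
    body = query.strip().rstrip(";").rstrip()
    return f"SPARQL {body};"


def anytime_headers(headers) -> dict:
    """Return whichever of the four Anytime headers are present (case-insensitive)."""
    lowered = {str(key).lower(): value for key, value in headers.items()}
    return {
        name: lowered[name.lower()]
        for name in ANYTIME_HEADERS
        if name.lower() in lowered
    }


def fetch_headers(endpoint: str, query: str, timeout: float = 60.0):
    """POST a query to a SPARQL endpoint and return the response headers.

    Hypothetical helper: performs a real network request, so it is not called here.
    """
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.headers
    except urllib.error.HTTPError as error:
        # Error responses carry headers too, and they may include the Anytime ones.
        return error.headers
```

A non-empty result from `anytime_headers(fetch_headers(endpoint, query))` would suggest the result set is partial; an empty dict only means none of the four headers were returned.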
Next, the locations of the `BIND`, `OPTIONAL`, and other clauses may impact the results in sometimes surprising ways.
Most commonly, this results from the "inside-out" (often confusingly called "bottom-up") execution order of SPARQL subqueries. That is, SPARQL subqueries are executed from the deepest up to the shallowest.
It's difficult to quickly see whether this is what's happening for you, given the length of these queries. It would help if you annotated the query variants with comment lines bracketing the sections you've shifted vertically, and marking where they were moved from/to. It can also be helpful to use different horizontal indentation patterns -- e.g., putting opening brackets/braces/parentheses on new lines, and increasing the indent with each such segment; decreasing the indent with each closing bracket/brace/parenthesis, which is vertically aligned with its respective opener -- e.g., space-padding the widths of subject/predicate/object terms, so that these columns are visibly obvious.
Presuming that neither of the above applies in this case, analysis of what's happening typically requires submission of the query profiles and execution plans, as well as their SQL translations.

- `/sparql` Endpoint
- `explain` and `profile` plans for a simple SPARQL query?

These will be easiest to work with if you attach them as files, rather than pasting them into comments on this issue.
As your recent issues aren't clearly bugs, it may make sense to shift some or all to the OpenLink Community Forum where there are more active participants than in this issues arena...
I have the following query, used to harvest DCAT-AP metadata from local data catalogs implemented as SPARQL endpoints, often powered by Virtuoso Open-Source. For some reason, it behaves very strangely. When run directly against Virtuoso's SPARQL endpoint, the browser says ERR_CONNECTION_CLOSED. The query is meant to be run against https://data.mvcr.gov.cz/sparql, which holds the data for it. However, it behaves this way even on other instances, such as:
On the other hand, it works on (returns empty result): https://data.mpsv.cz/sparql (07.20.3215)
When run via Yasgui, or LinkedPipes ETL SPARQL querying component using rdf4j on https://data.mvcr.gov.cz/sparql, I get some results, but they seem wrong.
Below are 4 variants of the query, which IMHO should all work and result in the same results. I apologize for the length of the query, but changing it affects the issue (numbers of results, etc.):
returns 23 results. When I remove the `OPTIONAL` surrounding the `dcat:Distribution`, which should only decrease the number of results, or keep it the same, I get 73 results:

In addition, when I rearrange the `OPTIONAL` clauses (which should not change the result), I get 67 results:
Now, if I move the `BIND` clauses from the `dcterms:temporal` `OPTIONAL` clause, I get 73 results in all cases, but Virtuoso's endpoint still drops the HTTP connection when the query is issued directly via browser: