mff-uk / odcs-dpus

Repository of DPUs (data processing units) for an ETL tool for RDF data

Ordered SPARQL queries should use Virtuoso's scrollable cursors #78

Closed jindrichmynarz closed 7 years ago

jindrichmynarz commented 10 years ago

All DPUs that load data via SPARQL queries using ORDER BY (such as the XSLT DPU) should use Virtuoso's scrollable cursors (see Virtuoso's documentation, section "Example: Prevent Limits of Sorted LIMIT/OFFSET query"). When OFFSET plus LIMIT in an ordered SPARQL query exceeds the MaxSortedTopRows setting in virtuoso.ini (typically set to 10-20K rows), the query fails with an error message like the following:

Virtuoso 22023 Error SR353: Sorted TOP clause specifies more then 41000 rows to sort.
Only 40000 are allowed.
Either decrease the offset and/or row count or use a scrollable cursor
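
The error message suggests that Virtuoso has to sort OFFSET + LIMIT rows before it can return a page of an ordered query, and that it rejects the query once this number exceeds MaxSortedTopRows. The following Python sketch models that check under this assumption. `exceeds_sorted_top` is a hypothetical helper and is not part of Virtuoso or the DPUs. The OFFSET/LIMIT split in the usage lines is only one combination that produces the 41000 rows from the message.

```python
def exceeds_sorted_top(offset: int, limit: int, max_sorted_top_rows: int) -> bool:
    """Model of the assumed check behind error SR353.

    An ORDER BY query with OFFSET/LIMIT is assumed to sort offset + limit
    rows; the query is rejected when that exceeds MaxSortedTopRows.
    """
    rows_to_sort = offset + limit
    return rows_to_sort > max_sorted_top_rows


# OFFSET 40000 + LIMIT 1000 = 41000 rows to sort, with 40000 allowed.
print(exceeds_sorted_top(offset=40_000, limit=1_000, max_sorted_top_rows=40_000))  # True
# The first page of the same loop stays well under the limit.
print(exceeds_sorted_top(offset=0, limit=1_000, max_sorted_top_rows=40_000))  # False
```

Under this model a small page size does not help. A loop that pages through an ordered result with a fixed LIMIT fails as soon as OFFSET approaches MaxSortedTopRows.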

A temporary workaround is to increase the MaxSortedTopRows setting. The proper solution is to wrap a sub-SELECT with ORDER BY in an outer SELECT query that carries OFFSET and LIMIT. For example, the XSLT DPU uses the following query:

SELECT ?s ?o
WHERE {
  ?s <http://linked.opendata.cz/ontology/odcs/xmlValue> ?o .
}
ORDER BY ?s ?o

Rewritten to use a scrollable cursor, which allows loading larger data, this query could look like the following:

SELECT ?s ?o
WHERE {
  {
    SELECT ?s ?o
    WHERE {
      ?s <http://linked.opendata.cz/ontology/odcs/xmlValue> ?o .
    }
    ORDER BY ?s ?o
  }
}
# Pagination goes here:
LIMIT 10000
OFFSET 1000000
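
A DPU would then fetch the data page by page, re-issuing the wrapped query with a growing OFFSET. The Python sketch below only builds the query strings. `wrap_with_page` and `paged_queries` are hypothetical names. The caller is assumed to send each query to the endpoint and to stop once a page returns fewer than `page_size` rows.

```python
import textwrap
from itertools import count
from typing import Iterator

# Ordered query taken from the XSLT DPU example above.
INNER_QUERY = """SELECT ?s ?o
WHERE {
  ?s <http://linked.opendata.cz/ontology/odcs/xmlValue> ?o .
}
ORDER BY ?s ?o"""


def wrap_with_page(inner_query: str, limit: int, offset: int) -> str:
    """Nest the ordered query as a sub-SELECT and paginate the outer SELECT."""
    return (
        "SELECT ?s ?o\n"
        "WHERE {\n"
        "  {\n"
        + textwrap.indent(inner_query, "    ")
        + "\n  }\n"
        "}\n"
        f"LIMIT {limit}\n"
        f"OFFSET {offset}"
    )


def paged_queries(inner_query: str, page_size: int) -> Iterator[str]:
    """Yield the wrapped query for pages 0, 1, 2, ... without end.

    The caller executes each query and stops once a page returns
    fewer than page_size rows.
    """
    for page in count():
        yield wrap_with_page(inner_query, page_size, page * page_size)
```

For example, the first call to `next(paged_queries(INNER_QUERY, 10000))` yields the wrapped query with `LIMIT 10000` and `OFFSET 0`. Each further call advances OFFSET by 10000.
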

tomas-knap commented 9 years ago

Related: https://github.com/UnifiedViews/Plugins/issues/241

jakubklimek commented 7 years ago

Scrollable cursors are implemented in the LinkedPipes ETL components http://etl.linkedpipes.com/components/e-sparqlendpointselectscrollablecursor and http://etl.linkedpipes.com/components/e-sparqlendpointconstructscrollablecursor