Evaluation of multiple SERVICE clauses

frmichel commented 6 years ago

Dear all,

I've noticed that Virtuoso adopts what seems to me a very inefficient strategy to evaluate a SPARQL query containing multiple SERVICE clauses with independent graph patterns (no common variables): each SERVICE clause is invoked once for each solution retrieved from previously evaluated SERVICE clauses. This multiplies the number of SERVICE invocations, even though they return the same results each time (since the graph patterns are independent).

Here is an example calling two services I've deployed. The graph patterns in each SERVICE clause are independent: the first one has a variable ?img and the second a variable ?audioUrl. And there is no other dependency.

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix schema: <http://schema.org/>

construct {
    [] foaf:depiction ?img ; schema:contentUrl ?audioUrl.
} where {
    service <https://erebe-vm2.i3s.unice.fr/sparql-ms/flickr/getPhotosByGroupByTag?group_id=806927@N20&tags=taxonomy:binomial=Delphinus+delphis> 
    { select * where { [] foaf:depiction ?img.  } limit 3 }

    service <https://erebe-vm2.i3s.unice.fr/sparql-ms/macaulaylibrary/getAudioByTaxon?name=Delphinus+delphis> 
    { select ?audioUrl where { [] schema:contentUrl ?audioUrl. } limit 5 }
}

As you can see, the first query returns at most 3 results and the second at most 5. When I monitor the logs of my services, I notice that the second one is called exactly once, whereas the first one is called 5 times, i.e. once for each result retrieved from the second SERVICE clause.
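To make the call counts concrete, here is a minimal Python sketch of the two strategies. It is only a simulation under stated assumptions: `CountingService`, `nested_loop` and `evaluate_once` are hypothetical names, the rows are placeholders, and nothing here reflects Virtuoso's actual internals.

```python
from itertools import product


class CountingService:
    """Hypothetical stand-in for a remote SPARQL endpoint; counts its invocations."""

    def __init__(self, rows):
        self.rows = rows
        self.calls = 0

    def query(self):
        self.calls += 1
        return list(self.rows)


def nested_loop(outer, inner):
    """Invoke `inner` once per solution of `outer` (the behaviour seen in the logs)."""
    solutions = []
    for o in outer.query():
        for i in inner.query():
            solutions.append((i, o))
    return solutions


def evaluate_once(first, second):
    """Invoke each service exactly once, then build the cross product locally."""
    return list(product(first.query(), second.query()))


# Placeholder rows: 3 images and 5 audio URLs, matching the LIMITs in the query.
images = [f"img{n}" for n in range(3)]
audio = [f"audio{n}" for n in range(5)]

# Strategy 1: the second SERVICE is evaluated first, the first one per solution.
photos_a, audio_a = CountingService(images), CountingService(audio)
nested = nested_loop(outer=audio_a, inner=photos_a)

# Strategy 2: each independent SERVICE is evaluated once.
photos_b, audio_b = CountingService(images), CountingService(audio)
once = evaluate_once(photos_b, audio_b)

print("nested loop:", photos_a.calls, "photo calls,", audio_a.calls, "audio call")
print("evaluate once:", photos_b.calls, "photo call,", audio_b.calls, "audio call")
print("same solutions:", set(nested) == set(once))
```

Both strategies yield the same 15 combined solutions, because the two patterns share no variable; only the number of remote invocations differs.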

Would you say this is a bug, or is there a good reason that I'm missing?

Thanks for your help.

Note: I'm using the OS edition 7.20.

HughWilliams commented 6 years ago

@frmichel: Please provide the exact version of Virtuoso being used with the command:

virtuoso-t -?

Are you saying that if the limit in the second service call query were increased to 10, then the first service query would get called 10 times, i.e. the call rate is based on the LIMIT clause?

What RDF store is being used to host the https://erebe-vm2.i3s.unice.fr/sparql-ms SPARQL endpoints?

frmichel commented 6 years ago

Hi @HughWilliams,

The exact version is 7.2.5-dev.3217-pthreads, Mar 16 2017.

> Are you saying that if the limit in the second service call query was increased to 10, then the first service query would get called 10 times

Yes.

> i.e. the call rate is based on the LIMIT clause?

Not on the LIMIT clause but on the actual number of solutions returned by the SERVICE clause. I've set a LIMIT to avoid too many calls, but that is what happens. Without a limit, the first SERVICE returns 10 solutions, and the second 28 solutions. It seems that the second SERVICE clause is called first; then, the first SERVICE clause is called 28 times.
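As a rough illustration of what this costs, the small Python sketch below does the arithmetic for two independent SERVICE clauses. The function name `remote_cost` is hypothetical, and it assumes that every invocation returns its full result set, as I observe here.

```python
def remote_cost(outer_rows, inner_rows):
    """Compare remote calls and rows transferred for two independent SERVICE clauses.

    Assumption: the outer clause is called once, and each call of the inner
    clause returns all `inner_rows` solutions.
    """
    return {
        # Inner clause re-invoked once per outer solution.
        "nested_calls": 1 + outer_rows,
        "nested_rows": outer_rows + outer_rows * inner_rows,
        # Each clause invoked exactly once.
        "once_calls": 2,
        "once_rows": outer_rows + inner_rows,
    }


# Without LIMIT: the second SERVICE (28 solutions) drives 28 calls of the first (10 solutions).
no_limit = remote_cost(outer_rows=28, inner_rows=10)
print(no_limit)
```

With the figures above, that is 29 remote calls and 308 transferred rows, against 2 calls and 38 rows if each clause were evaluated only once.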

Franck.

HughWilliams commented 6 years ago

@frmichel: What about the details of the RDF store being used to host the https://erebe-vm2.i3s.unice.fr/sparql-ms SPARQL endpoints?

Note the following documents on improving Virtuoso SPARQL-FED query performance:

http://vos.openlinksw.com/owiki/wiki/VOS/VirtTipsAndTricksDiscoverSPARQFedCapabilities
http://vos.openlinksw.com/owiki/wiki/VOS/VirtTipsAndTricksDiscoverSPARQFedCapabilitiesSPARQL

although this is generally only achieved when the remote SPARQL endpoints are Virtuoso instances, which does not seem to be the case here ...

frmichel commented 6 years ago

Hi Hugh,

The endpoints behind https://erebe-vm2.i3s.unice.fr/sparql-ms are regular SPARQL 1.1 endpoints implemented with an in-house engine: Corese-KGRAM (http://wimmics.inria.fr/corese, https://github.com/Wimmics/corese).

Unfortunately, the documents you point to do indeed seem specific to Virtuoso and not applicable to other implementations. Or do they?

Franck.

openlink / virtuoso-opensource

Evaluation of multiple SERVICE clauses #724