openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
https://vos.openlinksw.com

filter / values clause with transitive path queries: Virtuoso 37000 Error TR...: transitive start not given #234

Open joernhees opened 10 years ago

joernhees commented 10 years ago

This works:

query:

select distinct ?s ?syn where {
     ?s (owl:sameAs|^owl:sameAs)+ ?syn.
     FILTER(?s=dbpedia:Germany)
}

This doesn't:

The following 3 queries fail with Virtuoso 37000 Error TR...: transitive start not given:

query:

select distinct ?s ?syn where {
     ?s (owl:sameAs|^owl:sameAs)+ ?syn.
     FILTER(?s=dbpedia:Germany || ?s=dbpedia:United_States)
}

query:

select distinct ?s ?syn where {
     ?s (owl:sameAs|^owl:sameAs)+ ?syn.
     VALUES(?s) {(dbpedia:Germany)}
}

query:

select distinct ?s ?syn where {
     ?s (owl:sameAs|^owl:sameAs)+ ?syn.
     VALUES(?s) {(dbpedia:Germany) (dbpedia:United_States)}
}

HughWilliams commented 10 years ago

This is due to the current implementation of transitivity in SQL, which requires one end of the transitive chain to be equal to a value calculated in previous steps (or to a constant). In the listed cases, SPARQL translates the FILTER or VALUES into an IN operator, so both become equivalent to FILTER (?s in (dbpedia:Germany, dbpedia:United_States)). To fix this, we would need to add support for setting a transitive start via the IN operator.

A workaround would be a query of the following form:

select distinct ?s ?syn where {
 {
  select ?s where {
   VALUES(?s) {(dbpedia:Germany) (dbpedia:United_States)}
  }
 }
 {
  select ?s ?syn where { ?s owl:sameAs|^owl:sameAs ?syn }
 } option (TRANSITIVE, T_IN (?s), T_OUT (?syn), T_DISTINCT)
}
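
As a rough mental model of why a transitive traversal wants a concrete start, here is a toy Python sketch. It is not Virtuoso's implementation; the function names `same_as_closure` and `closure_for_many` are made up for illustration, and the edges are plain (subject, object) string pairs. A single start maps to one traversal, while a set of starts (the IN case) amounts to one traversal per member.

```python
from collections import deque


def same_as_closure(edges, start):
    """Nodes reachable from `start` in one or more hops, following edges in
    either direction (a toy analogue of (owl:sameAs|^owl:sameAs)+)."""
    # Build an undirected adjacency map from the (subject, object) pairs.
    neighbours = {}
    for s, o in edges:
        neighbours.setdefault(s, set()).add(o)
        neighbours.setdefault(o, set()).add(s)

    seen = set()
    # Seed the queue with the direct neighbours, so the start itself only
    # appears in the result if some path leads back to it.
    queue = deque(neighbours.get(start, ()))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(neighbours.get(node, ()))
    return seen


def closure_for_many(edges, starts):
    """One independent traversal per start value (the IN-list case)."""
    return {start: same_as_closure(edges, start) for start in starts}
```

Calling `closure_for_many(edges, ["dbpedia:Germany", "dbpedia:United_States"])` returns a dictionary with one result set per start URI.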

joernhees commented 10 years ago

Thanks for the feedback. The workaround is a bit problematic as most other endpoints don't understand it :( So my workaround for now was to do one query per URI (and not 100 in bulk). Would really speed things up if this worked...
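
The one-query-per-URI workaround could be scripted roughly as follows. This is a minimal sketch: `per_uri_queries` is a hypothetical helper, the URIs are assumed to be prefixed names the endpoint already understands, and no escaping or validation is done. Each generated query uses the single-equality FILTER form reported as working at the top of this issue.

```python
# Template based on the working query from the original report.
# Doubled braces are literal braces for str.format.
QUERY_TEMPLATE = """select distinct ?s ?syn where {{
     ?s (owl:sameAs|^owl:sameAs)+ ?syn.
     FILTER(?s={uri})
}}"""


def per_uri_queries(uris):
    """Return one SPARQL query string per URI, each with a single-value FILTER."""
    return [QUERY_TEMPLATE.format(uri=uri) for uri in uris]
```

Each string can then be sent to the endpoint separately, at the cost of one round trip per URI.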

kidehen commented 10 years ago

On 9/17/14 12:58 PM, Jörn Hees wrote:

Thanks for the feedback. The workaround is a bit problematic as most other endpoints don't understand it :( So my workaround for now was to do one query per URI (and not 100 in bulk). Would really speed things up if this worked...

Most other endpoints don't even offer transitivity, period. Even if they did, there would be scalability nightmares etc... :)

We can't always wait for the SPARQL spec to evolve in line with challenges such as this.

Is there a SPARQL spec for transitivity that I am overlooking here?

[1] http://linkeddata.uriburner.com/c/9W7XMIB -- Transitive Query applied to 60 Billion+ live LOD Cloud Cache SPARQL endpoint.

Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this

joernhees commented 10 years ago

property paths: http://www.w3.org/TR/sparql11-query/#propertypaths

kidehen commented 10 years ago

On 9/18/14 9:37 AM, Jörn Hees wrote:

property paths http://www.w3.org/TR/sparql11-query/#propertypaths

Property Paths don't solve the bigger issue at hand.

The bigger issue at hand is all about being able to control this behavior at scale.

What we are going to do is attempt some syntactic sugar to make this less unwieldy.

joernhees commented 10 years ago

well, i'm looking forward to it, but property paths don't seem bad to me at all...

back to this bug report... @HughWilliams i just noticed that your suggestion also works without the OPTION clause:

query:

select distinct ?s ?syn where {
 {
  select ?s where {
   VALUES(?s) {(dbpedia:Germany) (dbpedia:United_States)}
  }
 }
 {
  select ?s ?syn where { ?s (owl:sameAs|^owl:sameAs)+ ?syn }
 }
}
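
For bulk lookups, a query of this two-subselect shape could be generated from a list of URIs. The sketch below only builds the query string; `values_subselect_query` is a hypothetical helper, the URIs are assumed to be prefixed names known to the endpoint, and nothing is escaped or validated.

```python
def values_subselect_query(uris):
    """Build the two-subselect query shown above for a list of prefixed URIs."""
    # Render the VALUES rows, e.g. "(dbpedia:Germany) (dbpedia:United_States)".
    values = " ".join(f"({uri})" for uri in uris)
    return (
        "select distinct ?s ?syn where {\n"
        " {\n"
        "  select ?s where {\n"
        f"   VALUES(?s) {{{values}}}\n"
        "  }\n"
        " }\n"
        " {\n"
        "  select ?s ?syn where { ?s (owl:sameAs|^owl:sameAs)+ ?syn }\n"
        " }\n"
        "}"
    )
```

Whether a given endpoint accepts the resulting query still has to be checked against that endpoint; the thread only reports this form working on Virtuoso.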