ruby-rdf / sparql-client

SPARQL client for Ruby.
http://rubygems.org/gems/sparql-client
The Unlicense

Blank Nodes in Graph Patterns #59

Open no-reply opened 9 years ago

no-reply commented 9 years ago

SPARQL::Client::Repository#query_pattern runs afoul of a restriction in SPARQL about the allowed blank node labels in queries. I suspect most SPARQL implementations will just interpret these as two unique blank nodes without problem, but I noticed that Blazegraph throws errors and it seems technically correct to do so.

We could change the node labels within the method itself; a better option might be to find a place in Pattern to relabel blank nodes so that they aren't repeated.

Thoughts?

gkellogg commented 9 years ago

It's not clear exactly what restriction you're referring to. The link you provided shows the creation of a SPARQL CONSTRUCT using the supplied patterns. Is it that the serialization of a BNode element might render something which is not valid SPARQL? Do you have a specific example?

Any BNodes generated automatically should be fine; those that are created by a client may fail on some servers, in which case the server may complain with a failure code, but that would seem to be just fine to me, as the client is in charge of creating such nodes.

no-reply commented 9 years ago

Sorry, I gave the wrong link. The restriction I intended to reference is this one:

When using blank nodes of the form _:abc, labels for blank nodes are scoped to the basic graph pattern. A label can be used in only a single basic graph pattern in any query.

When given patterns with blank nodes, the code linked in the original issue description creates queries like:

CONSTRUCT { _:one _:two _:three . } WHERE { _:one _:two _:three . }

Blazegraph (apparently correctly) rejects this. From a description of the behavior I sent them:

In short, some upstream code uses the same bnode label in both CONSTRUCT and WHERE. The error you throw (included below) appears correct, and we'll fix this on the RDF.rb side, but I thought you might be interested in the issue. It seems like it would be harmless to interpret these as two unique bnodes in two separate scopes.

ERROR: BigdataRDFServlet.java:214: cause=java.util.concurrent.ExecutionException: org.openrdf.query.MalformedQueryException: com.bigdata.rdf.sail.sparql.ast.VisitorException: BNodeID already used in another scope: g69995647769040, query=SPARQL-QUERY: queryStr=CONSTRUCT { _:g69995647769040 <http://xmlns.com/foaf/0.1/mbox_sha1sum> ?g69995650401960 . } WHERE { _:g69995647769040 <http://xmlns.com/foaf/0.1/mbox_sha1sum> ?g69995650401960 . }
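
For anyone wanting to spot this before a query reaches the server, here is a minimal sketch of a check for blank node labels shared between the CONSTRUCT template and the WHERE clause. The helper name `shared_bnode_labels` is hypothetical and not part of sparql-client; it uses plain string matching rather than a SPARQL parser, and assumes a single, un-nested `CONSTRUCT { ... } WHERE { ... }` with simple alphanumeric labels.

```ruby
# Hypothetical helper, not part of sparql-client: returns the blank node
# labels that occur in both the CONSTRUCT template and the WHERE clause.
# Assumption: labels consist only of letters, digits and underscores.
BNODE_LABEL = /_:([A-Za-z0-9_]+)/

def shared_bnode_labels(query)
  # Naive split: assumes one un-nested "CONSTRUCT { ... } WHERE { ... }".
  match = query.match(/CONSTRUCT\s*\{(.*?)\}\s*WHERE\s*\{(.*)\}/mi)
  return [] unless match

  template_labels = match[1].scan(BNODE_LABEL).flatten
  where_labels    = match[2].scan(BNODE_LABEL).flatten
  # Array#& keeps each common label once, in template order.
  template_labels & where_labels
end

query = 'CONSTRUCT { _:one _:two _:three . } WHERE { _:one _:two _:three . }'
p shared_bnode_labels(query) # prints ["one", "two", "three"]
```

A non-empty result means the query reuses a label across the two scopes, which is the condition Blazegraph reports as "BNodeID already used in another scope".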

gkellogg commented 9 years ago

I don't think there are any official tests for this, and my implementation certainly doesn't raise an error. (Of course, BNodes in predicate locations are never okay).

In the case of #query_pattern, we could simply fail if any element of the pattern is a BNode; arguably, the rdf-spec tests for Repository shouldn't use these patterns, as you generally can't remotely work with BNodes without skolemizing them. It only really works for in-memory Repositories, or those making a guarantee about BNode label stability (we're considering this for a hypothetical normalized dataset).

no-reply commented 9 years ago

In the case of #query_pattern, we could simply fail if any element of the pattern is a BNode

I'm thinking this would be overkill. Something like CONSTRUCT { _:node ?predicate ?object . } WHERE { ?node ?predicate ?object . FILTER(isBlank(?node)) } seems like a legitimate pattern.

The rest of what you've said rings true. The bnode handling I have in the Blazegraph work thus far is okay-ish, but carries some big caveats. I think there are solutions here without leaving the realm of SPARQL Update.

gkellogg commented 9 years ago

Yes, I was thinking much the same (?node instead of _:node, of course).
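
A rough sketch of that filtered-variable form, for illustration only. The function name and the representation of terms as plain SPARQL token strings are assumptions, not the sparql-client API. Blank nodes in subject or object position become variables shared between the template and the WHERE clause, each constrained with `isBlank`; the predicate is left untouched, and labels are assumed to be valid as variable names.

```ruby
# Hypothetical sketch, not the sparql-client API. Terms are SPARQL token
# strings such as "_:b0", "?o" or "<http://example.org/p>".
def construct_with_filtered_variables(subject, predicate, object)
  filters = []
  # Turn a blank node label into a variable and record an isBlank filter.
  # Assumption: the label is also a legal SPARQL variable name.
  to_variable = lambda do |term|
    next term unless term.start_with?('_:')
    variable = "?bnode_#{term.delete_prefix('_:')}"
    filters << "FILTER(isBlank(#{variable}))"
    variable
  end

  triple = "#{to_variable.call(subject)} #{predicate} #{to_variable.call(object)} ."
  where = ([triple] + filters.uniq).join(' ')
  "CONSTRUCT { #{triple} } WHERE { #{where} }"
end

puts construct_with_filtered_variables('_:node', '?predicate', '?object')
# prints: CONSTRUCT { ?bnode_node ?predicate ?object . } WHERE { ?bnode_node ?predicate ?object . FILTER(isBlank(?bnode_node)) }
```

Because no blank node label appears in the generated query at all, the scoping restriction cannot be triggered.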

no-reply commented 9 years ago

I think they are semantically equivalent, with each constructing "fresh" blank nodes in CONSTRUCT for each solution.

In any case, two tests fail with this error when run over Blazegraph. One can just be changed (I don't think it's intended to test anything to do with blank nodes); the other is:

... behaves like an RDF::Repository when querying statements behaves like an RDF::Queryable RDF::Queryable#first_value returns the correct value when the pattern matches
     Failure/Error: expect(subject.first_value(matching_pattern)).to eq subject.first_literal(matching_pattern).value
     SPARQL::Client::MalformedQuery:
       SPARQL-QUERY: queryStr=CONSTRUCT { _:t519724 <http://xmlns.com/foaf/0.1/mbox_sha1sum> ?g70276900592100 . } WHERE { _:t519724 <http://xmlns.com/foaf/0.1/mbox_sha1sum> ?g70276900592100 . }
       java.util.concurrent.ExecutionException: org.openrdf.query.MalformedQueryException: com.bigdata.rdf.sail.sparql.ast.VisitorException: BNodeID already used in another scope: t519724

I have a patch that uses a different node ID in WHERE, but would be happy to try/submit one that switches to a filtered variable shared between the patterns.
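
The patch itself is not shown in this thread. Purely as an illustration of the "different node ID in WHERE" idea, a sketch might look like the following; the function name, the string-based terms and the `suffix` parameter are all hypothetical, and the code assumes the suffixed label does not collide with another label already in the pattern.

```ruby
# Hypothetical sketch, not the actual patch. Terms are SPARQL token strings.
# Blank node labels are kept in the CONSTRUCT template and given a suffix in
# the WHERE clause, so no label is used in more than one scope.
# Assumption: "<label><suffix>" is not already used elsewhere in the pattern.
def construct_with_relabeled_where(subject, predicate, object, suffix: '_w')
  template = [subject, predicate, object]
  where = template.map do |term|
    term.start_with?('_:') ? "#{term}#{suffix}" : term
  end
  "CONSTRUCT { #{template.join(' ')} . } WHERE { #{where.join(' ')} . }"
end

puts construct_with_relabeled_where(
  '_:t519724',
  '<http://xmlns.com/foaf/0.1/mbox_sha1sum>',
  '?g70276900592100'
)
# prints: CONSTRUCT { _:t519724 <http://xmlns.com/foaf/0.1/mbox_sha1sum> ?g70276900592100 . } WHERE { _:t519724_w <http://xmlns.com/foaf/0.1/mbox_sha1sum> ?g70276900592100 . }
```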

gkellogg commented 6 years ago

@no-reply I'm assigning this to you to apply your patch.