ruby-rdf / sparql-client

SPARQL client for Ruby.
http://rubygems.org/gems/sparql-client
The Unlicense

Using prefixed URIs with Query#where #4

Closed cldwalker closed 13 years ago

cldwalker commented 14 years ago

After adding PREFIX support, I noticed that I can't actually use it because of the way Query#where works:

>> SPARQL::Client::Query.select.prefix("dc: <http://purl.org/dc/elements/1.1/>").where([:s, :p, RDF::URI.new('dc:abstract')]).to_s
 => "PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT * WHERE { ?s ?p <dc:abstract> .  }"

The above query isn't valid because I need dc:abstract instead of <dc:abstract>.

To fix this I looked inside RDF::Query::Pattern#to_s and noticed that '<%s>' is hardcoded. From the FIXME comment, I'm guessing this behavior will eventually be moved into a Statement instance method.

But even when this is done, I'll still need an RDF::Value subclass which just returns the original string. I looked for one and didn't find any.

The solution I propose is to add this as the first when branch of the case statement in SPARQL::Client::Query#serialize_value:

when !value.is_a?(RDF::Value) then value.to_s

This would allow the above example to do:

 >> SPARQL::Client::Query.select.prefix("dc: <http://purl.org/dc/elements/1.1/>").where([:s, :p, 'dc:abstract']).to_s
 => "PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT * WHERE { ?s ?p dc:abstract .  }"

An added benefit of this change would be that any SPARQL syntax that isn't supported yet can just be passed as a string, e.g. blank nodes:

>> SPARQL::Client::Query.select.where([ :s, :p, '[]' ]).to_s
=> "SELECT * WHERE { ?s ?p [] . }"
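To make the proposed branch concrete, here is a minimal sketch using stand-in classes. The `PassThroughSketch` module, its `Value` and `URI` classes, and `serialize_pattern` are all hypothetical and only mimic the shape of the real code; they are not the RDF.rb or sparql-client implementation.

```ruby
# Stand-in types for illustration only; NOT the real RDF.rb classes.
module PassThroughSketch
  class Value; end

  class URI < Value
    def initialize(uri)
      @uri = uri
    end

    def to_s
      @uri
    end
  end

  # Hypothetical serializer with the proposed branch placed first.
  def self.serialize_value(value)
    case
    when !value.is_a?(Value) then value.to_s    # proposed: pass non-values through as-is
    when value.is_a?(URI)    then "<#{value}>"  # URIs are wrapped in angle brackets
    else value.to_s
    end
  end

  # Symbols stand for variables, as in the examples above.
  def self.serialize_pattern(pattern)
    terms = pattern.map { |t| t.is_a?(Symbol) ? "?#{t}" : serialize_value(t) }
    "#{terms.join(' ')} ."
  end
end

puts PassThroughSketch.serialize_pattern([:s, :p, 'dc:abstract'])
puts PassThroughSketch.serialize_pattern([:s, :p, '[]'])
```

Under these assumptions a plain string reaches the query text untouched, which is what makes both the prefixed name and the blank-node shorthand work.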

Thoughts? Gabriel

artob commented 14 years ago

The problem with passing through strings as-is into the generated query string is that everywhere else in RDF.rb-based libraries where you can use triples (represented as 3-element Ruby arrays), anything that isn't an RDF::Value is always assumed to be an implicit RDF::Literal value.

This is very convenient, because it enables [:s, :p, "Hello!"] to work as effective short-hand for [:s, :p, RDF::Literal.new("Hello!")]. I'd like to be consistent with other RDF.rb libraries and adhere to the principle of least surprise in SPARQL::Client as well. We don't yet support this convention in the current version, but we really ought to eventually.
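The convention can be sketched roughly as follows. The `LiteralSketch` module and its `Variable` and `Literal` structs are hypothetical stand-ins, not the RDF.rb classes; the point is only that under this rule a bare string such as 'dc:abstract' would be read as a literal, not as a prefixed name.

```ruby
# Stand-in types for illustration only; NOT the real RDF.rb classes.
module LiteralSketch
  Variable = Struct.new(:name)
  Literal  = Struct.new(:value)

  # Hypothetical normalizer: symbols become variables, existing values
  # pass through, and anything else is wrapped as an implicit literal.
  def self.normalize_term(term)
    case term
    when Variable, Literal then term
    when Symbol            then Variable.new(term)
    else Literal.new(term)
    end
  end

  def self.normalize(pattern)
    pattern.map { |term| normalize_term(term) }
  end
end

p LiteralSketch.normalize([:s, :p, 'Hello!'])
p LiteralSketch.normalize([:s, :p, 'dc:abstract']).last
```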

Now, I guess I haven't run into the problem you describe because I actually never bother to define prefixes for the generated queries; after all, I'm not really generating them for human consumption but for machines. So, the way that I would construct your example query would be as follows:

>> SPARQL::Client::Query.select.where([:s, :p, RDF::DC11.abstract]).to_s
=> "SELECT * WHERE { ?s ?p <http://purl.org/dc/elements/1.1/abstract> . }"

That is, I'd use the prefixing/abbreviation mechanisms of RDF.rb itself, instead of delegating that to the SPARQL server. The Ruby code looks a lot cleaner when one doesn't have to worry about prefixes.

That said, certainly it could also be a valid use case to generate queries that use prefixes and CURIEs, for instance if human comprehensibility of the generated query string is particularly important for some reason. So we do need a solution here; I'm just not sure what exactly it should be. But I'd like to avoid going the strings route unless we really can't figure out something else...

artob commented 14 years ago

Thinking about this some more, we currently have an RDF::URI#qname method that returns a two-tuple representing the prefix and the local name of a URI:

>> RDF::DC11.abstract
=> #<RDF::URI:0x8119e0e8(http://purl.org/dc/elements/1.1/abstract)>

>> RDF::DC11.abstract.qname
=> [:dc11, :abstract]

Perhaps SPARQL::Client could treat any two-tuples given to it the same way, i.e., as prefixed names? More concretely, would [:s, :p, [:dc, :abstract]] work as the syntax for what you want, Gabriel?
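A rough sketch of what that two-tuple handling might look like, written against plain Ruby values. The `QNameSketch` module and its methods are hypothetical, and treating any other term as a full URI string is a simplifying assumption made only for this example.

```ruby
# Hypothetical sketch; not part of sparql-client.
module QNameSketch
  def self.serialize_term(term)
    case term
    when Symbol
      "?#{term}"                      # symbols are variables
    when Array
      unless term.size == 2
        raise ArgumentError, "expected [prefix, local_name], got #{term.inspect}"
      end
      term.join(':')                  # [:dc, :abstract] => "dc:abstract"
    else
      "<#{term}>"                     # assumption: everything else is a full URI
    end
  end

  def self.serialize_pattern(pattern)
    "#{pattern.map { |t| serialize_term(t) }.join(' ')} ."
  end
end

puts QNameSketch.serialize_pattern([:s, :p, [:dc, :abstract]])
```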

bhuga commented 14 years ago

I have to vote against allowing [:s, :p, 'dc:abstract'] or [:s, :p, [:dc, :abstract]]. I particularly like the part where non-URIs get interpreted as literals, so I would especially vote against 'dc:abstract'. While prefixes might need support to generate human-readable queries, I think that supporting either of the above lets a serialization concern enter the model layer, a blurring of concerns we have thankfully avoided in RDF.rb proper. So I'd want it to work as:

select.prefix(RDF::DC, 'dc')
select.where([:s, :p, RDF::DC.abstract])
#=> "PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT * WHERE { ?s ?p dc:abstract .  }"
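
One way to read this proposal: the pattern always holds full URIs, and abbreviation happens only at serialization time, against the registered prefixes. The sketch below uses plain strings and a hash; `PrefixSketch`, `abbreviate`, `select_all`, and the simplified `LOCAL_NAME` check are hypothetical and not sparql-client API.

```ruby
# Hypothetical sketch; not part of sparql-client.
module PrefixSketch
  # Deliberately conservative local-name check (a simplification of
  # the SPARQL grammar for prefixed names).
  LOCAL_NAME = /\A[A-Za-z_][A-Za-z0-9_-]*\z/

  # Abbreviates a full URI if a registered namespace matches;
  # otherwise falls back to the <...> form.
  def self.abbreviate(uri, prefixes)
    prefixes.each do |prefix, namespace|
      next unless uri.start_with?(namespace)
      local = uri[namespace.length..-1]
      return "#{prefix}:#{local}" if local.match?(LOCAL_NAME)
    end
    "<#{uri}>"
  end

  def self.select_all(pattern, prefixes)
    header = prefixes.map { |prefix, ns| "PREFIX #{prefix}: <#{ns}> " }.join
    terms  = pattern.map { |t| t.is_a?(Symbol) ? "?#{t}" : abbreviate(t, prefixes) }
    "#{header}SELECT * WHERE { #{terms.join(' ')} . }"
  end
end

prefixes = { 'dc' => 'http://purl.org/dc/elements/1.1/' }
puts PrefixSketch.select_all([:s, :p, 'http://purl.org/dc/elements/1.1/abstract'], prefixes)
```

The model layer never sees a prefixed string here; only the serializer knows about prefixes.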

cldwalker commented 14 years ago

Ah. I wasn't aware that a string was shorthand for a Literal elsewhere. We can forget that solution. To better explain my problem: I'm actually using this gem from the command line to make SPARQL queries. I execute the above example on the command line as boson sparql -p -t=p dc:abstract. Being on the command line, I need a way to generate prefixed URIs from command-line strings. Unfortunately, neither of the solutions you've proposed would allow me to do that: they require a constant or an array as arguments.

How about my first proposal where we create an RDF::Value subclass to handle this case? RDF::Query::Pattern#to_s would have to be altered to allow this new class to return its unaltered string value with #to_s.

bhuga commented 14 years ago

Although you could come up with a custom subclass of RDF::Value for your project, that hierarchy is reserved for the data model and this is still a serialization question. I'll still chime in that this is not appropriate for sparql-client proper.

And besides, the fact that they are constants doesn't prevent you from using them on the command line. You'd just have to edit your create_rdf_value function for the example given. You'd get 'DC.abstract' as a string from Ruby in the arguments, and this will turn it into the correct constant:

(vocab, predicate) = 'DC.abstract'.split(/\./)
RDF.const_get(vocab).send(predicate.to_sym)
=> #<RDF::URI:0xc7e7f0(http://purl.org/dc/terms/abstract)>

This is a little messy, but I think you'll find if you start using RDF::Vocabulary instead of a custom string-based namespace scheme, you'll come out with a net gain in clarity.
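Since the string comes from user input, a more defensive variant of the same idea could look the vocabulary up in an explicit table instead of calling const_get and send directly. This is only a sketch: `VocabSketch`, its `Vocabulary` class, and the `VOCABULARIES` table are hypothetical stand-ins and do not use the real RDF::Vocabulary API.

```ruby
# Stand-in types for illustration only; NOT the real RDF.rb classes.
module VocabSketch
  class Vocabulary
    def initialize(namespace)
      @namespace = namespace
    end

    # Builds a full URI string for a term in this vocabulary.
    def term(name)
      "#{@namespace}#{name}"
    end
  end

  # Explicit table of the vocabularies a command-line user may name.
  VOCABULARIES = {
    'DC' => Vocabulary.new('http://purl.org/dc/terms/')
  }.freeze

  # Turns a command-line argument such as 'DC.abstract' into a URI string.
  def self.resolve(argument)
    vocab_name, term = argument.split('.', 2)
    vocabulary = VOCABULARIES.fetch(vocab_name) do
      raise ArgumentError, "unknown vocabulary: #{vocab_name.inspect}"
    end
    unless term.to_s.match?(/\A\w+\z/)
      raise ArgumentError, "invalid term: #{term.inspect}"
    end
    vocabulary.term(term)
  end
end

puts VocabSketch.resolve('DC.abstract')
```

The table keeps arbitrary constant names and method calls out of reach of whatever is typed on the command line.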

As an aside, Boson seems cool.

cldwalker commented 14 years ago

Creating an RDF::Value subclass for my problem is actually easier than I thought:

::RDF::Query.module_eval %[class StringVariable < Variable; def to_s; @name; end; end]

The only problem then is that the stringification of a Variable is hardcoded in SPARQL::Client::Query#serialize_value. If we can change that line to:

when value.variable? then value.to_s

then I can use the above solution in my own library. I'm assuming querying won't change a variable's bindings and thus its #to_s value.
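
To illustrate why delegating to #to_s matters, here is a sketch with stand-in classes. `VariableSketch` and both serializer methods are hypothetical, and the hardcoded form shown ("?" plus the variable name) is an assumption about what the original line looked like, not a quote from sparql-client.

```ruby
# Stand-in types for illustration only; NOT the real RDF.rb classes.
module VariableSketch
  class Variable
    attr_reader :name

    def initialize(name)
      @name = name
    end

    def variable?
      true
    end

    def to_s
      "?#{name}"
    end
  end

  # A variable subclass that serializes as its raw name.
  class StringVariable < Variable
    def to_s
      name.to_s
    end
  end

  # Assumed "before" behavior: the format is fixed in the serializer,
  # so a subclass cannot influence the output.
  def self.serialize_hardcoded(value)
    value.variable? ? "?#{value.name}" : "<#{value}>"
  end

  # "After" behavior: the serializer defers to the value's own #to_s.
  def self.serialize_delegating(value)
    value.variable? ? value.to_s : "<#{value}>"
  end
end

term = VariableSketch::StringVariable.new('dc:abstract')
puts VariableSketch.serialize_hardcoded(term)
puts VariableSketch.serialize_delegating(term)
```

Ordinary variables serialize identically under both versions; only subclasses that override #to_s see a difference.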

How about that solution? I can fork and pull that one line if you want.

artob commented 14 years ago

Sounds good, let's do that.

cldwalker commented 14 years ago

Cool. Pulled and forked.

@bhuga: Glad you like it. Feel free to ask any questions about it. I hope to get around to writing a blog post about using it with RDF and SPARQL one of these days.

artob commented 14 years ago

Thanks Gabriel, I'll merge & release tomorrow.

cldwalker commented 13 years ago

Seems this was resolved a while ago.