joelkuiper / yesparql

a Yesql inspired SPARQL library
https://github.com/joelkuiper/yesparql
Eclipse Public License 1.0
43 stars 8 forks source link

YeSPARQL

Build Status

YeSPARQL is a library for executing SPARQL queries against endpoints or TDB stores, heavily influenced by Yesql.

Annotated Source

Installation

Clojars Project

Add this to your Leiningen :dependencies:

[yesparql "0.3.1"]

What's the point?

Yesql does a much better job explaining this. But in short, it's annoying to write SPARQL in Clojure. While you could design some DSL, these are often lacking in expressiveness and have nasty corner cases; so why not just use SPARQL. By defining the queries as simple SPARQL in separate files, you get a clean separation of concerns without polluting your code with long queries. Other perks include:

Eeh, I meant what's the point of SPARQL?

See my introductory blog post on Semantic Web and SPARQL: Whatever happened to Semantic Web? or see SPARQL by Example.

Usage

One File, One Query

Create an SPARQL query and save it as a file.

-- Example dbpedia query, returning intellectuals restricted by subject
-- Endpoint: http://dbpedia.org/sparql

PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX category: <http://dbpedia.org/resource/Category:>

SELECT DISTINCT ?person {
    { ?person a dbpedia-owl:Scientist }
    UNION
    { ?person a dbpedia-owl:Writer }
    UNION
    { ?person a dbpedia-owl:Philosopher }
    ?person dct:subject ?subject .
} LIMIT 10

Make sure it's on the classpath. For this example, it's in src/some/where/. The syntax of the query is automatically guessed at using the filename, and currently defaults to SPARQL 1.1.

(require '[yesparql.core :refer [defquery]])

;; Import the SPARQL query as a function.
;; In this case we use DBPedia as a remote endpoint
(defquery select-intellectuals "some/where/query.sparql"
  {:connection "http://dbpedia.org/sparql"})

;; The function is now available in the namespace
;; Docstrings are automatically generated
(clojure.repl/doc select-intellectuals)

;=> ------------------------
;=> yesparql.core-test/select-intellectuals
;=> ([] [{:keys [connection bindings]}])
;=> Example dbpedia query, returning intellectuals restricted by subject
;=> Endpoint: http://dbpedia.org/sparql

;; Running the query is as easy as calling the function in a with-open
;; The result is a lazy sequence of Clojure data structures
(with-open [result (select-intellectuals)]
  (do-something-with-result! result))

You can supply bindings as a map of strings (the names) or keywords to URI, URL, RDFNode, Node, or Literal (default). Alternatively, you can supply a map of {:type (optional, uri), :lang (optional, str or keyword), :value} which will be coerced to the appropriate Literal automatically. Prefixes that were defined in the query get automatically resolved when passing in an URI for SPARQL QUERY, only common prefixes are resolved for SPARQL UPDATE.

These bindings get inserted into the query using a Parameterized SPARQL String.

In addition you can overwrite limit, offset, fields at query time.

A complete example of running a SPARQL SELECT against DBPedia, with initial bindings and limit:

user> (require '[yesparql.sparql :refer :all])

user> (with-open
        [result
         (select-intellectuals
          {:limit 10
           :bindings
           {:subject
            (java.net.URI. "category:1952_deaths")}})]
        (into [] result))

;=> [{"person" "http://dbpedia.org/resource/Antonio_Damasio"}
;=>  {"person" "http://dbpedia.org/resource/Albert_Victor_B%C3%A4cklund"}
;=>  {"person" "http://dbpedia.org/resource/Alexander_Oparin"}
;=>  {"person" "http://dbpedia.org/resource/Alexander_Stepanovich_Popov"}
;=>  {"person" "http://dbpedia.org/resource/Andrew_Ainslie_Common"}
;=>  {"person" "http://dbpedia.org/resource/Annie_Montague_Alexander"}
;=>  {"person" "http://dbpedia.org/resource/Anthony_James_Leggett"}
;=>  {"person" "http://dbpedia.org/resource/Ascanio_Sobrero"}
;=>  {"person" "http://dbpedia.org/resource/Axel_Thue"}
;=>  {"person" "http://dbpedia.org/resource/B%C3%A9la_Bollob%C3%A1s"}]

You can also transform the result directly into other formats like CSV:

user> (with-open [result (select-intellectuals)]
        (result->csv (->result result)))

;=> person
;=> http://dbpedia.org/resource/Antonio_Damasio
;=> http://dbpedia.org/resource/Albert_Victor_B%C3%A4cklund
;=> http://dbpedia.org/resource/Alexander_Oparin
;=> http://dbpedia.org/resource/Alexander_Stepanovich_Popov
;=> http://dbpedia.org/resource/Andrew_Ainslie_Common
;=> http://dbpedia.org/resource/Annie_Montague_Alexander
;=> http://dbpedia.org/resource/Anthony_James_Leggett
;=> http://dbpedia.org/resource/Ascanio_Sobrero
;=> http://dbpedia.org/resource/Axel_Thue
;=> http://dbpedia.org/resource/B%C3%A9la_Bollob%C3%A1s

WARNING: Queries should be called in a with-open in order to close the underlying QueryExecution, or be closed manually. The underlying ResultSet (and result iterator) will become invalid after closing (see copy-result-set). While it is completely possible to not close the result, it will leak resources and is not advisable.

One file, Many Queries

Same as Yesql

SPARQL Injection Notes

While ParameterizedSparqlString was in part designed to prevent SPARQL injection it is by no means foolproof because it works purely at the textual level. The current version of the code addresses some possible attack vectors that the developers have identified but we do not claim to be sufficiently devious to have thought of and prevented every possible attack vector.

Therefore we strongly recommend that users concerned about SPARQL Injection attacks perform their own validation on provided parameters and test their use of this class themselves prior to its use in any security conscious deployment. We also recommend that users do not use easily guess-able variable names for their parameters as these can allow a chained injection attack, though generally speaking the code should prevent these.

TDB

In addition to supplying a SPARQL Endpoint URL, you can also supply a TDB Dataset. The yesparql.tdb namespace provides convenience methods for constructing these.

(require '[yesparql.tdb :as tdb])

(def tdb (tdb/create-in-memory))

(defquery select-all
  "yesparql/samples/select.sparql"
  {:connection tdb})

Query types

Since SPARQL has multiple query types we consider the following syntax for the query names:

Result format

Each of the executed queries returns an iterator-seq of result binding maps (SELECT), or triples (DESCRIBE, CONSTRUCT) in a native Clojure data structure. ASK simply returns a boolean.

Access to the underlying Jena ResultSet and QueryExecution are provided by ->result, ->query-execution functions for SELECT queries.

For DESCRIBE and CONSTRUCT access to the Jena iterator of Triples is provided by ->triples, in addition to ->query-execution. A convenience method ->model can be used to transform the triples in to a Jena Model. For example: (model->json-ld (->model (construct-query)).

Note that it's perfectly to fine to use these Jena objects, with the Clojure-Java interop.

Result serialization

YeSPARQL offers various functions to serialize Model and ResultSet in the yesparql.sparql namespace.

user> (require '[yesparql.sparql :refer :all])

user> (def result
        (with-open [result (select-intellectuals)]
          (copy-result-set (->result result))))

;; Converting results...
user> (result->json result)
user> (result->csv result)
user> (result->xml result) ; NOT RDF, but the SPARQL XML result format

You can use result->model to convert a ResultSet (SELECT) to a Model; or use ->model on the result of CONSTRUCT and DESCRIBE queries.

;; Convert to model
user> (def model (result->model result))

;; Then choose one of the serializations...
user> (model->json-ld model)
user> (model->rdf+xml model)
user> (model->ttl model)

;; Or use one of the other serialization formats
user> (serialize-model model "format")

See Jena Model Write formats for formats that can be passed to serialize-model.

If a ResultSet has to be traversed multiple times use the copy-result-set, which generates a rewindable copy of the entire ResultSet (as in the example above).

See also ResultSetFactory/makeRewindable.

A note on lazyness

The results are returned in a lazy fashion, but the underlying ResultSet will become invalid after closing the result. So this won't work:

(with-open [r (select-intellectuals)] (map println r))

ARQException ResultSet no longer valid (QueryExecution has been
closed) org.apache.jena.sparql.engine.ResultSetCheckCondition.check
(ResultSetCheckCondition.java:106)

Instead do:

(with-open [results (select-intellectuals)]
   (doseq [r results]
     (println r)))

;=>{person http://dbpedia.org/resource/Antonio_Damasio}
;=>{person http://dbpedia.org/resource/Albert_Victor_B%C3%A4cklund}
;=>{person http://dbpedia.org/resource/Alexander_Oparin}
;=>{person http://dbpedia.org/resource/Alexander_Stepanovich_Popov}
;=>{person http://dbpedia.org/resource/Andrew_Ainslie_Common}
;=>{person http://dbpedia.org/resource/Annie_Montague_Alexander}
;=>{person http://dbpedia.org/resource/Anthony_James_Leggett}
;=>{person http://dbpedia.org/resource/Ascanio_Sobrero}
;=>{person http://dbpedia.org/resource/Axel_Thue}
;=>{person http://dbpedia.org/resource/B%C3%A9la_Bollob%C3%A1s}

In general make sure you do any and all work you want to do on the results eagerly in the with-open.

A note on transactions

When executing against TDB it is recommended to use transactions to prevent against data corruption. You can set the flags yourself on the Dataset or use the with-transaction macro, although support for this is somewhat lacking.

TODO

New to Clojure?

If you are new to Clojure the code might look unfamiliar. But, Clojure is a wonderful language, and if you are interested in learning we recommend the following resources:

Acknowledgments

Funding

This software was commissioned and sponsored by Doctor Evidence. The Doctor Evidence mission is to improve clinical outcomes by finding and delivering medical evidence to healthcare professionals, medical associations, policy makers and manufacturers through revolutionary solutions that enable anyone to make informed decisions and policies using medical data that is more accessible, relevant and readable.

License

Copyright © 2015-2018 Joël Kuiper

Distributed under the Eclipse Public License, the same as Clojure.

Does it SPARQL? :sunny: