hsolbrig / PyShEx

ShEx interpreter for ShEx 2.0
Creative Commons Zero v1.0 Universal

Python implementation of ShEx 2.0

https://mybinder.org/v2/gh/hsolbrig/pyshex/master

This package is a reasonably literal implementation of the Shape Expressions Language 2.0. It can parse and "execute" ShExC and ShExJ source.

Revisions

Installation

pip install PyShEx

Note: If you need to escape single quotes in RDF literals, you will need to install the bleeding edge of rdflib:

pip uninstall rdflib
pip install git+https://github.com/rdflib/rdflib

Unfortunately, rdflib-jsonld is NOT compatible with the bleeding edge of rdflib, so you cannot use a JSON-LD parser in this configuration.

shexeval CLI

> shexeval -h
usage: shexeval [-h] [-f FORMAT] [-s START] [-ut] [-sp STARTPREDICATE]
                [-fn FOCUS] [-A] [-d] [-ss] [-cf] [-sq SPARQL] [-se]
                [--stopafter STOPAFTER] [-ps] [-pr] [-gn GRAPHNAME] [-pb]
                rdf shex

positional arguments:
  rdf                   Input RDF file or SPARQL endpoint if slurper or sparql
                        options
  shex                  ShEx specification

optional arguments:
  -h, --help            show this help message and exit
  -f FORMAT, --format FORMAT
                        Input RDF Format
  -s START, --start START
                        Start shape. If absent use ShEx start node.
  -ut, --usetype        Start shape is rdf:type of focus
  -sp STARTPREDICATE, --startpredicate STARTPREDICATE
                        Start shape is object of this predicate
  -fn FOCUS, --focus FOCUS
                        RDF focus node
  -A, --allsubjects     Evaluate all non-bnode subjects in the graph
  -d, --debug           Add debug output
  -ss, --slurper        Use SPARQL slurper graph
  -cf, --flattener      Use RDF Collections flattener graph
  -sq SPARQL, --sparql SPARQL
                        SPARQL query to generate focus nodes
  -se, --stoponerror    Stop on an error
  --stopafter STOPAFTER
                        Stop after N nodes
  -ps, --printsparql    Print SPARQL queries as they are executed
  -pr, --printsparqlresults
                        Print SPARQL query and results
  -gn GRAPHNAME, --graphname GRAPHNAME
                        Specific SPARQL graph to query - use '' for any named
                        graph
  -pb, --persistbnodes  Treat BNodes as persistent in SPARQL endpoint
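
As a minimal sketch of how the options above combine, the following Python helper assembles a shexeval argument list. The helper name `build_shexeval_args` and the file names and IRI in the example are hypothetical; only flags shown in the help text are used.

```python
import shlex
from typing import List, Optional


def build_shexeval_args(rdf: str, shex: str,
                        focus: Optional[str] = None,
                        start: Optional[str] = None,
                        rdf_format: Optional[str] = None,
                        stop_on_error: bool = False) -> List[str]:
    """Assemble a shexeval command line (hypothetical helper, not part of PyShEx)."""
    args = ["shexeval"]
    if rdf_format:
        args += ["-f", rdf_format]      # -f FORMAT: input RDF format
    if start:
        args += ["-s", start]           # -s START: start shape
    if focus:
        args += ["-fn", focus]          # -fn FOCUS: RDF focus node
    if stop_on_error:
        args.append("-se")              # -se: stop on an error
    # The two positional arguments come last: rdf, then shex.
    args += [rdf, shex]
    return args


# Placeholder inputs, assumed for illustration only.
cmd = build_shexeval_args("data.ttl", "schema.shex",
                          focus="http://example.org/s1",
                          rdf_format="turtle")
print(shlex.join(cmd))
```

With PyShEx installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`.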

Documentation

See the example Jupyter notebooks for sample uses.

General Layout

The root pyshex package is subdivided into:

The ShEx schema definitions for this package come from ShExJSG

We are trying to keep the Python as close as possible to the (semi-)formal specification. As an example, the statement:

se is a ShapeAnd and for every shape expression se2 in shapeExprs, satisfies(n, se2, G, m)

is implemented in Python as:

    # Excerpt from the satisfies() dispatch:
    ...
    if isinstance(se, ShExJ.ShapeAnd):
        return satisfiesShapeAnd(cntxt, n, se)
    ...

# Every nested shape expression must be satisfied by node n.
def satisfiesShapeAnd(cntxt: Context, n: nodeSelector, se: ShExJ.ShapeAnd) -> bool:
    return all(satisfies(cntxt, n, se2) for se2 in se.shapeExprs)
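
The excerpt above depends on the rest of the package. A self-contained toy version of the same pattern might look like the sketch below. The `NodeTest` and `ShapeAnd` classes here are simplified stand-ins, not the ShExJ types, and the graph and context arguments are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class NodeTest:
    """Hypothetical leaf expression: a plain predicate on the node value."""
    predicate: Callable[[str], bool]


@dataclass
class ShapeAnd:
    """Simplified stand-in for ShExJ.ShapeAnd."""
    shapeExprs: List["ShapeExpr"]


ShapeExpr = Union[NodeTest, ShapeAnd]


def satisfies(n: str, se: ShapeExpr) -> bool:
    # Dispatch on the expression type, mirroring the wording of the specification.
    if isinstance(se, ShapeAnd):
        return satisfies_shape_and(n, se)
    return se.predicate(n)


def satisfies_shape_and(n: str, se: ShapeAnd) -> bool:
    # "for every shape expression se2 in shapeExprs, satisfies(n, se2, ...)"
    return all(satisfies(n, se2) for se2 in se.shapeExprs)


is_http_iri = NodeTest(lambda n: n.startswith("http://"))
in_example_org = NodeTest(lambda n: "example.org" in n)
both = ShapeAnd([is_http_iri, in_example_org])
```

Here `satisfies("http://example.org/s1", both)` holds because both nested tests pass, while a node that fails either test is rejected.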

Dependencies

This package is built using:

Conformance

This implementation passes all of the tests in the master branch of validation/manifest.ttl with the following exceptions:

At the moment, there are 1088 tests, of which:

As mentioned above, this is currently as literal an implementation of the specification as was sensible. In particular, partition management is handled naively rather than optimized.
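
To illustrate why naive partition handling is costly, here is a sketch that enumerates every way to distribute a set of triples across k ordered groups, which is what a literal reading of "there is some partition such that ..." implies. This is an illustration only and is not taken from the PyShEx source; the function name `ordered_partitions` is hypothetical.

```python
from itertools import product
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def ordered_partitions(items: Sequence[T], k: int) -> Iterator[Tuple[List[T], ...]]:
    """Yield every assignment of items to k ordered, possibly empty, groups."""
    # Each item independently picks one of k groups, so there are k ** len(items) candidates.
    for assignment in product(range(k), repeat=len(items)):
        groups: Tuple[List[T], ...] = tuple([] for _ in range(k))
        for item, index in zip(items, assignment):
            groups[index].append(item)
        yield groups


# Three placeholder triples split into two groups (for example, matched and remainder).
triples = ["t1", "t2", "t3"]
candidates = list(ordered_partitions(triples, 2))
print(len(candidates))  # 2 ** 3 = 8 candidate partitions
```

The number of candidates grows exponentially with the number of triples, so a validator that tests each candidate in turn slows down quickly on nodes with many arcs.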

Docker

Build

docker build -t pyshex docker

Run

docker run --rm -it pyshex -gn '' -ss -ut -pr -sq 'select distinct ?item where{?item a <http://w3id.org/biolink/vocab/Gene>} LIMIT 1' http://graphdb.dumontierlab.com/repositories/ncats-red-kg https://github.com/biolink/biolink-model/raw/master/shex/biolink-modelnc.shex