w3c / EasierRDF

Making RDF easy enough for most developers

Lack of a standard rules language #27

Open dbooth-boston opened 5 years ago

dbooth-boston commented 5 years ago

Inference is fundamental to the value proposition of RDF, and almost every application needs to perform some kind of application-specific inference. ("Inference" is used broadly herein to mean any rule or procedure that produces new assertions from existing assertions -- not just conventional inference engines or rules languages.) But paradoxically, we still do not have a standard RDF rules language. (See also Sean Palmer's apt observations about N3 rules.[14])

Furthermore, applications often need to perform custom "inferences" (or data transformations) that are not convenient to express in available (non-standard) rules languages, such as the RDF data transformations that are needed when merging data from independently developed sources having different data models and vocabularies. And merging independently developed data is the most fundamental use case of the Semantic Web. . . .
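As a minimal illustration of "inference" in the broad sense used above -- any procedure that produces new assertions from existing ones -- here is a toy forward-chaining step over triples. The vocabulary terms and rule are invented for illustration only:

```python
# Toy illustration: a "rule" derives new triples from existing ones.
# Triples are (subject, predicate, object) tuples; all names are invented.

triples = {
    ("ex:Dog", "rdfs:subClassOf", "ex:Animal"),
    ("ex:rex", "rdf:type", "ex:Dog"),
}

def infer_types(facts):
    """Rule: if ?x rdf:type ?c and ?c rdfs:subClassOf ?d, derive ?x rdf:type ?d."""
    new = set()
    for (x, p1, c) in facts:
        if p1 != "rdf:type":
            continue
        for (c2, p2, d) in facts:
            if p2 == "rdfs:subClassOf" and c2 == c:
                new.add((x, "rdf:type", d))
    return new - facts

derived = infer_types(triples)
print(derived)  # {('ex:rex', 'rdf:type', 'ex:Animal')}
```

A real engine would iterate rules like this to a fixpoint; the point here is only that "inference" need not mean a heavyweight reasoner.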

[One] possibility might be to standardize a sufficiently powerful rules language.

However, see also some excellent cautionary comments from Jesus Barrasa (Neo4j) and MarkLogic on inference: "No one likes rules engines --> horrible to debug / performance . . . Reasoning with ontology languages quickly gets intractable/undecidable" and "Inference is expensive. When considering it, you should: 1) run it over as small a dataset as possible 2) use only the rules you need 3) consider alternatives."[15]

Work activity addressing this issue

Standardize N3 Logic

"We have worked with N3 for years now and there are several reasons why I believe that it should be standardized" https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0063.html

"I have [launched] a community group" https://www.w3.org/community/n3-dev/ https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0252.html

"Seems useful to tighten up N3... but any lessons learned, positive or negative, or things to re-use, from the half-decade W3C put into the RIF work?" https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0254.html

See discussion on the n3-dev mailing list.

Other proposed ideas

IDEA: Embed RDF in a programming language

"One possibility for addressing this need might be to embed RDF in a full-fledged programming language, so that complex inference rules can be expressed using the full power and convenience of that programming language." https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0036.html
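To make the "embed RDF in a programming language" idea concrete, here is a hypothetical sketch in Python: triples are ordinary host-language data, and a "rule" is ordinary code that can use the language's full power (string handling, conditionals, arbitrary functions). The vocabulary names are invented:

```python
# Sketch: with RDF embedded in a host language, a "rule" is ordinary code.
# This derives a new property by concatenating two source properties -- the
# kind of data-merging transformation that is awkward in many rule languages.
# All vocabulary names are invented for illustration.

data = [
    ("ex:alice", "foaf:givenName", "Alice"),
    ("ex:alice", "foaf:familyName", "Liddell"),
]

def full_names(graph):
    """Derive vocabA:fullName from foaf:givenName + foaf:familyName."""
    given = {s: o for (s, p, o) in graph if p == "foaf:givenName"}
    family = {s: o for (s, p, o) in graph if p == "foaf:familyName"}
    return [(s, "vocabA:fullName", f"{given[s]} {family[s]}")
            for s in given if s in family]

print(full_names(data))  # [('ex:alice', 'vocabA:fullName', 'Alice Liddell')]
```

The tradeoff, of course, is that such rules are no longer portable across implementations -- which is exactly why the thread is asking for a standard.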

IDEA: Bind custom inference rules to functions

"Another possibility might be to provide a convenient, standard way to bind custom inference rules to functions defined in a programming language" https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0036.html
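One possible shape for "binding custom inference rules to functions" is a registry that maps a triple pattern (here, just a predicate) to a host-language function which computes the derived triples. This is an invented sketch, not any proposed standard mechanism:

```python
# Sketch: bind inference rules to host-language functions via a registry.
# All names, and the fixed reference year, are invented for illustration.

RULES = {}

def rule(predicate):
    """Decorator: register an inference function for a predicate."""
    def register(fn):
        RULES.setdefault(predicate, []).append(fn)
        return fn
    return register

@rule("ex:birthYear")
def derive_age(s, p, o):
    # Custom inference implemented as ordinary code (assumes year 2024).
    return [(s, "ex:age", 2024 - int(o))]

def apply_rules(graph):
    derived = []
    for (s, p, o) in graph:
        for fn in RULES.get(p, []):
            derived.extend(fn(s, p, o))
    return derived

print(apply_rules([("ex:bob", "ex:birthYear", "1990")]))
# [('ex:bob', 'ex:age', 34)]
```

A standard would presumably need to specify only the binding and invocation contract, leaving the function bodies to each implementation language.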

IDEA: Update RIF

"re-open a working group on RIF to make it capable of expressing SPARQL constructs, and resolve the datatype and built-in discrepancies with other W3C specs." https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0176.html

IDEA: SPARQL-Generate extensions

"SPARQL-Generate (SG) . . . introduces many improvements to SPARQL (while keeping the core syntax as SPARQL) so that you can pretty much do all the Extract-Transform from heterogeneous data to RDF graphs, all in SPARQL." https://ci.mines-stetienne.fr/sparql-generate/ https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0302.html

laurentlefort commented 5 years ago

It is worth adding Graal/DLGP (Datalog+) to the list of solutions to build on: http://graphik-team.github.io/graal/ and https://github.com/graphik-team/graal

See the 2012 Brief Overview of the Existential Rule Framework by the GraphIK Team from LIRMM-Inria, Montpellier http://graphik-team.github.io/graal/papers/framework_en.pdf

I have used it (and its translator to the Datalog+ fragment of Deliberation RuleML 1.01) as a basis for converting rules from a user-friendly syntax to a machine-friendly format (in my case, Stardog rules).

My interest in GRAAL was partially triggered by parallel research happening on VADALOG: see Swift Logic for Big Data and Knowledge Graphs https://www.ijcai.org/proceedings/2017/0001.pdf

Also, I'm more interested in solutions that focus on keep-it-simple (and usable) approaches, e.g. rule patterns which enable simple graph transformation operators such as those described by Vladimir Alexiev in Extending OWL2 Property Constructs with OWLIM Rules http://rawgit2.com/VladimirAlexiev/my/master/pubs/extending-owl2/index.html

(I'm also keen to have a solution like SPARQL-Generate which supports other types of (cubic) graph transformations)

I'm also surprised that the work done on SHACL JavaScript Extensions https://w3c.github.io/data-shapes/shacl-js/ is not listed above.

maximveksler commented 5 years ago

Cross linked with https://github.com/semantalytics/awesome-semantic-web/issues/76

VladimirAlexiev commented 5 years ago

@laurentlefort thanks for citing "extending-owl2", that's a first :-)

IDEAS

Adding to the list of possible solutions:

Use cases

The big question is what use cases should be supported? This dictates drastically different tradeoffs.

E.g. consider GraphDB (OWLIM) rules. They are very simplistic, but they can be computed incrementally over a large KB, both when triples are added and when they are removed.
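One standard way simple rules stay incrementally maintainable under both additions and removals is counting-based support tracking: record how many derivations support each inferred triple, and retract an inference only when its count reaches zero. The sketch below is a generic illustration of that idea (a single non-recursive rule, invented names), not GraphDB's actual mechanism:

```python
# Sketch: counting-based incremental maintenance of inferred triples.
# Rule: ?x rdf:type ?c and ?c rdfs:subClassOf ?d  =>  ?x rdf:type ?d.
# All names are invented; GraphDB's real machinery is more elaborate.

from collections import Counter

subclass = {("ex:Dog", "ex:Animal")}  # fixed rdfs:subClassOf facts
derived = Counter()                   # inferred (x, class) -> support count

def add_type(x, c):
    for (c2, d) in subclass:
        if c2 == c:
            derived[(x, d)] += 1      # one more derivation supports it

def remove_type(x, c):
    for (c2, d) in subclass:
        if c2 == c:
            derived[(x, d)] -= 1
            if derived[(x, d)] == 0:  # no support left: retract
                del derived[(x, d)]

add_type("ex:rex", "ex:Dog")
print(("ex:rex", "ex:Animal") in derived)  # True
remove_type("ex:rex", "ex:Dog")
print(("ex:rex", "ex:Animal") in derived)  # False
```

Recursive rules need more care (a deleted triple can undercut its own support chain), which is part of why "simplistic" rule sets are easier to keep incremental at scale.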

VladimirAlexiev commented 5 years ago

@laurentlefort what are cubic graph transformations?

draggett commented 4 years ago

I am exploring ideas developed in Cognitive Psychology and backed up by work in Cognitive Neuroscience. Declarative knowledge is expressed as chunks, and procedural knowledge as rules that operate on chunks. See issue #71 "Chunks and Chunk Rules". You can find a longer description with links to a growing set of online demos at:

This work is aimed at supporting machine learning of vocabularies and rule sets, on the premise that manual development and maintenance will prove impractical as vocabularies and rulesets scale up. As such, rules are themselves represented as chunks, so that they can be manipulated as data.
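To illustrate "rules represented as chunks, so that they can be manipulated as data": below, a rule is itself just a data structure (a condition chunk and an action chunk), interpreted by a tiny matcher. The chunk encoding here is invented for illustration and is not the actual chunks notation from issue #71:

```python
# Sketch: a rule as data ("chunks"), so a program -- or a learning
# procedure -- can create and modify rules. Encoding is invented.

# A chunk: a type plus property/value pairs; "?x" marks a variable.
rule = {
    "@type": "rule",
    "condition": {"@type": "dog", "name": "?x"},
    "action":    {"@type": "animal", "name": "?x"},
}

facts = [{"@type": "dog", "name": "rex"}]

def matches(cond, fact):
    """Return variable bindings if the condition chunk matches the fact."""
    if cond["@type"] != fact["@type"]:
        return None
    bindings = {}
    for k, v in cond.items():
        if k == "@type":
            continue
        if isinstance(v, str) and v.startswith("?"):
            bindings[v] = fact[k]
        elif fact.get(k) != v:
            return None
    return bindings

def fire(rule, facts):
    """Instantiate the action chunk for every matching fact."""
    out = []
    for fact in facts:
        b = matches(rule["condition"], fact)
        if b is not None:
            out.append({k: b.get(v, v) if isinstance(v, str) else v
                        for k, v in rule["action"].items()})
    return out

print(fire(rule, facts))  # [{'@type': 'animal', 'name': 'rex'}]
```

Because the rule is plain data, other code can rewrite it (e.g. mutate `rule["action"]["@type"]`) -- which is exactly what a machine-learning procedure would need.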

The design criteria for rule languages as a target for machine learning will be different from the design criteria for rule languages intended for direct manual development by human programmers. My intuition is that it is better to have a simple regular language for machine generated rules, and to allow for greater flexibility when it comes to manually crafted graph algorithms that the rules can invoke. This presumes an architecture with serial execution of rules and parallel execution of graph algorithms on distributed graph databases. The latter can be likened to Web search which seeks to find the most relevant matches, taking into account previous queries.