ruby-rdf / rdf-n3

Ruby Notation-3 reader/writer for RDF.rb.
http://rubygems.org/gems/rdf-n3
The Unlicense

Implement Formula and rule support #15

Closed gkellogg closed 3 years ago

gkellogg commented 13 years ago

This is a summarization of a conversation initiated by RubVer:

RubVer is interested in adding formulae and rule support to N3 Writer.

My response:

That would be great! My TODO notes were mostly about the reader, but it would make sense for the writer to output formulae as well.

Note that the 0.4.x branch of RDF.rb has full support for SPARQL semantics. When used along with SPARQL::Algebra and SPARQL::Grammar, it implements complete SPARQL 1.0.

One change I was considering was to change existential variables from BNodes to non-distinguished variables. This is what I needed to do in SPARQL::Grammar. Changing BNode semantics in RDF::Query might also be possible, but it could be in conflict with other equality semantics.

Then ensued a discussion of how formulae might relate to BGP, and the semantics of variables.

log:implies is a sufficient but not a necessary condition. (just imagine all possible other predicates with similar meaning) [1] indicates a formula is defined by its statements, its universals and its existentials. I think it is reasonable to assume that at least anything with a non-empty set of universals or existentials is indeed a formula. In other words, everything that introduces some kind of scope. The impact of this is illustrated by examples such as the one below:

{ _:x a :Cat. } a :Truth.   _:x a :Dog.

As you know, the above example does not introduce an inconsistency because of scoping, so probably this is (close to) a necessary condition for formulae. This example also makes clear that simple BNodes will probably be insufficient. There must be some mechanism to bind the first _:x to the context of the formula.
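To make the binding idea concrete, here is a minimal plain-Ruby sketch of a blank node whose identity includes the formula it was parsed in. `ScopedNode` and the scope objects are hypothetical names used only for illustration; nothing here is an RDF.rb or rdf-n3 class.

```ruby
# Plain-Ruby illustration only; ScopedNode is a hypothetical class,
# not part of RDF.rb or rdf-n3.
class ScopedNode
  attr_reader :label, :scope

  def initialize(label, scope)
    @label = label
    @scope = scope # any object standing for the enclosing formula
  end

  # Two nodes are equal only if they share a label AND the same scope object.
  def ==(other)
    other.is_a?(ScopedNode) && label == other.label && scope.equal?(other.scope)
  end
  alias eql? ==

  def hash
    [label, scope.object_id].hash
  end
end

document = Object.new # stands for the top-level document scope
formula  = Object.new # stands for the { ... } formula

cat = ScopedNode.new("x", formula)  # _:x inside the formula
dog = ScopedNode.new("x", document) # _:x at the top level

puts cat == dog                          # false: same label, different scope
puts cat == ScopedNode.new("x", formula) # true: same label, same scope
```

With this kind of identity, the cat and the dog above stay distinct even though both are written `_:x`.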

The same goes of course for universally quantified variables. Yet, we should be careful, as existentials and universals define different scopes in N3.

{ _:x a :Cat. {_:x :legCount 4.} a :Truth. } a :Truth.
{ ?y  a :Cat. {?y  :legCount 4.} a :Truth. } a :Truth.

The first line defines 2 different animals, the second line only one.
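A small plain-Ruby sketch of the two scoping rules as they are read in this comment. `identity_key` is a hypothetical helper, and the rule it applies to `?` variables (shared across nested formulae) simply encodes the interpretation given above, not a confirmed reading of the N3 specification.

```ruby
# Hypothetical helper, for illustration only.
# term: a String such as "_:x", "?y" or ":Cat"
# path: the ids of the enclosing formulae, outermost first
def identity_key(term, path)
  if term.start_with?("_:")
    [term, path.last]  # existential: local to the innermost formula
  elsif term.start_with?("?")
    [term, path.first] # universal: shared by nested formulae (assumed reading)
  else
    [term, nil]        # IRIs denote the same thing everywhere
  end
end

outer = [:f1]      # inside the outer { ... }
inner = [:f1, :f2] # inside the nested { ... }

puts identity_key("_:x", outer) == identity_key("_:x", inner) # false: two animals
puts identity_key("?y", outer) == identity_key("?y", inner)   # true: one animal
```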

[1] http://www.w3.org/DesignIssues/Reify.html, section "N3 Formulae"
[2] http://razor.occams.info/code/semweb/semweb-current/src/N3Writer.cs, function IsFormulaId

This raised a current issue with the support for variable scoping within RDF.rb:

Regarding variable scope, there are some conflicts with RDF.rb semantics on the scope of variables. A node == another node (or eql?) iff they are the same (or duped from the same) original node. This is necessary to ensure that two nodes with the same identifier aren't confused with each other. This is standard, as two different parses of the same graph must create distinct nodes according to SPARQL semantics. (Actually, this is an issue before the RDF WG right now, which is looking at Skolemization requirements to support, e.g., updates to datasets having nodes.) (Also, note that SPARQL::Grammar translates Node patterns to non-distinguished variables for equality to work properly.)
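The node equality rule described here can be modelled in a few lines of plain Ruby. `SketchNode` is a hypothetical stand-in written for this illustration; it is not the actual RDF::Node implementation.

```ruby
# Plain-Ruby model of "equal iff same, or duped from the same, original node".
# SketchNode is hypothetical and does not reproduce RDF::Node's real code.
class SketchNode
  attr_reader :id, :original

  def initialize(id)
    @id = id
    @original = self # a freshly created node is its own original
  end

  # Object#dup copies instance variables, so a copy keeps pointing
  # at the same @original as the node it was duped from.
  def ==(other)
    other.is_a?(SketchNode) && original.equal?(other.original)
  end
  alias eql? ==
end

a = SketchNode.new("x")
b = SketchNode.new("x") # same identifier, e.g. from a second parse
c = a.dup

puts a == b # false: same identifier, different original
puts a == c # true: duped from the same original
```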

In SPARQL (1.0 anyway), variable scope (including existential) is to the query, not to a subset of the query. In RDF.rb, variable equality is treated differently. An unbound variable == (or eql?) any other RDF::Term. If bound, only the names need to be the same.
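As a rough model of the variable equality just described, here is a plain-Ruby sketch. `SketchVariable` is a hypothetical class that only mirrors the rule as stated in this comment; it is not RDF::Query::Variable itself.

```ruby
# Plain-Ruby model of the rule above; SketchVariable is hypothetical.
class SketchVariable
  attr_reader :name, :value

  def initialize(name, value = nil)
    @name = name
    @value = value
  end

  def bound?
    !value.nil?
  end

  def ==(other)
    return true unless bound?                         # unbound: equal to any term
    other.is_a?(SketchVariable) && name == other.name # bound: names must match
  end
end

unbound = SketchVariable.new(:x)
bound_x = SketchVariable.new(:x, "http://example.org/a")
other_x = SketchVariable.new(:x, "http://example.org/b")
bound_y = SketchVariable.new(:y, "http://example.org/a")

puts unbound == "any term" # true
puts bound_x == other_x    # true: same name, values not compared
puts bound_x == bound_y    # false: different names
```

Note that the resulting relation is not symmetric (`"any term" == unbound` is false), which hints at why layering N3 scoping on top of it is awkward.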

Changing these semantics for N3 will require something quite different, probably best done using a specific mixin that adds these semantics to particular node/variable instances.

The reader actually associates variables with either the defining scope, or the parent scope for quickvariables. This is necessary so that the following makes sense:

[] a car. { ?x a car } => { ?x a vehicle }.

Looking more at the spec, I do see that existentials are only in scope within the formula in which they're defined. In fact, I think the reader uses the same scope rules for @forSome as @forAll, and BNodes are common to the entire file. This would be in violation of [1] "Blank Nodes". If you'd like to take a crack at fixing this, and coming up with some RSpecs for this, that would be great.

This is definitely a complex issue, and I could easily be wrong in my interpretation. The only objective guides I'm aware of (other than the sometimes-hard-to-understand specs) are the CWM regression tests [2].

Beyond parsing and serialization, some class would be required within RDF::N3 that would do the equivalent of SPARQL::Algebra and create the dataset, queries and constructs necessary to execute the formulae. BNodes within formulae (at least) should probably be implemented using non-distinguished RDF::Query::Variable instances, rather than RDF::Node, to get the proper semantics. After parsing, each formula might look like an RDF::Query that can be executed against the appropriate dataset, whose results are used to create a graph for log:implies. This is certainly non-trivial, but that was the way I was imagining doing it.
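A very small plain-Ruby sketch of that idea, applied to the car/vehicle rule above: the antecedent formula is treated as a basic graph pattern, matched against the data, and each solution instantiates the consequent. The helper names (`match`, `solutions`, `apply_rule`) and the array-based triple representation are assumptions made for this illustration; a real implementation would presumably build on RDF::Query instead.

```ruby
# Triples are [subject, predicate, object] arrays.
# Symbols (e.g. :x) stand for variables, Strings for constant terms.

# Match one pattern against one triple, extending the bindings; nil on failure.
def match(pattern, triple, bindings)
  result = bindings.dup
  pattern.zip(triple).each do |p, t|
    if p.is_a?(Symbol)
      return nil if result.key?(p) && result[p] != t
      result[p] = t
    elsif p != t
      return nil
    end
  end
  result
end

# Evaluate all patterns of an antecedent against the data (a naive BGP join).
def solutions(patterns, data)
  patterns.reduce([{}]) do |sols, pattern|
    sols.flat_map { |b| data.map { |t| match(pattern, t, b) }.compact }
  end
end

# Instantiate the consequent once per solution of the antecedent.
def apply_rule(antecedent, consequent, data)
  solutions(antecedent, data).flat_map do |b|
    consequent.map { |pattern| pattern.map { |term| b.fetch(term, term) } }
  end
end

data       = [["_:b0", "rdf:type", ":car"]]     # [] a car.
antecedent = [[:x, "rdf:type", ":car"]]         # { ?x a car }
consequent = [[:x, "rdf:type", ":vehicle"]]     # { ?x a vehicle }

inferred = apply_rule(antecedent, consequent, data)
p inferred # [["_:b0", "rdf:type", ":vehicle"]]
```

This ignores quantifier scoping, nested formulae and repeated application to a fixpoint, which is where most of the difficulty discussed above lies.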

[1] http://www.w3.org/DesignIssues/Notation3.html
[2] http://www.w3.org/2000/10/swap/test/regression.n3

gkellogg commented 13 years ago

See https://github.com/bendiken/rdf/issues/92 for a discussion of required changes to RDF.rb

gkellogg commented 3 years ago

Although not complete, this is largely implemented on the develop branch. I'll get a release out soon, although the Community Group spec is still a WIP.