w3c / N3

W3C's Notation 3 (N3) Community Group
48 stars 18 forks source link

Implicit quantification #5

Closed doerthe closed 3 years ago

doerthe commented 5 years ago

When we used the reasoners Cwm and EYE, we discovered, that they differed in their way of handling implicit universal quantification. To give a concrete example, consider the rule:

{{?x :p :o} => {?x :p2 :o2}}=> {:a :b :c}.

Cwm interprets that as:

(∀x: p(x,o) → p2(x,o2)) → b(a,c)

EYE interprets it as:

∀x: (p(x,o) → p2(x,o2)) → b(a,c)

The reason for that the W3C team submission is not really clear here, it says that universals are quantified "on the parent formula" (see https://www.w3.org/TeamSubmission/n3/#Quantifica), but it does not say what exactly this "parent" is.

How do the other reasoners handle this case?

gkellogg commented 5 years ago

It seems unambiguous to me that the {} creates a parent/child relationship, with what's immediately outside formula including the definition defining the scope. So, Cwm is correct.

An interesting use case might be the following:

@forAll :x . 
:A :p3 :o .
:B :p :o .
{?x :p3 :o} => {:a :b :x } .
{{?x :p :o} => {?x :p2 :o2}} => {:a :b :x} .

I think it should result in :B :p2 :o2 . :a :b :A ., not :B :p2 :o2 . :a :b :B .

doerthe commented 5 years ago

OK, so for you the parent is the next higher formula in braces? EYE sees the top formula as parent.

I will give some examples later why I think that Cwm's interpretation is problematic, but I want one example at the time and I do not get yours yet.

I do not get how you would come to your :B :p2 o2., could you explain that?

I also disagree that your example should result in :a :b :A. Here I think you expect the reasoner to treat the explicitly quantified variable :x as the same variable as the implicitly quantified variable ?x but: why should these two be the same and how would the reasoner know?

gkellogg commented 5 years ago

Well, to me, without firm definition, "parent" means immediate ancestor, so the parent of a formula is the formula within which it's contained.

I think the reasoning for this is that if you consider something like {?x :p3 :o} => {:a :b ?x } as an operator the allow for the scope of the variable to include the operator. My implementation turns this into the S-Expression (logImplies (formula (triple ?x :p3 :o)) (formula (triple :a :b ?x)). This allows the variable ?x used in both formula to be the same variable. Promoting it to the root would not support this specific use case.

Regarding the lexical ordering of @forAll/@forSome clauses in the same scope, I drew my conclusions from SWAP examples such as the following:

In these cases, the same variable name is used in different @forAll clauses in the same scope; it may be that they would quantify the same values, but it's not clearly the intention of the specs (such as they are).

The question of if @forAll :x and ?x denote the same variable name may be an implementation concern. We should probably say if the set of all explicit universal variables and implicit universal variables are disjoint or not.

The argument for having separate variables denoted using the same syntax being disjoint is made stronger when you consider importing rules from multiple different files, where variables such as ?x could easily bind to different solutions, unless this is more properly specified.

My example likely has issues, but could be reframed to create a test case to determine how implementations handle the same variable name, both when used in different scopes or in different order within the same scope, are handled.

Perhaps trying something different:

{:a :b ?x} => {:a :c ?x} .
{:d :e ?x} => {:d :f ?x} .
:a :b 10 .
:d :e 11 .

My implementation produces:

:a :b 10;  :c 10 .
:d :e 11; :f 11 .
josd commented 5 years ago

On Wed, Jan 23, 2019 at 6:56 PM Gregg Kellogg notifications@github.com wrote:

Well, to me, without firm definition, "parent" means immediate ancestor, so the parent of a formula is the formula within which it's contained.

I think the reasoning for this is that if you consider something like {?x :p3 :o} => {:a :b ?x } as an operator the allow for the scope of the variable to include the operator. My implementation turns this into the S-Expression (logImplies (formula (triple ?x :p3 :o)) (formula (triple :a :b ?x)). This allows the variable ?x used in both formula to be the same variable. Promoting it to the root would not support this specific use case.

Regarding the lexical ordering of @forAll/@forSome clauses in the same scope, I drew my conclusions from SWAP examples such as the following:

In these cases, the same variable name is used in different @forAll clauses in the same scope; it may be that they would quantify the same values, but it's not clearly the intention of the specs (such as they are).

The question of if @forAll :x and ?x denote the same variable name may be an implementation concern. We should probably say if the set of all explicit universal variables and implicit universal variables are disjoint or not.

The argument for having separate variables denoted using the same syntax being disjoint is made stronger when you consider importing rules from multiple different files, where variables such as ?x could easily bind to different solutions, unless this is more properly specified.

My example likely has issues, but could be reframed to create a test case to determine how implementations handle the same variable name, both when used in different scopes or in different order within the same scope, are handled.

Perhaps trying something different:

{:a :b ?x} => {:a :c ?x} . {:d :e ?x} => {:d :f ?x} . :a :b 10 . :d :e 11 .

My implementation produces:

:a :b 10; :c 10 . :d :e 11; :f 11 .

Nice and same for EYE:

$ echo -e "PREFIX : http://example.org/try#\n\n{:a :b ?x} => {:a :c ?x} .\n{:d :e ?x} => {:d :f ?x} .\n:a :b 10 .\n:d :e 11 ." | eye --nope --n3 - --pass

produces

PREFIX : http://example.org/try#

:a :b 10 . :d :e 11 . :a :c 10 . :d :f 11 .

As for your other example

$ eye --nope --n3 https://www.w3.org/2000/10/swap/test/EricNeumann/metab.n3 --pass

produces

PREFIX bg: http://www.bg.org/bio# PREFIX daml: http://www.daml.org/2001/03/daml+oil# PREFIX rules: http://infomesh.net/2001/05/rdflint/rules# PREFIX log: http://www.w3.org/2000/10/swap/log# PREFIX v: http://infomesh.net/2001/05/rdflint/vars# PREFIX rdfs: http://www.w3.org/2000/01/rdf-schema# PREFIX : https://www.w3.org/2000/10/swap/test/EricNeumann/metab.n3#

bg:catalyzes a daml:ObjectProperty. bg:Protein a daml:Class. bg:modification a daml:Property. bg:Enzyme a daml:Class. bg:BioChemical a daml:Class. bg:Compound a daml:Class. bg:Reaction a daml:Class. bg:reactsIn a daml:ObjectProperty. bg:isProducedBy a daml:ObjectProperty. bg:consumes a daml:ObjectProperty. bg:produces a daml:ObjectProperty. bg:metabolizes a daml:ObjectProperty. bg:synthesizes a daml:ObjectProperty. bg:Substrate a daml:Class. bg:Product a daml:Class. bg:isUpStream a daml:TransitiveProperty. bg:isDownStream a daml:TransitiveProperty. :glucose6P a bg:Compound. :fructose6P a bg:Compound. :Glu2Fru a bg:Reaction. :GluIsomerase a bg:Enzyme. :fructose1P a bg:Compound. :Fru62Fru1 a bg:Reaction. :FruPIsomerase a bg:Enzyme. bg:catalyzes daml:minCardinality "1". bg:catalyzes daml:range bg:Reaction. bg:modification daml:range rdfs:Literal. bg:reactsIn daml:range bg:Reaction. bg:isProducedBy daml:range bg:Reaction. bg:consumes daml:range bg:Substrate. bg:produces daml:range bg:Product. bg:isUpStream daml:range bg:BioChemical. bg:isDownStream daml:range bg:BioChemical. bg:Protein daml:hasProperty bg:modification. bg:Enzyme daml:hasProperty bg:catalyzes. bg:BioChemical daml:hasProperty bg:isUpStream. bg:BioChemical daml:hasProperty bg:isDownStream. bg:Reaction daml:hasProperty bg:consumes. bg:Reaction daml:hasProperty bg:produces. bg:Substrate daml:hasProperty bg:reactsIn. bg:Product daml:hasProperty bg:isProducedBy. bg:Protein daml:subClassOf bg:BioChemical. bg:Enzyme daml:subClassOf bg:Protein. bg:Compound daml:subClassOf bg:BioChemical. bg:Substrate daml:subClassOf bg:BioChemical. bg:Product daml:subClassOf bg:BioChemical. bg:consumes daml:inverseOf bg:reactsIn. bg:produces daml:inverseOf bg:isProducedBy. bg:isDownStream daml:inverseOf bg:isUpStream. :glucose6P bg:reactsIn :Glu2Fru. :fructose6P bg:reactsIn :Fru62Fru1. :Glu2Fru bg:produces :fructose6P. :Fru62Fru1 bg:produces :fructose1P. :GluIsomerase bg:catalyzes :Glu2Fru. :FruPIsomerase bg:catalyzes :Fru62Fru1. :GluIsomerase bg:synthesizes :fructose6P. :FruPIsomerase bg:synthesizes :fructose1P.

Kind regards, Jos

-- https://josd.github.io/ http://josd.github.io/

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/w3c/N3/issues/5#issuecomment-456902470, or mute the thread https://github.com/notifications/unsubscribe-auth/AALzpPFlWmoA4dz-mbog4JDokfvhEWPxks5vGKI1gaJpZM4Z-Zz9 .

doerthe commented 5 years ago

Thank you for your clarification. Your last example case is much clearer and I would expect the same answer your reasoner gives.

For the two SWAP files you mention: I see why you came to the conclusion that the lexical order matters, but from a logical point of view it does not really matter whether or not you repeat the universal quantifiers of the variables.

@forAll :child, :parent. 

{:child gc:parent :parent. :child :gender :M} log:implies {:parent gc:son :child} . 

{:child gc:parent :parent. :child :gender :F} log:implies {:parent gc:daughter :child}.

should lead to the same results as:

@forAll :child, :parent. {:child gc:parent :parent. :child :gender :M} log:implies {:parent gc:son :child} . 
@forAll :child, :parent. {:child gc:parent :parent. :child :gender :F} log:implies {:parent gc:daughter :child}.

What do you do if you have:

:s :p :o.
@forAll  :x. {:x :p :o}=>{:x :b :c}.
@forSome :x. {:s :p :o}=>{:a :b :x}.

Cwm gave me:

   @forAll :x.
         @forSome :x .

    :a     :b :x .

    :s     :b :c;
         :p :o .
    {
        :s     :p :o .

        }     log:implies {:a     :b :x .
        } .
    {
        :x     :p :o .

        }     log:implies {:x     :b :c .
        } .

Which does not really help since I do not know how :x is quantified. What do you produce? I guess the second quantifier overwrites the first one?

I also have another question: what do you get if you have the following input:

_:x :p {_:x :b :c}.

{?x :p {?x2 :b :c}}=>{:s :p :o}.

I tested with Cwm and EYE and saw that EYE derives :s :p :o. while Cwm doesn't and the reason for that is the scoping issue I mentioned above. So, what is your output here?

gkellogg commented 5 years ago

@doerthe said:

What do you do if you have:

:s :p :o. @forAll :x. {:x :p :o}=>{:x :b :c}. @forSome :x. {:s :p :o}=>{:a :b :x}.

I get the following data produced:

:s :b :c; :p :o .

So, I do not produce :a :b :x . I may not be interpreting variable scope properly, but in my case, the second :x, which is existentialus used for the implication, and as it is unbound, it produces no triple. This is actually consistent with how I would deal with multiple@forAll`, as my interpretation of lexical scope introduces a new variable, which would still be unbound.

I also have another question: what do you get if you have the following input:

_:x :p {_:x :b :c}. {?x :p {?x2 :b :c}}=>{:s :p :o}.

I don't handle blank node scoping properly yet. But, it seems to me that ?x and ?x2 would bind to different nodes, but wouldn't be used in the implication, so the result should derive {:s :p :o}. If ?x were used instead of ?x2 it would not generate that triple, as the two nodes are different.

Alternatively, the _:x used within the formula is treated as an existential variable and is not bound as it matches no triple in the dataset. That would mean that ?x2 would not bind to anything, and no triple would be emitted. This also depends on what the interpretation of existential variable is; my implementation treats both universal and existential variables largely the same way from a graph pattern matching perspective, which is consistent with their treatment in SPARQL.

doerthe commented 5 years ago

@doerthe said:

What do you do if you have:

:s :p :o. @forAll :x. {:x :p :o}=>{:x :b :c}. @forSome :x. {:s :p :o}=>{:a :b :x}.

I get the following data produced:

:s :b :c; :p :o .

So, I do not produce :a :b :x . I may not be interpreting variable scope properly, but in my case, the second :x, which is existential`us used for the implication, and as it is unbound, it produces no triple.

I do not get why it is unbound, could you explain? Isn't it bound by the existential quantifier in front of the triple. I would have expected based on what you told me so far that you'd produce:

:a :b _:x.

I also have another question: what do you get if you have the following input:

_:x :p {_:x :b :c}. {?x :p {?x2 :b :c}}=>{:s :p :o}.

I don't handle blank node scoping properly yet. But, it seems to me that ?x and ?x2 would bind to different nodes, but wouldn't be used in the implication, so the result should derive {:s :p :o}. If ?x were used instead of ?x2 it would not generate that triple, as the two nodes are different.

Without blank nodes we can still have a similar example, so what do you do for:

:x :p {:y :b :c}.
{?x :p {?x2 :b :c}}=>{:s :p :o}.

Based on your answer above I would expect that you'd expect a reasoner to derive :s :p :o. Note, that this can only be derived if we follow the scoping as implemented in EYE but not with Cwm:

EYE understands the rule as:

∀x:∀x_2: p(x, b(x_2,c)) → p(s,o)

This allows the reasoner to unify the antecedent of the rule with the triple and come to the result :s :p :o.

With Cwms's interpretation this cannot be derived, the reasoner understands the rule as:

∀x:: p(x, (∀x_2: b(x_2,c))) → p(s,o)

Since in the triple :x :p {:y :b :c}. a specific :y is given while the rule addresses all x_2 the rule does not fire.

Cwm would derive :s :p :o. if we had the triple:

:x :p {?y :b :c}.

gkellogg commented 5 years ago

@doerthe said:

I do not get why it is unbound, could you explain? Isn't it bound by the existential quantifier in front of the triple. In the following:

:s :p :o.
@forAll :x. {:x :p :o}=>{:x :b :c}.
@forSome :x. {:s :p :o}=>{:a :b :x}.

The first implication binds the universal quantifier :x to :s, so that the antecedent creates a solution :x => :s. This result set is then used in the consequent to yield :s :b :c.

The second implication redefines :x as an existential quantifier, initially unbound. The antecedent does not bind this variable, so there is no solution passed to the consequent. The antecedent is true, so the consequent is evaluated, but there is no value matched for :x. There may be an interpretation of existential quantifiers that effectively creates a blank node for :x so that :a :b [] would result, but I couldn't find any SWAP use cases to support this, and it's not clear from the submission (to me, at least).

Without blank nodes we can still have a similar example, so what do you do for:

:x :p {:y :b :c}. {?x :p {?x2 :b :c}}=>{:s :p :o}.

It's not the blank nodes that I don't support, but the formula as part of the dataset. i have since updated my code to treat blank nodes as unique to each formula.

Based on your answer above I would expect that you'd expect a reasoner to derive :s :p :o. Note, that this can only be derived if we follow the scoping as implemented in EYE but not with Cwm

Here we get to what is meant by variable scope. My understanding, is that variable scope relates to their visibility within formulae, not the dataset. My implementation currently only reasons over data in the default graph, but I think it would be consistent to say that the pattern {?x :p {?x2 :b :c}} binds ?x in the default graph, and ?x2 in the graph named by the blank node implicitly defined as the object of :x :p {..}. So, the dataset visible to ?x is effectively the quad :x :p _:f, binding ?x to _:f, and the dataset visible to ?x2 is effectively the quad :y :b :c _:f, binding ?x2 to :y.

The inner most formula has solutions, which allow the outermost formula to bind to the blank node for the named graph having that solution. The outermost formula therefore also has a solution, so the consequent {:s :p :o} is evaluate and results in that triple being added to the outermost (default) graph.

The other thing I've noticed about variables is if they bind in one formula, those bindings are not in effect in another formula at the same level. I discovered this when working on unify5:

@keywords is, a .

fred siblings  (  [ parents (Zeus Juno), (Alice Bob), (Jane Joe) ]  Aphrodite ).
{ fred siblings (  []   ?x ) } =>   { ?x  a ShouldBeAphrodite }.
{ fred siblings (  ?y   ?x ) } =>   { ?y  a BnodeExtractedError }.
{ fred siblings  (  [ parents ([] ?y) ]  ?x ) } => { ?x   a ShouldBeAphrodite2.  ?y a ShouldBeJunoBobJoe }.

In the second formula, ?y binds to [parents (Zeus Juno), ...]. If that binding took hold in third formula, ?y would also bind to Zeus, Alice, and Jane; because ?y is already bound, these solutions would be inconsistent and processing would be order dependent. Consequently, my interpretation is the ?y is evaluated separately for each formula, which is consistent with the anticipated results. Any binding of ?y would last in the formula, and sub-formulae for which it is defined, and between the antecedent and consequent, but not between separate statements. This seems inconsistent with the statement that ?y binds to the parent of the formula in which it defined, as it is apparent that each implication pair treated these are being separate.

CWM produces the following result:

:Aphrodite     a :ShouldBeAphrodite, :ShouldBeAphrodite2 .
:Bob     a :ShouldBeJunoBobJoe .
:Joe     a :ShouldBeJunoBobJoe .
:Juno     a :ShouldBeJunoBobJoe .
:fred     :siblings  (
  [a :BnodeExtractedError; :parents  (:Alice :Bob ), (:Jane Joe ), ( :Zeus :Juno ) ]
  :Aphrodite
) .
doerthe commented 5 years ago

I do not get why it is unbound, could you explain? Isn't it bound by the existential quantifier in front of the triple. In the following:

:s :p :o.
@forAll :x. {:x :p :o}=>{:x :b :c}.
@forSome :x. {:s :p :o}=>{:a :b :x}.

The first implication binds the universal quantifier :x to :s, so that the antecedent creates a solution :x => :s. This result set is then used in the consequent to yield :s :b :c.

The second implication redefines :x as an existential quantifier, initially unbound. The antecedent does not bind this variable, so there is no solution passed to the consequent. The antecedent is true, so the consequent is evaluated, but there is no value matched for :x. There may be an interpretation of existential quantifiers that effectively creates a blank node for :x so that :a :b [] would result, but I couldn't find any SWAP use cases to support this, and it's not clear from the submission (to me, at least).

I see here again the problem that you understand rules like if-else statements in imperative programming if you say:

The antecedent is true, so the consequent is evaluated.

Here, we deal with logical rules, so what we actually say by a rule is that if the antecedent is true, so is the consequence. Note furthermore, that :x is not redefined as an existential quantifier here, it just results in an existentiallty quantified variable. To clarify what happens without the problem that we have the :x twice in that rule, try to give the following input to Cwm:

@prefix : <http://www.example.org/ex#>.  

:s :p :o.
 @forAll :x. {:x :p :o}=>{:x :b :c}.
 @forSome :y. {:s :p :o}=>{:a :b :y}.

Cwm derives:

     @prefix : <http://www.example.org/ex#> .
    @prefix log: <http://www.w3.org/2000/10/swap/log#> .

     @forAll :x.
         @forSome :y .

    :a     :b :y .

    :s     :b :c;
         :p :o .
    {
        :s     :p :o .

        }     => {:a     :b :y .
        } .
    {
        :x     :p :o .

        }     => {:x     :b :c .
        } .

The important part here is that we get:

@forSome :y . :a :b :y .

According to the w3c team submission this is the same as:

:a :b [] .

Without blank nodes we can still have a similar example, so what do you do for:

:x :p {:y :b :c}. {?x :p {?x2 :b :c}}=>{:s :p :o}.

It's not the blank nodes that I don't support, but the formula as part of the dataset. i have since updated my code to treat blank nodes as unique to each formula.

Based on your answer above I would expect that you'd expect a reasoner to derive :s :p :o. Note, that this can only be derived if we follow the scoping as implemented in EYE but not with Cwm

Here we get to what is meant by variable scope. My understanding, is that variable scope relates to their visibility within formulae, not the dataset. My implementation currently only reasons over data in the default graph, but I think it would be consistent to say that the pattern {?x :p {?x2 :b :c}} binds ?x in the default graph, and ?x2 in the graph named by the blank node implicitly defined as the object of :x :p {..}. So, the dataset visible to ?x is effectively the quad :x :p _:f, binding ?x to _:f, and the dataset visible to ?x2 is effectively the quad :y :b :c _:f, binding ?x2 to :y.

The inner most formula has solutions, which allow the outermost formula to bind to the blank node for the named graph having that solution. The outermost formula therefore also has a solution, so the consequent {:s :p :o} is evaluate and results in that triple being added to the outermost (default) graph.

Note that we deal with two different things here, the scope and the position of the quantifier. If you understand a SPARQL query as a rule, you would still only get the bindings as you describe them if the position of the quantifier is outside of the whole formula. So, the rule

@forAll :x, :x2. {:x :p {:x2 :b :c}}=>{:s :p :o}.

will lead to :s :p :o. with :x :p {:y :b :c}. as input and that is also how eye interprets the formula.

But for cwm, the formula: {?x :p {?x2 :b :c}}=>{:s :p :o}. means

@forAll :x. {@forAll :x2. :x :p {:x2 :b :c}}=>{:s :p :o}.

as the formula is quantified on what cwm sees as "the parent". So, to stay in your graph example, the rule would only "fire" if in your subgraph we would either have the triple :x2 :p :o. for every possible :x2 of your whole universe (which is rather unlikely) or if we would have stated somewhere that

@forAll :x2. :x :p {:x2 :b :c}.

The other thing I've noticed about variables is if they bind in one formula, those bindings are not in effect in another formula at the same level. I discovered this when working on unify5:

@keywords is, a .

fred siblings  (  [ parents (Zeus Juno), (Alice Bob), (Jane Joe) ]  Aphrodite ).
{ fred siblings (  []   ?x ) } =>   { ?x  a ShouldBeAphrodite }.
{ fred siblings (  ?y   ?x ) } =>   { ?y  a BnodeExtractedError }.
{ fred siblings  (  [ parents ([] ?y) ]  ?x ) } => { ?x   a ShouldBeAphrodite2.  ?y a ShouldBeJunoBobJoe }.

In the second formula, ?y binds to [parents (Zeus Juno), ...]. If that binding took hold in third formula, ?y would also bind to Zeus, Alice, and Jane; because ?y is already bound, these solutions would be inconsistent and processing would be order dependent. Consequently, my interpretation is the ?y is evaluated separately for each formula, which is consistent with the anticipated results. Any binding of ?y would last in the formula, and sub-formulae for which it is defined, and between the antecedent and consequent, but not between separate statements. This seems inconsistent with the statement that ?y binds to the parent of the formula in which it defined, as it is apparent that each implication pair treated these are being separate.

Please note, the you deal with logical rules here. There is no reason why, if a triple fulfils the antecedent of one rule which therefore "fires" the antecedent of another rule would also be fulfiled by simply using the same binding we had for the first rule. Since I find the siblings rather confusing (more to refer to, not the example as such), I give you an easier example. Following the rule Jos had in the other issue, consider the two rules:

{?x :madeOf :stone} => {?x :a :StoneObject}.
{?x :madeOf :cheese} => {?x :canBe :eaten}.

These means: "If an instance is made of stone, it is a :StoneObject and if it is made of cheese, it can be eaten." Now, consider the triple (and ignore my ignorance conserning the material of the moon ;) ): :moon :madeOf :stone.

The rules above can both be instantiated by binding ?x to :moon. We get:

{:moon :madeOf :stone} => {:moon :a :StoneObject}.
{:moon :madeOf :cheese} => {:moon :canBe :eaten}.

Both of these rules are true: If the moon is made out of stone, it is a stone object and if the moon is made of cheese, it can be eaten. Now we have some knowledge, namely:

:moon :madeOf :stone.

We can thus conclude:

:moon :a :StoneObject.

For the antecedent of the second rule we have no evidence, we therefore do not know whether or not the moon is made of cheese (cheese and stone could still be the same), we cannot derive anything from our triple using the second rule.

To conclude: there are two steps, the binding and the application of the rule. You can bind the universal variables in your example with any instance of the universe, but that does not mean that you can apply the rule. You can only apply the rule, if you find evidence for the antecedent, ie you either have the concrete triples present in your bound version of the rule stated in your database or you can derive these triples by using other rules.

gkellogg commented 5 years ago

I see here again the problem that you understand rules like if-else statements in imperative programming if you say:

The antecedent is true, so the consequent is evaluated.

Here, we deal with logical rules, so what we actually say by a rule is that if the antecedent is true, so is the consequence. Note furthermore, that :x is not redefined as an existential quantifier here, it just results in an existentiallty quantified variable. To clarify what happens without the problem that we have the :x twice in that rule, ...

I actually don't get that output from CWM using cwm.py --think is there some other argument I'm missing?

The important part here is that we get:

@forSome :y . :a :b :y .

According to the w3c team submission this is the same as:

:a :b [] .

Great, thanks for helping my understanding. It makes sense for existential quantifiers that the consequent simply emits the quantifier as part of the output.

However, if :y were "bound" in some outer context, I would think it would generate triples with the bound values. For example:

:s :p :o.
 @forAll :x. {:x :p :o}=>{:x :b :c}.
 @forSome :y. {:s :p :y}=>{:a :b :y}.

would then generate

:s :p :o, :b :c .
:a :b :o .

...

Note that we deal with two different things here, the scope and the position of the quantifier. If you understand a SPARQL query as a rule, you would still only get the bindings as you describe them if the position of the quantifier is outside of the whole formula. So, the rule

@forAll :x, :x2. {:x :p {:x2 :b :c}}=>{:s :p :o}.

will lead to :s :p :o. with :x :p {:y :b :c}. as input and that is also how eye interprets the formula.

But for cwm, the formula: {?x :p {?x2 :b :c}}=>{:s :p :o}. means

@forAll :x. {@forAll :x2. :x :p {:x2 :b :c}}=>{:s :p :o}.

as the formula is quantified on what cwm sees as "the parent". So, to stay in your graph example, the rule would only "fire" if in your subgraph we would either have the triple :x2 :p :o. for every possible :x2 of your whole universe (which is rather unlikely) or if we would have stated somewhere that

@forAll :x2. :x :p {:x2 :b :c}.

I have to think about this, and how to interpret it in light of the description of log:implies:

Logical implication. This is the relation between the antecedent (subject) and conclusion (object) of a rule. The application of a rule to a knowledge-base is as follows. For every substitution which, applied to the antecedent, gives a formula which is a subset of the knowledge-base, then the result of applying that same substitution to the conclusion may be added to the knowledge-base. related: See log:conclusion. (See the CWM manual for command line options to determine how rules from different sources are applied to and the results added to various formula.)

What is the "knowledge-base" against which :x2 is substituted? I've taken it to mean that the knowledge base is the outermost formula/graph, or the named graph bound by :x. For me, this does ultimately come down to understanding how to organize the data which is to be queried against a pattern to find results that match. In this case, there is no implication involved, so it's a matter of determining if there are any (or, probably if all) triples in the knowledge-base established by doing the substitution for :x are of the form :x2 :b :c.

...

Please note, the you deal with logical rules here. There is no reason why, if a triple fulfils the antecedent of one rule which therefore "fires" the antecedent of another rule would also be fulfiled by simply using the same binding we had for the first rule. Since I find the siblings rather confusing (more to refer to, not the example as such), I give you an easier example. Following the rule Jos had in the other issue, consider the two rules:

{?x :madeOf :stone} => {?x :a :StoneObject}.
{?x :madeOf :cheese} => {?x :canBe :eaten}.

These means: "If an instance is made of stone, it is a :StoneObject and if it is made of cheese, it can be eaten." Now, consider the triple (and ignore my ignorance conserning the material of the moon ;) ): :moon :madeOf :stone.

The rules above can both be instantiated by binding ?x to :moon. We get:

{:moon :madeOf :stone} => {:moon :a :StoneObject}.
{:moon :madeOf :cheese} => {:moon :canBe :eaten}.

Both of these rules are true: If the moon is made out of stone, it is a stone object and if the moon is made of cheese, it can be eaten. Now we have some knowledge, namely:

:moon :madeOf :stone.

We can thus conclude:

:moon :a :StoneObject.

For the antecedent of the second rule we have no evidence, we therefore do not know whether or not the moon is made of cheese (cheese and stone could still be the same), we cannot derive anything from our triple using the second rule.

To conclude: there are two steps, the binding and the application of the rule. You can bind the universal variables in your example with any instance of the universe, but that does not mean that you can apply the rule. You can only apply the rule, if you find evidence for the antecedent, ie you either have the concrete triples present in your bound version of the rule stated in your database or you can derive these triples by using other rules.

This makes sense to me know, in light of your clarification on the level at which a variable is introduced. Thanks for the great detailed explanations.

doerthe commented 5 years ago

I see here again the problem that you understand rules like if-else statements in imperative programming if you say:

The antecedent is true, so the consequent is evaluated.

Here, we deal with logical rules, so what we actually say by a rule is that if the antecedent is true, so is the consequence. Note furthermore, that :x is not redefined as an existential quantifier here, it just results in an existentiallty quantified variable. To clarify what happens without the problem that we have the :x twice in that rule, ...

I actually don't get that output from CWM using cwm.py --think is there some other argument I'm missing?

This is indeed strange. I just used the input I listed together with --think and got the output I also posted. Maybe we have different versions of Cwm? I have v 1.197 2007/12/13 15:38:39 syosi.

The important part here is that we get: @forSome :y . :a :b :y . According to the w3c team submission this is the same as: :a :b [] .

Great, thanks for helping my understanding. It makes sense for existential quantifiers that the consequent simply emits the quantifier as part of the output.

However, if :y were "bound" in some outer context, I would think it would generate triples with the bound values. For example:

:s :p :o.
 @forAll :x. {:x :p :o}=>{:x :b :c}.
 @forSome :y. {:s :p :y}=>{:a :b :y}.

would then generate

:s :p :o, :b :c .
:a :b :o .

Please note that there is a difference between @forAll and @forSome.

The binding you describe can only happen for a rule containing a variable which is quantified with @forAll (the :x in the example). This rule translates to

"For all x in the universe the rule {:x :p :o}=>{:x :b :c}. is valid."

:s is such an element of the universe, so we know that the rule

{:s :p :o}=>{:s :b :c}.

is valid. This rule can be applied if we encounter the triple :s :p :o. We then get :s :b :c.

In the case of a variable used together with @forSome we cannot bind. The rule @forSome :y. {:s :p :y}=>{:a :b :y}.from above just says that:

"There is some (note not all, only some) :y for which the rule {:s :p :y}=>{:a :b :y}. is correct. "

We cannot apply the rule @forSome :y. {:s :p :y}=>{:a :b :y}. to :s :p :o because we only know that there is a :y for which that rule is correct, but we don't know what that value is, so we don't know if it is :o or not.

Note that we deal with two different things here, the scope and the position of the quantifier. If you understand a SPARQL query as a rule, you would still only get the bindings as you describe them if the position of the quantifier is outside of the whole formula. So, the rule

@forAll :x, :x2. {:x :p {:x2 :b :c}}=>{:s :p :o}.

will lead to :s :p :o. with :x :p {:y :b :c}. as input and that is also how eye interprets the formula. But for cwm, the formula: {?x :p {?x2 :b :c}}=>{:s :p :o}. means

@forAll :x. {@forAll :x2. :x :p {:x2 :b :c}}=>{:s :p :o}.

as the formula is quantified on what cwm sees as "the parent". So, to stay in your graph example, the rule would only "fire" if in your subgraph we would either have the triple :x2 :p :o. for every possible :x2 of your whole universe (which is rather unlikely) or if we would have stated somewhere that @forAll :x2. :x :p {:x2 :b :c}.

I have to think about this, and how to interpret it in light of the description of log:implies:

Logical implication. This is the relation between the antecedent (subject) and conclusion (object) of a rule. The application of a rule to a knowledge-base is as follows. For every substitution which, applied to the antecedent, gives a formula which is a subset of the knowledge-base, then the result of applying that same substitution to the conclusion may be added to the knowledge-base. related: See log:conclusion. (See the CWM manual for command line options to determine how rules from different sources are applied to and the results added to various formula.)

What is the "knowledge-base" against which :x2 is substituted? I've taken it to mean that the knowledge base is the outermost formula/graph, or the named graph bound by :x. For me, this does ultimately come down to understanding how to organize the data which is to be queried against a pattern to find results that match. In this case, there is no implication involved, so it's a matter of determining if there are any (or, probably if all) triples in the knowledge-base established by doing the substitution for :x are of the form :x2 :b :c.

In the example, the :x2 does not directly occur in the antecedent of a rule. It is nested and quantified You would need something like @forAll :z. :x :p {:z :b :c}. in your outer-most knowledge-base.

In general I think it is also problematic to see the logical implication just as a simple relationship since I think a logical operator is more than that (something like that was also briefly mentioned before in the mailing list). I would also like to separate the semantic view (what does it mean) from the operational view (what does the reasoner do?) here and first concentrate on the logical meaning of an implication. But that just as a side remark.

william-vw commented 5 years ago

It seems like we've gotten a bit off track here, talking e.g., about existing implementation behaviors and interpretations of different logic operators. Although these side discussions have certainly been useful, it may become a bit daunting for others to get a good view of where we stand on this issue.

If you feel that we missed anything please feel free to comment.

To summarize, I believe that we've arrived at the following (some copied from previous posts, others from personal discussions):

The shorthand syntax “?x” implies that x is universally quantified not in the formula but in its parent formula. Firstly, is this a good thing; and secondly, what exactly should this "parent" be?

@doerthe: To give a concrete example, consider the rule:

{{?x :p :o} => {?x :p2 :o2}}=> {:a :b :c}.

Cwm interprets that as:

(∀x: p(x,o) → p2(x,o2)) → b(a,c)

EYE interprets it as:

∀x: (p(x,o) → p2(x,o2)) → b(a,c)

@gregg: it seems unambiguous to me that the {} creates a parent/child relationship, with what's immediately outside formula including the definition defining the scope.

EDIT it seems that scoping at the outermost level would be most appropriate:

@gkellogg However, it may become more practical to scope the universal variables at the outermost level, as EYE apparently does, and this would clear up this kind of confusion.

@doerthe I also think that it is simply more practical to assume the universal quantification on top level.

Quantification in the parent formula instead of the formula itself, has consequences for the semantics of the formula. For instance, take the simplified example:

{?x :p :o} => {:y :p2 :o2}

Quantification inside the same formula:

(∀x: p(x,o)) → p2(y,o2)

Quantification in a parent formula:

∀x: p(x,o) → p2(y,o2)

In the first formula, x is scoped only to the rule premise, meaning that only when all x in the universe adhere to the condition will the consequent also be valid (i.e., the consequent holds if the condition is met by all instances). The second formula states that, for all x in the universe, in case the condition holds for any x, the consequent will also be valid for that x.

In light of this, it does seem to make sense to me to quantify on the parent formula for logical implications.

Further, I greatly appreciate that a "simpler" syntax is available for dealing with universal quantification (i.e., "?x").

It should be straightforward to encode many practical use cases using this notation; this allows developers to start coding with this simpler syntax and later move on to explicit quantifiers (when or if needed) and all the complexity that brings.

@doerthe gives a good case for when this could go wrong however:

the reason why quantification on the parent formula is so problematic is that you can change the meaning of a formula by simply adding another one somewhere else in you N3 document. I do not expect that people are careful with the variables they use. So, for cwm,

{{?x :p :o} => {?x :p2 :o2}}=>{:s :p :o}.

means

(∀x: p(x,o) → p2(x,o2))→p(s,o)

but: by only adding one new rule somewhere in the document which contains the same universal variable on a higher level this changes.

{{?x :p :o} => {?x :p2 :o2}}=>{:s :p :o}. 
{?x :h :l} => {:j :h :l}.

means

∀x: 
( p(x,o) → p2(x,o2))→p(s,o)
h(x,l)→h(j,l)

This makes scoping very unstable.

EDIT How to cope with a variable being introduced at a higher level in a formula, and then that variable being differently quantified "lower down" in that formula? I.e., how to cope with variable name clashes?

Would these be different variables with the same name in different scopes (as I think was suggested by @gkellogg but I could be wrong)

EDIT bnode identifiers: RDF vs N3

@doerthe I think you refer to named graphs and the fact that in

_:x a :Person.

{_:x :likes :ice-cream} the two _:x refer to the same object while in N3 _:x a :Person. :says {_:x :likes :ice-cream} these two _:x are different. I think that we should follow N3's solution here simply to be able to refer to a local blank node in a graph. Even without explicit quantification, skolemisation could bring us to rdf's interpretation, but there is no way from rdf to N3 here. please remember that we do not simply talk about different variables here but also about different positions of quantifiers. If we come back to my example from above, the following graph :says {_:x :likes :ice-cream} means in "pseudo" first order logic: says (graph7, (∃x: likes(x, ice-cream)). while the existential quantifier for TriG is outside the graph. If I try to write it down somehow first-order logic like, I could say: ∃x: namedGraph(graph7, ( likes(x, ice-cream)).

There were other discussions as well, e.g., on differences between operational and semantic views (scoping of variable bindings) (starting from here) and the application of quantifiers (see here).

gkellogg commented 5 years ago

@Gregg: it seems unambiguous to me that the {} creates a parent/child relationship, with what's immediately outside formula including the definition defining the scope.

  • Quantifying in the same or "parent" formula?

Quantification in the parent formula instead of the formula itself, has consequences for the semantics of the formula. For instance, take the simplified example:

{?x :p :o} => {:y :p2 :o2}

Quantification inside the same formula:

(∀x: p(x,o)) → p2(y,o2)

Quantification in a parent formula:

∀x: p(x,o) → p2(y,o2)

In the first formula, x is scoped only to the rule premise, meaning that only when all x in the universe adhere to the condition will the consequent also be valid (i.e., the consequent holds if the condition is met by all instances). The second formula states that, for all x in the universe, in case the condition holds for any x, the consequent will also be valid for that x.

In light of this, it does seem to make sense to me to quantify on the parent formula for logical implications.

Further, I greatly appreciate that a "simpler" syntax is available for dealing with universal quantification (i.e., "?x").

It should be straightforward to encode many practical use cases using this notation; this allows developers to start coding with this simpler syntax and later move on to explicit quantifiers (when or if needed) and all the complexity that brings.

I think my statement was misinterpreted. In my view, the meaning of {{?x :p :o} => {?x :p2 :o2}}=> {:a :b :c}. is the same as (∀x: p(x,o) → p2(x,o2)) → b(a,c). As ?x is introduced in the inner formula, the variable is scoped to the containing formula, the parent of the formula which introduced ?.

but: by only adding one new rule somewhere in the document which contains the same universal variable on a higher level this changes.

{{?x :p :o} => {?x :p2 :o2}}=>{:s :p :o}. 
{?x :h :l} => {:j :h :l}.

means

∀x: 
( p(x,o) → p2(x,o2))→p(s,o)
h(x,l)→h(j,l)

I'm not sure that it does, I think this could be interpreted that there are two variables with the same name introduced in different scopes.

However, it may become more practical to scope the universal variables at the outermost level, as EYE apparently does, and this would clear up this kind of confusion. If we ever want to separate the reasoning from N3, and define it more broadly over an extension to RDF 1.1 datasets, we might introduce universal and existential variables as resources, similar to IRIs and Bland Nodes. Defining a scoping rule for variables to formulae/graphs would be out of keeping with the typical triple/quad representation used in many implementations. My own implementation of N3 reasoning (such as it is) does just this, and parses the N3 into an extended dataset. The scope of equivalently named variables and blank nodes can be handled within the parser. I internally produce unique names for variables and blank node labels to enforce the rules defined for N3.

william-vw commented 5 years ago

@gkellogg

However, it may become more practical to scope the universal variables at the outermost level, as EYE apparently does, and this would clear up this kind of confusion.

For implicit quantification (i.e., ?x) I would tend to agree since it covers a behavior that imo is quite typical for implications - i.e., for all x in the universe, in case the condition holds for any x, the consequent will also be valid for that x.

To your point, putting the quantification on the direct parent formula will likely lead to confusion when one starts nesting formulae in the condition ..

Although this may introduce variable name clashes (as @doerthe says), they also seems like a more general problem that is not limited to implicit quantification (?) How do we deal with a variable being introduced at a higher level in a formula, and then that variable being differently quantified "lower down" in that formula (i.e., in a nested clause)?

Defining a scoping rule for variables to formulae/graphs would be out of keeping with the typical triple/quad representation used in many implementations. My own implementation of N3 reasoning (such as it is) does just this, and parses the N3 into an extended dataset. The scope of equivalently named variables and blank nodes can be handled within the parser. I internally produce unique names for variables and blank node labels to enforce the rules defined for N3.

For sure, a parser could certainly apply the N3 variable scoping rules and consequently generate unique names to differentiate them. It would be interesting to know what policy your parser applies to generate these unique names for variables.

gkellogg commented 5 years ago

It would be interesting to know what policy your parser applies to generate these unique names for variables.

In my case, I create an identifier applied to every bnode/variable along with a sequence, which increments with each distinct bnode/variable. It could just as well be a UUID, but this does help debugging somewhat. A spec, clearly, should not specify at this level of detail.

One challenging thing for general RDF Dataset use is that RDF Concepts says that Blank node identifiers are locally scoped to the store, and not to the graph within which they are contained, as N3 dictates. Pat Hayes recently suggested a more explicit scoping mechanism (also in w3c/EasierRDF#19), which would define some syntactic structure for constraining the scope of bnode labels. Pat proposed the following:

Putting brackets around an entire graph says, in effect, that all bnodeIDs in this graph are local to the graph: omitting them allows the possibility of sharing a bnode with some other graph (as in RDF datasets).

A better system, which would allow for more elaborate structures, would be to have convention of labelled scope brackets of the form [ID ], where ID is any alphanumeric string, which is understood to ‘bind’ only bnodes with ids of the form :string where ID is an initial substring of string. So for example [A ] binds :A1 and :A17 but not :B1. This would allow the full expressiveness of nested quantification without very much extra work at all, and again it could be simply ignored by current RDF engines without harm, although they might be missing out on some of the meaning being expressed by this more elaborate notation. And if you leave out the ID, then this defaults to the simpler notation in the previous paragraph, so bc is automatic.

...

This could be used to hide the internal strcuture of RDF lists:

[L 
_:a rdf:first x:A .
_:a rdf:rest _:Lb .
_:Lb rdf:first x:B.
_:Lb rdf:rest rdf:nil .
]

could be abbreviated as something like {x:A,x:B} and this treated like a new kind of RDF name, which of course becomes the first bnodeID (_:a) when compiled into RDF triples (which is why that bnodeID is not included in the scope, so it can act as the ‘name' of the list elsewhere in the graph.)

In any case, these are syntactic constructs and don't affect the underlying data model, but such syntactic constructs would need an expression in the different RDF serializations (e.g., JSON-LD), not just the Turtle/N3 family.

doerthe commented 5 years ago

I also think that it is simply more practical to assume the universal quantification on top level. For the other cases we can still use explicit quantification if we really need this expressiveness (I am actually not sure here). An the notation Pat Hayes introduced also goes into that direction.

@gkellogg Just to be sure, can you create a small example for the below:

One challenging thing for general RDF Dataset use is that RDF Concepts says that Blank node identifiers are locally scoped to the store, and not to the graph within which they are contained, as N3 dictates.

I think you refer to named graphs and the fact that in

_:x a :Person.
<graph7> {_:x :likes :ice-cream}

the two _:x refer to the same object while in N3

_:x a :Person.
<graph7> :says {_:x :likes :ice-cream}

these two _:x are different.

I think that we should follow N3's solution here simply to be able to refer to a local blank node in a graph. Even without explicit quantification, skolemisation could bring us to rdf's interpretation, but there is no way from rdf to N3 here.

gkellogg commented 5 years ago

I also think that it is simply more practical to assume the universal quantification on top level. For the other cases we can still use explicit quantification if we really need this expressiveness (I am actually not sure here). An the notation Pat Hayes introduced also goes into that direction.

@gkellogg Just to be sure, can you create a small example for the below:

One challenging thing for general RDF Dataset use is that RDF Concepts says that Blank node identifiers are locally scoped to the store, and not to the graph within which they are contained, as N3 dictates.

I think you refer to named graphs and the fact that in

_:x a :Person.
<graph7> {_:x :likes :ice-cream}

the two _:x refer to the same object while in N3

_:x a :Person.
<graph7> :says {_:x :likes :ice-cream}

these two _:x are different.

👍

I think that we should follow N3's solution here simply to be able to refer to a local blank node in a graph. Even without explicit quantification, skolemisation could bring us to rdf's interpretation, but there is no way from rdf to N3 here.

I wasn't suggesting changing N3's scoping rules, but it is a consideration when considering reasoning from other serializations. Of course, this can be solved at the syntax level by just not reusing blank node identifiers that should be distinct.

doerthe commented 5 years ago

I wasn't suggesting changing N3's scoping rules, but it is a consideration when considering reasoning from other serializations. Of course, this can be solved at the syntax level by just not reusing blank node identifiers that should be distinct.

When I started working with N3 I thought so too, but please remember that we do not simply talk about different variables here but also about different positions of quantifiers.

If we come back to my example from above, the following graph

<graph7> :says {_:x :likes :ice-cream}

means in "pseudo" first order logic:

says (graph7, (∃x: likes(x, ice-cream)).

while the existential quantifier for TriG is outside the graph. If I try to write it down somehow first-order logic like, I could say:

∃x: namedGraph(graph7, ( likes(x, ice-cream)).

It really makes a difference where the existential quantifier is located.

gkellogg commented 5 years ago

Not sure how to reconcile that, but if we want to put forth a solution that works generally for RDF datasets, we'll need to figure this out. Otherwise, remain stuck in an N3 backwater with no way forward to reconcile with RDF.

william-vw commented 4 years ago

Same here as with #6 - I believe @doerthe and @josd talked to the designers of N3, and got some insight into what exactly constitutes the "parent" formula in the original N3, and why. Here, I think we were closer to a concrete decision - i.e., the parent formula being the outermost level.

Perhaps they could explain it here :-) .

doerthe commented 4 years ago

You are right, we asked TBL why he choose the concept o the parent the way he did. His reason was that he wanted to treat cited graphs as formulas you read from another file. If you directly have something like {?x :p :o}=>{?x :q :o}. in your dataset and then cite the dataset, you would still like the formula to mean something like ∀x: p(x,o)-> q(x,o). His argument makes sense but we still have the problem that the scoping depends on the context if we go further here. Think of a formula like {{?x :p ?y}=>{:s :p :o}}=>{?x :b :c}. Here, the nested x in the premise is quantified on top level while y is quantified in the antecedent of the rule. It is because of these kinds of situations that the semantics so far assumes quantification on top level and I also understood that this was the consensus so far but I am not sure whether we officially decided that on a meeting. I also dislike to go against the initial idea of Tim, but I think the nested parent is far to complicated to implement and to understand for "normal" users, please let me know if you think I am mistaken :)

william-vw commented 4 years ago

Just to clarify, in TBL’s view the parent formula would have been the lexical “parent” (i.e., containing) formula? Since citing any (external) formula would then be neatly scoped on its (new) parent formula? Not understanding the reason fully here. Maybe scenarios like your second example were not considered?

Regardless, I think using the top (outer) scope makes a lot of sense to me. As said, I think we were pretty close to a consensus here.

william-vw commented 3 years ago

I believe we agreed some time ago on outermost-level scope.