w3c / rdf-star

RDF-star specification
https://w3c.github.io/rdf-star/

Do you need referential opacity? #22

Open pchampin opened 4 years ago

pchampin commented 4 years ago

This issue is intended as a strawpoll to determine if referential opacity is a required feature of RDF*, because it has several implications, especially on the ability or not to "encode" RDF* in standard RDF.

The problem

The question boils down to this: does the RDF* triple :alice :says << :Paris :population 2229621>> mean

(1) Alice says that Paris has a population of 2229621.

(referential transparency) or

(2) Alice says “Paris has a population of 2229621”.

(referential opacity)?

From (1), it would be acceptable to infer

(3) Alice says that the capital of France has a population of 2229621.

if we know that Paris = capital of France. However, from (2), it would not be acceptable to infer

(4) Alice says “The capital of France has a population of 2229621”

because

Rationale of the current draft

The semantics in the current draft supports referential opacity. This choice was made because referential opacity is required by some use-cases such as

Furthermore, from sentence (2) above, sentence (1) can be reconstructed if needed. The opposite is not true, as sentence (1) does not convey which precise terms were used by Alice.
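
The asymmetry described above can be sketched in a few lines of Python. This is only an illustration, not part of the draft: the `COREF` table, the `city#1` referent and the helper names are assumptions. The sketch keeps what Alice said term by term (the opaque reading), derives the transparent reading by replacing each term with its referent, and shows that two different quotations collapse into the same transparent statement, so the original wording cannot be recovered from it.

```python
# Hypothetical co-reference table: each term maps to the thing it denotes.
COREF = {
    ":Paris": "city#1",
    ":capitalOfFrance": "city#1",
}


def denote(term):
    """Return the referent of a term; unknown terms denote themselves."""
    return COREF.get(term, term)


def to_transparent(quoted):
    """Derive the transparent reading of an opaque (term-level) triple."""
    return tuple(denote(t) for t in quoted)


# Opaque readings keep the exact terms Alice used.
said_paris = (":Paris", ":population", "2229621")
said_capital = (":capitalOfFrance", ":population", "2229621")

# Opaque: the two quotations are distinct.
opaque_equal = said_paris == said_capital

# Transparent: both collapse to the same statement about city#1,
# so the wording Alice used cannot be reconstructed from it.
transparent_equal = to_transparent(said_paris) == to_transparent(said_capital)

print(opaque_equal, transparent_equal)
```

Going from the opaque to the transparent form is a plain function application, whereas the reverse direction would need information that the transparent form no longer carries.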

Strawpoll

Please vote with emojis on this issue:

lisp commented 4 years ago

the meaning should be a matter of interpretation.

pchampin commented 4 years ago

@lisp RDF semantics restricts the range of valid interpretation for a given RDF graph. The question at hand is: how do we want RDF* semantics to restrict the interpretation of embedded triples.

What I find odd is that your comment seems to advocate for a less restrictive semantics (IIUC), while your vote is (in my view) for the most restrictive option. I pointed out in the rationale section that from a referentially opaque semantics, we can reconstruct (in the situations where it is desirable) the referentially transparent one, while the other way around is not possible.

lisp commented 4 years ago

my vote is that the interpretation should not be fixed.

rat10 commented 4 years ago

When the statement identifier is an IRI the reference is per default opaque as an IRI doesn't come with owl:imports semantics, right?

pfps commented 4 years ago

I find this issue misleading. The issue is not opacity vs transparency as RDF can do a form of both in reification. The real issue I see is whether a form of partial transparency is needed, where embedded blank nodes have a special treatment.

pchampin commented 4 years ago

@rat10

When the statement identifier is an IRI the reference is per default opaque

by default yes, unless we choose to extend the semantics for a special kind of "triple identifying IRIs"

as an IRI doesn't come with owl:imports semantics

I am not sure I understand what you mean here. Aren't you confusing referential opacity with the fact that embedded triples may be considered asserted or not? I.e., in my examples above, do we consider that :Paris :population 2229621 is entailed or not by :alice :says <<:Paris :population 2229621>>? This is an orthogonal question.

@pfps

The real issue I see is whether a form of partial transparency is needed, where embedded blank nodes have a special treatment.

I think you are over-interpreting my question. I really simply mean "do we want embedded triples to be referentially opaque". If the consensus is "yes", we will indeed have to decide how blank nodes fit in the picture. But otherwise, this will not even be an issue. So I suggest we focus on the simple question first.

pfps commented 4 years ago

@pchampin I fear that asking the question this way is biasing the results. The options are not transparency vs opacity but transparency versus something that can actually be done in RDF*, and most (or all) people don't know what can be done in RDF.

pchampin commented 4 years ago

@pfps by "something that can actually be done in RDF*", you mean "replacing an IRI by a bnode in an embedded triple" and things of that sort?

pfps commented 4 years ago

@pchampin No, what can be done by minor modifications of RDF (as in the current RDF* semantics) as opposed to making a major extension (as in OWL or N3). The latter seems to be out of scope.

VladimirAlexiev commented 3 years ago

Will owl:sameAs semantics impact this issue?

pchampin commented 3 years ago

@VladimirAlexiev owl:sameAs is the most obvious way to assert that two terms refer to the same thing, but co-referring terms can occur without it, even without relying on OWL (e.g. "2"^^xsd:integer and "02"^^xsd:integer).

So, in my view, this issue can be discussed independently of the semantics of owl:sameAs.
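
As a small illustration of co-reference without owl:sameAs, the Python sketch below models a typed literal as a (lexical form, datatype) pair. The `Literal` class and the `value` helper are hypothetical names introduced here, and only xsd:integer is handled.

```python
from dataclasses import dataclass

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str


def value(lit):
    """Map a literal to the value it denotes (only xsd:integer is handled)."""
    if lit.datatype == XSD_INTEGER:
        return int(lit.lexical)
    raise ValueError(f"unsupported datatype: {lit.datatype}")


two = Literal("2", XSD_INTEGER)
zero_two = Literal("02", XSD_INTEGER)

# Term-level comparison: lexical form and datatype must both match.
same_term = two == zero_two

# Value-level comparison: both literals denote the integer 2.
same_value = value(two) == value(zero_two)

print(same_term, same_value)
```

The two literals are different terms that denote the same value, which is exactly the situation in which opacity and transparency give different answers.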

VladimirAlexiev commented 3 years ago

I'm no logician but sameAs semantics says two URIs identify one and the same resource, therefore the two should be "smushed" and all their attributes and relations should be put together (so if you query for either URI, you should get all of them).

Some repos don't implement this. Eg http://dbpedia.org/sparql doesn't: Try this, then uncomment the second line and you'll see no results.

select * {
  dbr:Paris dbo:populationTotal ?pop1; owl:sameAs ?same.
  # ?same dbo:populationTotal ?pop2
}

Other repos do.

The annotated triple in :alice :says <<:Paris :population 1000000>> is not asserted, so you may argue that :population is not an attribute of :Paris. But if the repo smushes sameAs URIs, it may be hard to disentangle annotated triples. I changed my vote.

pchampin commented 3 years ago

@VladimirAlexiev interestingly, GraphDB's optimization strategy for owl:sameAs does not seem to hamper referential opacity :wink:. I just imported the following triples in an "OWL RL (optimized)" repository:

:lois :believes << :superman :can :fly >>.
:superman owl:sameAs :clark.

then ran the following query:

ASK { :lois :believes << :clark :can :fly >> }

and got the answer "no".
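
The behaviour observed above can be mimicked with a toy store in Python. This is only a sketch of one possible strategy, not a description of how GraphDB works internally; the `canon`, `smush` and `ask` helpers, the data layout and the extra `:livesIn` triple are assumptions made for illustration. Subjects and objects of asserted triples are replaced by a representative of their owl:sameAs class, while quoted triples are compared term by term.

```python
# owl:sameAs pairs known to the toy store (no chains, to keep the sketch short).
SAME_AS = [(":superman", ":clark")]


def canon(term):
    """Pick one representative per sameAs pair (here: the smallest name)."""
    for a, b in SAME_AS:
        if term in (a, b):
            return min(a, b)
    return term


def smush(term):
    """Canonicalise plain terms; leave quoted triples (tuples) untouched."""
    if isinstance(term, tuple):
        return term  # the opaque choice: no substitution inside quotations
    return canon(term)


ASSERTED = {
    (":lois", ":believes", (":superman", ":can", ":fly")),
    (":superman", ":livesIn", ":metropolis"),  # hypothetical extra triple
}


def ask(s, p, o):
    """Answer a variable-free ASK pattern against the smushed store."""
    store = {(smush(a), b, smush(c)) for a, b, c in ASSERTED}
    return (smush(s), p, smush(o)) in store


believes_clark = ask(":lois", ":believes", (":clark", ":can", ":fly"))
believes_superman = ask(":lois", ":believes", (":superman", ":can", ":fly"))
clark_lives = ask(":clark", ":livesIn", ":metropolis")

print(believes_clark, believes_superman, clark_lives)
```

In this sketch the sameAs substitution still applies to ordinary asserted triples, but it never reaches inside a quoted triple, which is why the query about :clark fails.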

rat10 commented 3 years ago

@pchampin

@rat10

When the statement identifier is an IRI the reference is per default opaque

by default yes, unless we choose to extend the semantics for a special kind of "triple identifying IRIs"

as an IRI doesn't come with owl:imports semantics

I am not sure I understand what you mean here. Aren't you confusing referential opacity with the fact that embedded triples may be considered asserted or not? I.e., in my examples above, do we consider that :Paris :population 2229621 is entailed or not by :alice :says <<:Paris :population 2229621>>? This is an orthogonal question.

My thinking is the following: an IRI is referentially opaque. Let's take for example some IRI that refers to some RDF/XML document that contains some RDF triples. Using that IRI as a term in an RDF statement doesn't import those triples into the graph. That would require some owl:imports instruction. Likewise an IRI denoting an RDF statement doesn't import that statement into the graph and consequently no entailments can be drawn from it. An IRI that not only denotes but also encodes a statement is also opaque (as all IRIs are, by definition). It's just a shortcut to the standard RDF reification quadlet - but I digress. Yes, this happens to be SA mode, but I'm not sure why you consider this aspect orthogonal. Doesn't it pretty well go to the core of the problem at hand?

pchampin commented 3 years ago

@rat10 we are talking about very different kinds of "references". The discussion here is not about IRIs as addresses (which reference some content by technically pointing at it), but as names (which reference an entity by conventionally denoting/naming it). In that second sense, IRIs in RDF are referentially transparent, because any asserted triple containing, e.g., <http://champin.net/#pa>, is understood to state something about myself, and not about the IRI itself. Hence:

<http://champin.net/#pa> a s:Person.

is true, because I am a person, but

<http://champin.net/#pa> :lengthInChars 22.

is false, because I am not 22 characters long.

"http://champin.net/#pa"^^xsd:anyURI :lengthInChars 22.

would be true, OTOH, if literals were allowed in subject positions.
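
The distinction between a name and the thing it names can be made concrete with a short Python sketch. The `DENOTES` table and the `Person` class are illustrative assumptions; the point is only that a property of the IRI string is not a property of what the IRI denotes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str


IRI = "http://champin.net/#pa"

# Hypothetical interpretation: the IRI names a person.
DENOTES = {IRI: Person("pchampin")}

referent = DENOTES[IRI]

# A statement about the referent (transparent use of the name).
is_person = isinstance(referent, Person)

# A statement about the name itself (the IRI as a string of characters).
iri_length = len(IRI)

print(is_person, iri_length)
```

The length belongs to the string, not to the person, which mirrors why the :lengthInChars triple is only true of the xsd:anyURI literal.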

For RDF* triples, the question is therefore to decide if <<:s :p :o>> represents either

I hope this clarifies things.

pchampin commented 3 years ago

This was discussed during today's call: https://w3c.github.io/rdf-star/Minutes/2020-11-13.html#item03

pchampin commented 3 years ago

This was discussed during today's call: https://w3c.github.io/rdf-star/Minutes/2020-11-20.html#item01

lisp commented 3 years ago

@lisp RDF semantics restricts the range of valid interpretation for a given RDF graph. The question at hand is: how do we want RDF* semantics to restrict the interpretation of embedded triples.

this means that the interpretation depends on the rules which govern the placement of a statement in graphs and the combination of graphs to become the target of a query.

TallTed commented 3 years ago

[@VladimirAlexiev]

I'm no logician but sameAs semantics says two URIs identify one and the same resource, therefore the two should be "smushed" and all their attributes and relations should be put together (so if you query for either URI, you should get all of them).

Some repos don't implement this. Eg http://dbpedia.org/sparql doesn't:

The default for Virtuoso (which hosts DBpedia) is for all inference, including that based on owl:sameAs relations, to be "off". There are switches to flip to enable inference based on owl:sameAs and various other relations, from OWL and other ontologies, including custom rules.

We built Virtuoso this way for a number of reasons, not least being owl:sameAs pollution by RDF authors who did not understand the semantics of owl:sameAs, about coreference and otherwise.

I don't see a way to vote in this straw poll that says "we need referential opacity to be a switchable option, which switch is optimally available to both authors and interpreters, so authors can explicitly say 'this is opaque, i.e., embedded is not asserted' or 'this is not opaque, i.e., embedded triples in this data are also asserted' or 'this is a mix of opaque and non-opaque, i.e., some embedded triples in this data are also asserted and some are not asserted'; and interpreters can say 'treat all embedded triples in this data as asserted' or 'treat all embedded triples in this data as unasserted' or 'treat embedded triples as asserted or as unasserted according to rule x' (which might now be the modified Turtle* syntax we've been discussing, which uses << >> and {| |} markup to differentiate between embedded+asserted and embedded+unasserted/quoted)."

That's messy. Like most reality...
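
One rough way to picture the interpreter-side switch described above is the Python sketch below. It follows the wording of the comment, which ties the switch to whether embedded triples are asserted, and it assumes that the {| |} annotation syntax marks a triple as also asserted while << >> alone marks it as quoted. The `Mode` enum, the `asserted` function and the sample data are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    ALL_ASSERTED = "treat all embedded triples as asserted"
    ALL_UNASSERTED = "treat all embedded triples as unasserted"
    BY_SYNTAX = "decide per triple from the syntax used by the author"


# Each entry: (embedded triple, written with the {| |} annotation syntax?)
DATA = [
    ((":john", ":height", "180"), True),
    ((":superman", ":can", ":fly"), False),
]


def asserted(data, mode):
    """Return the embedded triples that the interpreter treats as asserted."""
    if mode is Mode.ALL_ASSERTED:
        return {triple for triple, _ in data}
    if mode is Mode.ALL_UNASSERTED:
        return set()
    # BY_SYNTAX: only triples the author wrote with {| |} count as asserted.
    return {triple for triple, annotated in data if annotated}


all_on = asserted(DATA, Mode.ALL_ASSERTED)
all_off = asserted(DATA, Mode.ALL_UNASSERTED)
mixed = asserted(DATA, Mode.BY_SYNTAX)

print(len(all_on), len(all_off), len(mixed))
```

The author-side choice lives in the data (the flag on each entry), while the interpreter-side choice lives in the `mode` argument, so the two can be combined or overridden independently.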

pchampin commented 3 years ago

This email (and the following thread) is relevant to this discussion: https://www.w3.org/mid/0AFA8DEC-68EF-41E2-942D-45E927FF006F@rat.io

pchampin commented 3 years ago

Updated strawpoll

Please vote with emojis on this comment:

pfps commented 3 years ago

There should be more background for this straw poll easily available.

The current semantics has the feature that:

:john :height 180 {| :on :today |}

does not entail

:john :height 0180 {| :on :today |}

afs commented 3 years ago

No account for supporting provenance use cases has been given when using referential "transparency":

https://www.w3.org/mid/456433df-e332-0d46-262a-943de64adee2@apache.org

pchampin commented 3 years ago

This was discussed during our latest call https://w3c.github.io/rdf-star/Minutes/2021-05-14.html#t03

TallTed commented 3 years ago

Updated strawpoll

Does not provide enough options! I have strong opinions that are not supported by this poll -- and with which I think most people would arrive at agreement, if they considered only their own use cases long enough, and more quickly if they were willing to consider other people's use cases.

As I said earlier, the world is messy, and there are messy requirements for all of RDF and RDF-star and RDF-next and RDF-eleventeen!

Sometimes I need referential opacity to be global for all embedded triples in a given graph or dataset, at load time or at query time or at dump time.

Sometimes I need referential transparency to be global for embedded triples in a given graph or dataset, at load time or at query time or at dump time.

Most times I will need some mix of opacity and transparency, no matter the overall situation, for sub-scenarios within it.

rat10 commented 3 years ago

@TallTed

Most times I will need some mix of opacity and transparency, no matter the overall situation, for sub-scenarios within it.

See this comment on why that scenario is rather difficult to implement if not outright messy in itself. (I just posted that comment, so you're excused for not having read it already ;-).

TallTed commented 3 years ago

@rat10 -- That comment just highlights a terribly constructed bit of sample data, which is inherently nonsensical, with both referential opacity and referential transparency.

Construct some data which is not nonsense under at least one interpretation (opaque or transparent), and it may help show why the other interpretation is or is not needed -- for your usage. On the other hand, it may well show why both interpretations are needed at different times, in your usage or someone else's.

rat10 commented 3 years ago

@TallTed The whole RDF-star effort was based on very few examples and when they didn't work any longer as the semantics changed they were called "regrettably misguided" and new examples were brought in. It is relatively easy to produce examples that prove your case. It is much harder to prove that your semantics can deal with arbitrary and expectable use cases. My example is an absolutely ordinary, middle of the road mix of this and that. And it works perfectly fine in my referentially transparent, occurrence-annotating world view. Take a look at a neo4j tutorial. There's a wild mix of :name, :born, :title, :released, :start_date, :located_in, :employee_id, :has_ceo. Show me how you disambiguate referentially opaque from transparent properties. Show me what's wrong with that modelling.

TallTed commented 3 years ago

I choose to ignore nonsense data, when it's clearly nonsense. Arbitrary and expectable use cases where the user inputs nonsense should have a way to be discounted as nonsense.

If you want me to understand (and buy into) an argument you're making, it behooves you to make that argument and all its sample data be internally coherent, sensical, etc.; else, I think it's entirely fair for me to say, "this makes no sense," and move on.

I'm not going to go digest the Neo4j tutorial and try to make it fit RDF-star, as that tutorial is not about RDF-star at all.

rat10 commented 3 years ago

I choose to ignore nonsense data, when it's clearly nonsense. Arbitrary and expectable use cases where the user inputs nonsense should have a way to be discounted as nonsense.

If you want me to understand (and buy into) an argument you're making, it behooves you to make that argument and all its sample data be internally coherent, sensical, etc.; else, I think it's entirely fair for me to say, "this makes no sense," and move on.

I hope my updated example here helps.

I'm not going to go digest the Neo4j tutorial and try to make it fit RDF-star, as that tutorial is not about RDF-star at all.

The Neo4j tutorial is very relevant to learn about the needs and customs of property graph users. Support for RDF-star is to no small degree reliant on its claimed ability to support property graphs in RDF so it'd better support them, and well.