w3c / EasierRDF

Making RDF easy enough for most developers

Standardized n-ary relations (and property graphs) #20

Open dbooth-boston opened 5 years ago

dbooth-boston commented 5 years ago

Since RDF natively supports only binary relations, relations between more than two entities must be encoded using groups of triples. A W3C Working Group Note[9] describes some common patterns, but no standard has been defined for them. As a result, tools cannot reliably recognize and act on these groups of triples as the atomic units that they are intended to represent.

This deficiency has greater significance than it may appear, because it is subtly related to the blank node problem: a major use of blank nodes is to encode n-ary relations. In other words, n-ary relations are a major contributor to the blank node problem.
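
To make the problem concrete, here is a minimal sketch in plain Python (no RDF library) of how a ternary relation ends up as a group of triples hanging off a blank node. The `ex:` names and the helper functions are hypothetical and only for illustration; they are not taken from the Working Group Note.

```python
from itertools import count

_bnode_ids = count()

def new_blank_node():
    """Return a fresh blank node label; labels are only locally meaningful."""
    return f"_:b{next(_bnode_ids)}"

def encode_nary(relation_type, roles):
    """Encode one n-ary relation as a group of binary triples.

    A fresh blank node stands for the relation instance, and every
    participant is attached to it with its own predicate.
    """
    node = new_blank_node()
    triples = [(node, "rdf:type", relation_type)]
    triples += [(node, role, value) for role, value in roles.items()]
    return triples

# Hypothetical ternary relation: John buys Book1 from Store.
roles = {"ex:buyer": "ex:John", "ex:item": "ex:Book1", "ex:seller": "ex:Store"}
purchase = encode_nary("ex:Purchase", roles)

# Stating the same fact again yields a second, unrelated blank node,
# so a generic tool sees eight triples instead of one atomic unit.
again = encode_nary("ex:Purchase", roles)
merged = set(purchase) | set(again)
```

Nothing in the resulting triples tells a generic tool that the four triples of `purchase` belong together, or that `again` restates the same fact.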

Furthermore, standardized n-ary relations could also enable direct support for property graphs[10], which have emerged as a popular and convenient way to represent graph data, led by Neo4J.[11] Property graphs add the ability to attach attributes to relationships, which can be viewed as a special case of n-ary relations.

"We, in the ISO 15926 community, would like the concept of N-ary relations to be standardized in RDF, as well as workable Lists. ISO 15926-7 templates are based on "Defining N-ary Relations on the Semantic Web", a W3C Working Group Note dated 12 April 2006, by Natasha Noy and Alan Rector. It would be helpful in case this would become a part of the RDF spec". https://lists.w3.org/Archives/Public/semantic-web/2018Nov/att-0132/00-part

IDEA: Nested triples

Olaf Hartig and Bryan Thompson have proposed "nested triples" conventions for adding property graph support to RDF.[12] "The idea . . . is to extend RDF with the possibility to have triples as the subject or the object of other triples (i.e., nested triples)". https://lists.w3.org/Archives/Public/semantic-web/2018Apr/0030.html
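
As a rough illustration of the nested-triples idea, the sketch below models a triple whose subject or object may itself be a triple. It is a plain-Python data structure with hypothetical `ex:` names, not the syntax or semantics of the actual proposal.

```python
from typing import NamedTuple, Union

# A term is either an IRI/literal (kept as a string here) or another triple.
Term = Union[str, "Triple"]

class Triple(NamedTuple):
    subject: Term
    predicate: str
    object: Term

def nesting_depth(term):
    """Return 0 for a plain term and 1 + the deepest nested part for a triple."""
    if isinstance(term, Triple):
        return 1 + max(nesting_depth(term.subject), nesting_depth(term.object))
    return 0

# An ordinary triple, comparable to an edge in a property graph.
knows = Triple("ex:Alice", "ex:knows", "ex:Bob")

# A nested triple: the edge itself is the subject, so the statement
# attaches an attribute to the relationship rather than to a node.
since = Triple(knows, "ex:since", '"2010"')
```

Here `since.subject` is the `knows` triple itself, which is how an edge attribute from a property graph could be carried without an intermediate blank node.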

IDEA: Auto-generate a predictable IRI from the object's primary key

"In database practice, the problem is usually solved by adding a primary id column, even if those ids are arbitrary and only unique within a database or even just a table." https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0037.html

"If we have a standard way to generate URIs for [implicit blank nodes], based on a natural key (or composite key) that is typically formed from the constituents of the n-ary relation -- the components of [an] address, for example -- then all [parties] could automatically use the same URI for them. Tools could do this automatically whenever the user writes an n-ary relation" https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0270.html

"a blank node that is used to connect the attribute of an n-ary relation could be replaced by a Skolem IRI that is constructed (recursively) from the entities that it relates. Also, conventions could require each n-ary relation to specify a (possibly composite) key." https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0047.html

"[To avoid duplicate triples] when n-ary relations are encoded in RDF . . . tools must be aware of a key that uniquely identifies that n-ary relation. And in practice, n-ary relations usually do have a key -- or composite key. The key could be used in automatically assigning a predictable identifier. This would make it trivial for tools to eliminate duplicate triples." https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0076.html

"keep the benefits of abbreviated N3 notation while at the same time doing away with blank nodes . . . by automatically introducing well-known IRIs instead" https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0089.html

'maybe there could be a reserved protocol scheme for [IRIs that take the place of blank nodes]. Maybe like "rdf-blank:asdn-2354-8756".' https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0165.html

'[Define] an IRI scheme specific to blank nodes: "blank://example.com/ABEC-2BD-34AEABC" or some such.' https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0037.html

"To summarize, if conventions for n-ary relations allow the user to conveniently indicate which properties constitute a (composite) key -- perhaps defaulting to all properties -- then in theory tools could use that information to automatically collapse duplicate nodes, whether they use blank nodes or URIs. But if this is done with URIs that are predictably generated from those keys -- instead of blank nodes -- then we get the advantage that existing tools already will collapse them, whereas they wouldn't if blank nodes are used." https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0310.html

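A minimal sketch of this idea follows, assuming a hash of the key properties as the identifier. The base IRI, the canonical string form, the 16-character digest and the function names are all assumptions made for illustration; the `/.well-known/genid/` path follows the Skolem IRI convention suggested in RDF 1.1 Concepts, while the hashing scheme is not part of any standard.

```python
import hashlib

def skolem_iri(relation_type, properties, key=None,
               base="https://example.org/.well-known/genid/"):
    """Build a predictable IRI from the (composite) key of an n-ary relation.

    `key` lists the properties that form the key; it defaults to all of them.
    """
    key = sorted(properties) if key is None else sorted(key)
    # Naive canonical form; a real scheme would need proper escaping.
    canonical = "\n".join([relation_type] + [f"{p}={properties[p]}" for p in key])
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return base + digest

def encode_nary(relation_type, properties, key=None):
    """Encode an n-ary relation as triples on a key-derived IRI, not a blank node."""
    node = skolem_iri(relation_type, properties, key)
    triples = {(node, "rdf:type", relation_type)}
    triples |= {(node, p, v) for p, v in properties.items()}
    return triples

address = {"ex:street": "1 Main St", "ex:city": "Boston", "ex:zip": "02101"}
same_address = dict(reversed(list(address.items())))  # same data, different order

# Both encodings use the same IRI, so a plain set union collapses the duplicates.
graph = encode_nary("ex:Address", address) | encode_nary("ex:Address", same_address)
```

Because both parties derive the same IRI, the merged graph holds four triples rather than eight, with no special deduplication logic in the tool.
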
IDEA: Extend RDF from binary to n-ary

"Extending from binary to n-ary means that we structurally subsume SQL tables, the bread and butter for the middle-33. We can imagine syntax of labeled rather than positional arguments. Also a convention for property graphs (e.g. final argument is a list of property-value pairs; remember expressions are first class). By subsuming rather than having awkward difficult impedance-mismatch ridden bridging/mapping solutions we can actually put semweb concepts like first-class IRIs and inference into the hands of more developers, allowing them to use toolchains that are familiar to them." https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0045.html

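One possible reading of this proposal, sketched in plain Python: a statement has a relation name and any number of labeled arguments, so that a binary triple, a SQL row and a property-graph edge all share one shape. The `Statement` type, the helper names and the `properties` label are hypothetical and not part of the quoted message.

```python
from typing import NamedTuple, Tuple

class Statement(NamedTuple):
    relation: str
    args: Tuple[Tuple[str, object], ...]  # (label, value) pairs, sorted by label

def statement(relation, **labeled_args):
    """Build an n-ary statement; argument order is irrelevant because of labels."""
    return Statement(relation, tuple(sorted(labeled_args.items())))

def from_row(table, columns, row):
    """View one SQL row as a statement whose labels are the column names."""
    return statement(table, **dict(zip(columns, row)))

# A classic binary triple is just the two-argument case.
triple = statement("ex_knows", subject="ex:Alice", object="ex:Bob")

# A row of a hypothetical table purchase(buyer, item, price).
row = from_row("purchase", ["buyer", "item", "price"], ("ex:John", "ex:Book1", 15))

# A property-graph edge: one argument carries the property-value pairs.
edge = statement("ex_knows", source="ex:Alice", target="ex:Bob",
                 properties=(("since", 2010),))
```

With labeled arguments, `statement("r", a=1, b=2)` and `statement("r", b=2, a=1)` compare equal, which positional arguments would not give for free.
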
laurentlefort commented 5 years ago

This is another area of interest for me:

I have used Dr. Athanassios I. Hatzis's analysis, "Do you Understand Many-to-Many Relationships? Associative entities are represented differently in various data models": http://healis.eu/r3dm_project/do_you_understand_many_to_many_relationships/

and Szymon Klarman's "Modelling Data with Hypergraphs - A closer look at GRAKN.AI’s hypergraph data model": https://blog.grakn.ai/modelling-data-with-hypergraphs-edff1e12edf0 (esp. the example built around the "ternary divorce filing relationship involving three role-players in the roles of certified marriage, petitioner and respondent")

and my limited knowledge of Stardog's work on Path queries https://www.stardog.com/docs/#_path_queries (and feedback from colleagues about specific queries it can help with)

as signals that looking for "one solution fits all requirements" in this space might be beyond reach.

The corollary question is: do the ease-of-use / comfort zones of these approaches overlap with each other, or do they address different niches of requirements? If they overlap, then the challenge is more to make the solutions supporting them more compatible with each other (interoperable?).

There are similar challenges for users who want the best of the RDF and OWL2 worlds simultaneously.

BTW, has anyone developed a Jena-based utility that (properly) removes the bits that the OWL API doesn't like?

I have also noticed the CKG2018 ( http://wiki.knoesis.org/index.php/CKG2018 ) paper and slides by Peter Patel-Schneider, "Contextualization via Qualifiers" (I like his "don't put the burden of contextualization on system user" message).

Also worth sharing here: Olaf Hartig's slides "RDF* and SPARQL*: An Alternative Approach to Statement-Level Metadata in RDF" http://olafhartig.de/slides/RDFStarInvitedTalkWSP2018.pdf (also done in April 2018 but not shared via the W3C mailing list).