w3c / EasierRDF

Making RDF easy enough for most developers

Property Graphs #45

Open draggett opened 5 years ago

draggett commented 5 years ago

This deserves an issue to itself given the growing popularity of property graph databases and the opportunity for using RDF as an interchange framework between different databases. See also #20 Standardized n-ary relations (and property graphs) and #22 Language-tagged strings.

Property graphs are a kind of graph consisting of nodes and links between them, where nodes and links may each be associated with a set of property-value pairs, and where the values may themselves be sets of property-value pairs, and so forth recursively. The link predicate or label can itself be treated as a kind of property.
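To make this definition concrete, here is a minimal Python sketch of such a structure. The class names (`Node`, `Link`, `PropertyGraph`) and the sample data are illustrative assumptions, not part of any standard or existing library.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    id: str
    # Values may themselves be dicts, giving the recursive nesting described above.
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Link:
    id: str          # identifier that lets other links point at this link
    label: str       # the link predicate, which can be seen as one more property
    source: str
    target: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    links: List[Link] = field(default_factory=list)

    def add_node(self, node_id: str, **properties: Any) -> Node:
        node = Node(node_id, dict(properties))
        self.nodes[node_id] = node
        return node

    def add_link(self, link_id: str, label: str, source: str, target: str,
                 **properties: Any) -> Link:
        link = Link(link_id, label, source, target, dict(properties))
        self.links.append(link)
        return link


# Hypothetical sample data.
graph = PropertyGraph()
graph.add_node("alice", name="Alice",
               address={"city": "Paris", "geo": {"lat": 48.86, "lon": 2.35}})
graph.add_node("bob", name="Bob")
graph.add_link("l1", "likes", "alice", "bob", since=2019)
```

Both nodes and links carry a property map here, which is the feature that plain RDF triples lack.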

It is possible to represent property graphs with reification, but that adds considerable complexity. We can easily annotate a node using a link to another node. However, we also need a way to link from a link or to a link. One approach is for each link to expose an identifier enabling the link to be treated as equivalent to an RDF blank node. Such identifiers are okay for links within the same graph and can be implicit in serialisations like Turtle* where a pair of curly braces implies a new identifier.
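As a rough illustration of the complexity that reification adds, the following Python sketch models triples as plain tuples and annotates a single link. It uses the standard RDF reification terms (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`). The helper name `reify`, the `ex:` names, and the blank node label `_:link1` are assumptions made for the example.

```python
def reify(link_id, subject, predicate, obj):
    """Return the four standard reification triples that describe one link."""
    return [
        (link_id, "rdf:type", "rdf:Statement"),
        (link_id, "rdf:subject", subject),
        (link_id, "rdf:predicate", predicate),
        (link_id, "rdf:object", obj),
    ]


# The link itself: a single triple.
triples = [("ex:Alice", "ex:likes", "ex:Bob")]

# Give the link an identifier (here a blank node) so that it can be annotated.
triples += reify("_:link1", "ex:Alice", "ex:likes", "ex:Bob")

# The annotation: a link *from* the link to a value.
triples.append(("_:link1", "ex:since", '"2019"'))
```

One annotated link ends up as six triples, which is the overhead that an implicit link identifier in the serialisation would hide.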

What if you want to make a link something that can be referenced stably from other graphs? That suggests the need for a means to associate the link with a named anchor that is unique within the graph. What if the link itself starts in one graph and ends in another - where would you situate the anchor for that link? The answer would seem to be the graph that the link was defined in.

Another challenge concerns the case where a node stands for another graph, e.g. the node has a URI that can be dereferenced to obtain the graph the node stands for. This allows you to make statements about a graph as a whole rather than one of its nodes or links. It would be desirable to quickly determine that a node indeed stands for a graph so as to avoid having to find this out by trying to dereference the node.

Yet another challenge is where you want to distinguish properties from other kinds of links. This would allow for visualisations where you can hide and reveal properties with a tabular presentation of property-value sets. See #37 Lack of RDF Visualisation Software.

It would be desirable to have short names for links so that paths through a graph can be expressed simply via a dotted path string, analogous to properties in object-oriented programming languages. Such short names could be scoped to the node that acts as the subject for a link, or the root for an n-ary chunk.
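A minimal Python sketch of what following such a dotted path could look like, assuming short names are scoped to the subject node. The `resolve` function, the dict-based graph layout, and the sample data are all hypothetical.

```python
def resolve(graph, start, path):
    """Follow a dotted path such as 'employer.address.city' from a start node.

    `graph` maps each node id to a dict of {short link name: target or value}.
    """
    current = start
    for name in path.split("."):
        links = graph.get(current, {})
        if name not in links:
            raise KeyError(f"{current!r} has no link named {name!r}")
        current = links[name]
    return current


# Hypothetical sample graph; short names are unique only per subject node.
graph = {
    "alice": {"name": "Alice", "employer": "acme"},
    "acme": {"address": "addr1"},
    "addr1": {"city": "Paris"},
}

city = resolve(graph, "alice", "employer.address.city")
```

Because each name is looked up only among the links of the current node, two different nodes can reuse the same short name without conflict.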

amirouche commented 3 years ago

I was going to reply something similar. I came to the realization that object mappers such as ORM / ODM / OGM are a pipe dream before diving into Scheme and RDF. It may have some use to describe a schema with a set of Java classes with annotations in cases where there is no other way to do it.

There are many obstacles: technical, epistemic and cultural.

I made that journey when I was younger: I started with Java, UML and SQL, and I still do my daily chores with an ORM. The physical barrier is a bigger and stronger obstacle than those you mentioned (see also the software crisis).

Check out Apache Jena, I do not think there will be a better answer elsewhere.

mhedenus commented 3 years ago

Thank you all for having this conversation. One outcome for me was that I have to present my position more precisely. I have written an essay that I want to bring to your attention: https://github.com/mhedenus/on_graphs_and_models Any comment is appreciated.

dbooth-boston commented 3 years ago

I have written an essay that I want to bring to your attention: https://github.com/mhedenus/on_graphs_and_models

I would find it helpful if you could precisely itemize the differences between what you are calling a "property" versus a "relation", so that I can understand the distinction you are trying to make. What is true of a "property" that is not true of a "relation", and vice versa? What can I do with a "property" that I cannot do with a "relation", and vice versa? What characteristics do "properties" have that "relations" do not have, and vice versa? How are "properties" written or depicted, in contrast with "relations", and vice versa? If you could provide a concise list of the differences, it would help.

HughGlaser commented 3 years ago

I've been following this discussion, and it hasn't been clear to me what you are saying. Your note seems to help.

Here is my take/echo to you.

You are casting this as a difference in the ability to model systems. However, in fact, you use both RDF & PG to model things, without showing any difference in expressibility, which is where my difficulty is/was, because you seemed to be saying that there are things in one that you can't model in the other. What you are describing, to me, is more a difference in discrimination. PGs have two ways of modelling the situations you discuss, whereas RDF has a single way (in your characterisation). And it may be that having that difference is useful, although as you seem to say they are interchangeable, and the modeller has a choice, any difference must simply be down to the culture of what people do when using PGs. (Many would argue that having two ways of modelling the same thing is actually a Really Bad Thing.) You seem to be seeing a very deep difference, whereas it feels like you are describing a pretty shallow difference, that is almost at a representational or even syntactic level.

You conclude that you can go from PG to RDF without any challenges (2 goes to 1), but going from RDF to PG means that you need to make a decision about which of the two choices you make (1 goes to which of 2).
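A small Python sketch of that asymmetry, with triples modelled as plain tuples. The function names and sample data are assumptions for illustration only. PG to RDF needs no extra input, whereas RDF to PG needs a caller-supplied rule (`is_property` here) to make the choice.

```python
def pg_to_triples(nodes, edges):
    """2 goes to 1: node properties and edges both become plain triples."""
    triples = []
    for node_id, props in nodes.items():
        for key, value in props.items():
            triples.append((node_id, key, value))
    for source, label, target in edges:
        triples.append((source, label, target))
    return triples


def triples_to_pg(triples, is_property):
    """1 goes to which of 2: the caller must decide, triple by triple."""
    nodes, edges = {}, []
    for s, p, o in triples:
        nodes.setdefault(s, {})
        if is_property(s, p, o):
            nodes[s][p] = o
        else:
            nodes.setdefault(o, {})
            edges.append((s, p, o))
    return nodes, edges


# Hypothetical sample data. Note that a node with no properties and no edges
# would produce no triples at all, so this example avoids that case.
nodes = {"alice": {"name": "Alice", "employedAs": "scientist"}, "bob": {}}
edges = [("alice", "likes", "bob")]
triples = pg_to_triples(nodes, edges)

# With the modeller's original choice, the round trip restores the graph.
same = triples_to_pg(triples, lambda s, p, o: p in {"name", "employedAs"})

# With a different choice, the same triples give a different property graph.
all_edges = triples_to_pg(triples, lambda s, p, o: False)
```

The triples are identical in both cases. Only the rule passed in differs, which is the extra information that the RDF side does not carry by itself.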

I think I can see why you see it as a modelling problem, but that seems a weird way of casting it to me. The world you want can effectively be modelled by either system. It is just that you can't move between them so easily. Well, in fact I am guessing you could, if you simply decided not to use the property stuff of the PG!

I am pleased to see you start off with "Naming Things Unambiguously" as an issue. This seems to me probably the biggest challenge about interchanging between RDF & PGs.

Cheers

namedgraph commented 3 years ago

@mhedenus you're still thinking in ER terms and as long as you do that you will be seeing some mismatch in RDF. RDF is a directed labelled graph at the very basic level. Here's an example of a directed graph: [image] Do you see any difference between "properties" and "relations"? No, because there is none.

But in practical terms, what prevents you from defining

pg:Relation rdfs:subClassOf owl:ObjectProperty .

after which your example becomes

ex:name       a rdf:Property . # or rather owl:DatatypeProperty
ex:employedAs a rdf:Property . # or rather owl:ObjectProperty
ex:likes      a pg:Relation .

and you have a distinction between "properties" and "relations" and it still makes sense semantically (unless I messed up the subclassing).

mhedenus commented 3 years ago

Thank you all for reading my note and making these excellent remarks!

@HughGlaser You summarised my thoughts very nicely. Maybe the term "expressiveness" is misleading. I did not mean that either graph style is more powerful. As you said, they both can model the world, but they do it differently.

I will try to respond to your objections. They all come down to these questions: is the property/relation question a really deep issue, or is it superfluous pettifoggery about a shallow difference? If there are differences, can they be listed (are there sufficient arguments for being a property or a relation)?

The answer may be a bit surprising. I used the term "graph style" for a reason. There is the concept of Thought Style. It is an important concept in the history of science and the basis of the concept of Paradigm. To simplify it: you have a style of thinking that is shaped by your context.

I make a hypothesis: there are two thought styles here, the context of RDF/Linked Data and the context of Property Graph/Applied Mathematics. A member of the first group says: "What are you talking about? There is no difference!" A member of the other group might say: "Why don't you see it?"

I do not want to convince you to adopt anything, but I do want you to accept that the property/relation distinction is made by others. You cannot deny that ER or (UML) class models exist and that they distinguish between membership and association. I think that accepting this fact is the necessary condition for bridging PG and RDF. This is the topic of this thread, is it not? If you 'simply decide not to use the property stuff of the PG' (to cite @HughGlaser) then you are not supporting PG! I also think that RDF must do something here because it provides the more general/abstract graph style.

So, I rephrase the question of @HughGlaser and @dbooth-boston:

IF you accept the difference between relation/property THEN how do you distinguish them?

Well, this is a very good question and the discussion can be extremely deep if not confusing (for example: intrinsic versus extrinsic properties). This is not the right place for such a discussion. Putting all philosophical questions aside, I think that in the end it is a choice made by the people who create the model. (Whether or not it is a bad thing to have this choice is again another question!)

I would list the following general rules:

@namedgraph: I am sorry, but RDF is not what you are showing. RDF is a labeled directed graph plus three node types: IRI, Blank and Literal. There is a restriction that limits Literals to being leaf nodes. This additional feature changes the graph completely. If you define an owl:DatatypeProperty, it says that the object node of an RDF statement shall be a Literal (please correct me if I am wrong here). I believe we agree that Literals are properties in the PG sense (I said this several times). But what about IRIs and Blanks? Can they be interpreted as properties in the PG sense? Can a whole sub-graph be a property value in the PG sense? So owl:DatatypeProperty and owl:ObjectProperty do not help here.

namedgraph commented 3 years ago

Can you use named graphs as "sub-graphs"?

And have you looked at other RDF -> PG mapping approaches? For example: https://github.com/Rothamsted/rdf2pg#mapping-rdf-to-cypherneo4j-entities-general-concepts

mhedenus commented 3 years ago

Named graphs as property value? I didn't think about it, but this is a very interesting idea. Why not?

The link you provided is a good example of what I mean: you must specify a mapping. If you are on the PG side as the active consumer, the problem is only a technical one, because you know how to map. But there is no general solution without any additional information.

mhedenus commented 3 years ago

I say the best way is to create a new vocabulary. Here is a sketch. I use prefixes to make clear what I mean:

Example:

ex:Alice a ex:Person ;
   ex:name "Alice";
   ex:likes ex:Bob ;
   ex:employedAs ex:Scientist .

ex:Bob a ex:Person .

# new Property-Graph Ontology

ex:Alice a pg:Entity .
#OR
ex:Person a pg:EntityType . 

ex:name a pg:PropertyType . # can be inferred?
ex:likes a pg:RelationType . # can be inferred?
ex:employedAs a pg:PropertyType . # this cannot be inferred and must be asserted!

Now imagine a PG visualization application that parses this RDF. There should not be any problem.
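A rough Python sketch of what such a consumer could do, assuming the sketched `pg:` vocabulary above. That vocabulary is only a proposal in this thread, not an existing ontology, and the triples are modelled as plain tuples rather than parsed with an RDF library. The function name `build_view` is hypothetical.

```python
# The example from above, written out as (subject, predicate, object) tuples.
TRIPLES = [
    ("ex:Alice", "rdf:type", "ex:Person"),
    ("ex:Alice", "ex:name", '"Alice"'),
    ("ex:Alice", "ex:likes", "ex:Bob"),
    ("ex:Alice", "ex:employedAs", "ex:Scientist"),
    ("ex:Bob", "rdf:type", "ex:Person"),
    # Assertions in the sketched (hypothetical) pg: vocabulary.
    ("ex:Person", "rdf:type", "pg:EntityType"),
    ("ex:name", "rdf:type", "pg:PropertyType"),
    ("ex:likes", "rdf:type", "pg:RelationType"),
    ("ex:employedAs", "rdf:type", "pg:PropertyType"),
]

PG_KINDS = ("pg:PropertyType", "pg:RelationType")


def build_view(triples):
    """Split triples into node property maps and edges, using only pg: assertions."""
    kinds = {s: o for s, p, o in triples if p == "rdf:type" and o in PG_KINDS}
    entity_types = {s for s, p, o in triples
                    if p == "rdf:type" and o == "pg:EntityType"}
    entities = {s for s, p, o in triples
                if p == "rdf:type" and o in entity_types}

    nodes = {entity: {} for entity in entities}
    edges = []
    for s, p, o in triples:
        if s not in entities:
            continue
        kind = kinds.get(p)
        if kind == "pg:PropertyType":
            nodes[s][p] = o            # shown inside the node, e.g. in a table
        elif kind == "pg:RelationType":
            edges.append((s, p, o))    # drawn as an edge between two nodes
        # Predicates with no pg: assertion (such as rdf:type) are skipped here.
    return nodes, edges


nodes, edges = build_view(TRIPLES)
```

In this sketch `ex:employedAs` ends up as a property of `ex:Alice` even though its value is an IRI, purely because of the asserted `pg:PropertyType`. Without that assertion the consumer would have no basis for the choice.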

I would be happy if the W3C would adopt this initiative and develop a Property Graph Ontology.

pchampin commented 3 years ago

@mhedenus

I say the best way is to create a new vocabulary.

I say an even better way is to reuse one that already exists :-) https://ieeexplore.ieee.org/abstract/document/9115617 (ping @domel)

+1 to what you wrote about 'Thought styles'. Each paradigm has some features "baked-in" (property-relationship distinction for PG, unique identifiers for RDF...) because they were considered essential by the community in which they appeared. Other features can always be added through extra layers (a conventional 'iri' property on each node for PG, a meta-ontology in RDF such as the one you proposed above). In the end, as @HughGlaser points out, the expressiveness is roughly the same, but the trade-offs differ.

NB: another place where the property/relationship distinction can be made is in the visualization layer. Great example at https://vitalis-wiens.github.io/donatello-pipelines/

mhedenus commented 3 years ago

@pchampin

That paper looks very interesting! It seems their focus is on transforming PG to RDF. For me the focus would be on the other direction, RDF to PG!