w3c / EasierRDF

Making RDF easy enough for most developers

Property Graphs #45

Open draggett opened 5 years ago

draggett commented 5 years ago

This deserves an issue to itself given the growing popularity of property graph databases and the opportunity for using RDF as an interchange framework between different databases. See also #20 Standardized n-ary relations (and property graphs) and #22 Language-tagged strings.

Property Graphs are a kind of graph consisting of nodes and links between them, where nodes and links may be associated with a set of property-value pairs, and where the values may themselves be sets of property-values and so forth recursively. The link predicate or label can itself be treated as a kind of property.

It is possible to represent property graphs with reification, but that adds considerable complexity. We can easily annotate a node using a link to another node. However, we also need a way to link from a link or to a link. One approach is for each link to expose an identifier enabling the link to be treated as equivalent to an RDF blank node. Such identifiers are okay for links within the same graph and can be implicit in serialisations like Turtle* where a pair of curly braces implies a new identifier.
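
For illustration, annotating a single link with standard reification looks roughly like this (the <certainty> and <since> properties are made up for the example; rdf: is the standard RDF namespace):

<Alice> <knows> <Bob> .

_:stmt a rdf:Statement ;
    rdf:subject   <Alice> ;
    rdf:predicate <knows> ;
    rdf:object    <Bob> ;
    <certainty>   0.9 ;
    <since>       1999 .

Four bookkeeping triples are needed per annotated link before any actual annotation is added, which is where much of the complexity comes from.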

What if you want to make a link something that can be referenced stably from other graphs? That suggests the need for a means to associate the link with a named anchor that is unique within the graph. What if the link itself starts in one graph and ends in another - where would you situate the anchor for that link? The answer would seem to be the graph that the link was defined in.

Another challenge concerns the case where a node stands for another graph, e.g. the node has a URI that can be dereferenced to obtain the graph the node stands for. This allows you to make statements about a graph as a whole rather than one of its nodes or links. It would be desirable to quickly determine that a node indeed stands for a graph so as to avoid having to find this out by trying to dereference the node.

Yet another challenge is where you want to distinguish properties from other kinds of links. This would allow for visualisations where you can hide and reveal properties with a tabular presentation of property-value sets. See #37 Lack of RDF Visualisation Software.

It would be desirable to have short names for links so that paths through a graph can be expressed simply via a dotted path string, analogous to properties in object oriented programming languages. Such short names could be scoped to the node that acts as the subject for a link, or the root for an n-ary chunk.

dbooth-boston commented 5 years ago

I also think support for property graphs is very important. However, my strong hope is that we adopt a mechanism for n-ary relations that subsumes property graphs as a special case, so that we do not need a separate mechanism. So far I have not seen any big barriers to such an approach.

My 2 cents on some of your questions:

It is possible to represent property graphs with reification, but that adds considerable complexity.

Agreed. And I find myself recoiling in horror at the mere mention of reification. In my view, RDF reification should be deprecated, since named graphs are generally much better, though not needed for property graphs.

What if you want to make a link something that can be referenced stably from other graphs?

Then a URI should be used, consistent with existing RDF practice.

What if the link itself starts in one graph and ends in another - where would you situate the anchor for that link?

Although that could be done in existing TriG (for example) I do not think it should be supported in a new higher-level RDF language. I think an RDF molecule that represents an n-ary relation should exist entirely in each graph where it is used, and should be considered malformed if one tries to put part of it in one graph and part in another. The reason is that the user, by creating it as an n-ary relation, intended it to be treated as a single unit. However, there would be nothing wrong with asserting some new triples or a new n-ary relation that makes use of some of the constituents of another n-ary relation.

Another challenge concerns the case where a node stands for another graph, e.g. the node has a URI that can be dereferenced to obtain the graph the node stands for. This allows you to make statements about a graph as a whole rather than one of its nodes or links. It would be desirable to quickly determine that a node indeed stands for a graph so as to avoid having to find this out by trying to dereference the node.

My gut feeling is that that should be done by attaching additional metadata triples to the graph URI, such as provenance.

Yet another challenge is where you want to distinguish properties from other kinds of links.

Yes. My assumption is that by coming up with a standard way to define n-ary relations, this ability will fall out as a natural consequence: a particular group of triples will be automatically identifiable as an n-ary relation comprised of those properties.

It would be desirable to have short names for links so that paths through a graph can be expressed simply via a dotted path string, analogous to properties in object oriented programming languages. Such short names could be scoped to the node that acts as the subject for a link, or the root for an n-ary chunk.

Interesting idea! I wonder how the scope could be known, so that the interpretation would be stable in the face of changing data. It would be bad if x.foo were to select one property against one set of data, but a different property if more data were added. Anyone have thoughts on how this could be done?

draggett commented 5 years ago

It would be desirable to have short names for links so that paths through a graph can be expressed simply via a dotted path string, analogous to properties in object oriented programming languages. Such short names could be scoped to the node that acts as the subject for a link, or the root for an n-ary chunk.

Interesting idea! I wonder how the scope could be known, so that the interpretation would be stable in the face of changing data. It would be bad if x.foo were to select one property against one set of data, but a different property if more data were added. Anyone have thoughts on how this could be done?

I think that is tied to the cardinality of the property, i.e. whether "foo" is constrained to a singular value or can have multiple values (via multiple links with the same subject and predicate). Following the given path may thus return a set of nodes containing zero, one or multiple nodes. When we look at how to model n-ary chunks, we should also look at associated metadata including cardinality constraints, composite keys and so forth. What metadata would make data and rules easier to use by the vast majority of developers?
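
As a rough sketch of the idea (plain Python over an in-memory set of triples; the data is made up):

triples = {
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "age", 42),
}

def follow(start, path):
    # Follow a dotted path such as "knows.age"; each step may yield
    # zero, one or many nodes, so the result is always a set.
    nodes = {start}
    for name in path.split("."):
        nodes = {o for (s, p, o) in triples if s in nodes and p == name}
    return nodes

print(follow("alice", "knows"))      # {'bob', 'carol'} (set order may vary)
print(follow("alice", "knows.age"))  # {42}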

Path following is related to regular expressions and RDF shapes, as well as to XPath for XML. I've explored it in some experiments inspired by ATNs, see https://www.w3.org/WoT/demos/shrl/test.html

p.s. I am using the term chunk as it is popular in Cognitive Science and features prominently in cognitive architectures like CMU's ACT-R.

dbooth-boston commented 5 years ago

If someone wrote x.foo as a path, using short names, then I assume that each corresponding long name would be comprised of a namespace plus the short name. How would the system know which namespace to prepend to the short name? For example, if the current namespaces included both http://example/a# and http://example/b#, how would the system know whether foo should be expanded to http://example/a#foo or http://example/b#foo? Or do you envision this working some other way?

draggett commented 5 years ago

I assume that each corresponding long name would be comprised of a namespace plus the short name

No, that isn't the case. This is just a graph of objects where the object properties act as links to other objects, and each object property has a name that is scoped to that object. In RDF terms, the subject node + the property name provides a map to a predicate, and uniquely identifies a set of triples with that subject and predicate.

A restriction on this would be to constrain property names to uniquely identify predicates in this graph. This is tantamount to saying that the property name uniquely identifies the meaning of a property, rather than this being something specific to each object.

That is an overly strong constraint, as in the real world words are often used with different meanings depending on the context. However, there is nothing to prevent implementations from optimising how they handle this internally.
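
As a minimal sketch of what "scoped to that object" could mean in practice (the URIs are invented), the short name is resolved against a per-subject map rather than a global prefix:

# short name -> full predicate URI, scoped per subject node
scopes = {
    "http://example/Alice":   {"foo": "http://example/a#foo"},
    "http://example/Widget1": {"foo": "http://example/b#foo"},
}

def predicate_for(subject, short_name):
    return scopes[subject][short_name]

print(predicate_for("http://example/Alice", "foo"))    # http://example/a#foo
print(predicate_for("http://example/Widget1", "foo"))  # http://example/b#foo

The same short name resolves to different predicates depending on which node it is used from, so no global prefix table is required.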

dbooth-boston commented 5 years ago

I would like to pursue the possibility of encoding property graphs in standard RDF. Have others already done this? If so, what RDF patterns were used, and what limitations did they have?

draggett commented 5 years ago

Apart from reification, one approach that has been mentioned is to use a named graph that contains just the triple you want to annotate. This generalises to annotations on multiple triples, but I am unsure how you indicate that a given triple is in multiple named graphs. Another challenge is how you identify a graph when there isn't an explicit name for it, e.g. when using curly braces in Turtle* around the triples you want to annotate, this would imply an implicit blank node for the associated graph.
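
For concreteness, the single-triple named graph pattern looks roughly like this in TriG (reusing the <Alice> <knows> <Bob> example; the graph name and annotation properties are invented):

<g1> { <Alice> <knows> <Bob> . }

<g1> <certainty> 0.9 ;
     <source>    <survey2019> .

The annotation triples about <g1> live in the default graph, while the annotated triple sits alone inside the named graph.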

This makes me think about how to deal with graphs from an implementation perspective. One idea is to express the relationship between a triple and a graph as a property of the triple, where the property can have multiple values. Another idea is to allow for relationships between graphs, e.g. for one graph to be subsumed as part of another graph. A database could create its internal identifiers, and associate them with external identifiers when those are defined.

I wonder how this is dealt with by existing property graph database solutions?

VladimirAlexiev commented 5 years ago

I am unsure how you indicate that a given triple is in multiple named graphs.

You make several quads having the same <s,p,o>.
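
For example, in N-Quads (URIs invented), the same triple simply appears once per graph:

<http://example/Alice> <http://example/knows> <http://example/Bob> <http://example/graph1> .
<http://example/Alice> <http://example/knows> <http://example/Bob> <http://example/graph2> .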

amirouche commented 5 years ago

Property Graphs are a kind of graph consisting of nodes and links between them, where nodes and links may be associated with a set of property-value pairs, and where the values may themselves be sets of property-values and so forth recursively.

The part in bold (that values may themselves be nested sets of property-values) is not true. Node and link (respectively vertex and edge) properties are a plain old hashmap, JSObject or dict.

The link predicate or label can itself be treated as a kind of property.

Yes.

It is possible to represent property graphs with reification, but that adds considerable complexity.

What reification? I looked around and I still don't understand.

What if the link itself starts in one graph and ends in another - where would you situate the anchor for that link?

That is exactly what I meant by "it is advanced use" in this comment.

Another challenge concerns the case where a node stands for another graph, e.g. the node has a URI that can be dereferenced to obtain the graph the node stands for.

I think we should come up with a representation of a property graph before trying to generalise to recursive or hierarchical graphs or "meta-graphs".

True story: as part of a foolish attempt to replace the atomspace, I was thinking about how to implement this kind of thing. Basically, a single entity called the atom that has outgoing and incoming links, and properties as a hashmap. Then came the idea of a "recursive hypergraph". Like you wrote, it is complex just to imagine a node (or atom in my case) pointing outside its own graph. Like you wrote, having a node represent another graph or sub-graph makes sense (because it is hierarchical). Again, I think it is the role of the reasoner / rule engine to deal with that kind of complexity. As part of my exploration I tried to implement it, but in the end there is really no way to "make it fast", and a priori you don't know when the "query" will end.

my strong hope is that we adopt a mechanism for n-ary relations that subsumes property graphs as a special case

what is "n-ary relations" please?

It would be desirable to have short names for links so that paths through a graph can be expressed simply via a dotted path string, analogous to properties in object oriented programming languages.

That is what Gremlin (from TinkerPop) mostly does. It is written like:

graph.vertices.filter(lambda x: x.type == 'actor').outgoing.filter(lambda x: x.genre == 'science-fiction')

amirouche commented 5 years ago

Some time ago I wrote an article on how to build a graph database on top of EAV. You can find it at https://hyper.dev/blog/diy-graph-database-in-python.html.

EAV is somewhat like a triplestore, except that you cannot have multiple triples with the same subject and predicate. On top of that abstraction I built a document store by grouping by subject. Each document has a private field that distinguishes nodes from edges. Edges also have two other private predicates, node-start and node-end.
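
Roughly like this (a toy sketch, not the actual code from the article; the %kind field name is invented, node-start and node-end are as described above):

# EAV rows: (entity, attribute, value); at most one value per (entity, attribute)
eav = [
    ("n1", "%kind", "node"), ("n1", "name", "Alice"),
    ("n2", "%kind", "node"), ("n2", "name", "Bob"),
    ("e1", "%kind", "edge"), ("e1", "node-start", "n1"),
    ("e1", "node-end", "n2"), ("e1", "label", "knows"),
]

def document(entity):
    # Rebuild a document (node or edge) by grouping rows by subject
    return {a: v for (e, a, v) in eav if e == entity}

print(document("e1"))
# {'%kind': 'edge', 'node-start': 'n1', 'node-end': 'n2', 'label': 'knows'}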

mhedenus commented 3 years ago

Hello, I'd like to pick up this topic and discuss a specific question: how can you distinguish a property from a relation?

In RDF that is not possible, because there is no such distinction. Example:

<Alice> <knows> <Bob> .

and

<Alice> <mbox> <mailto:alice@example.com> .

are completely equal in the sense that they are simple statements. But the meaning is very different, because Alice and Bob are persons; they are entities, i.e. things (resources) which have a distinct existence. The second statement states that Alice has a property, i.e. a contact address. Although the mailbox is a URI, it has the flavor of a value, like

<Alice> <name> "Alice".

Of course, you can say that a mailbox is also an entity, but whether or not something is an entity is a decision made by the domain model. I think this is the crucial question when you want to bring Property Graphs and RDF together!

I also want to point out that a Property Graph is a technical way to do ER modelling. Nodes become entities, edges become relations, and key-value pairs become attributes (properties).

My vision is to create a unified graph model that embraces ER-modelling and RDF at once.

namedgraph commented 3 years ago

In RDF you distinguish between URI resources and literals as objects. In this case <mailto:alice@example.com> is a URI resource which can have its own triples; "alice@example.com" would be a literal.

Absolute terms like "not possible" do not help IMO because while it may seem so to you coming from a different background, there are very good reasons why RDF is like it is, and formal theory behind them. RDF was designed for data interchange.

Are you familiar with RDF-star?

mhedenus commented 3 years ago

Sure, I know all that, and I am familiar with RDF*. I completely understand why RDF is designed the way it is. The question is not so much technical as theoretical.

Of course you can write

<Alice> <mbox> "alice@example.com"^^xsd:anyURI

We can agree that a literal is a property value. But it can be more difficult than that:

<Product1> <price> [ <value> 300 ; <currency> <euro> ] .

What about <euro>? Can it be an entity? If so, then <currency> would be a relation, because a relation is between entities, not between a property and an entity. But is the blank node also an entity? Then the whole price would be an entity.

namedgraph commented 3 years ago

Entity is not an RDF term. If we're talking ontological modeling, a related term would be class.

Sure, you can call the price an entity, and the euro as well. Why is that a problem?

dbooth-boston commented 3 years ago

After re-reading some of this thread, I notice that I missed a couple of questions from @amirouche a couple years ago. Sorry!

What reification? I looked up around I still don't understand.

See this brief explanation and this answer on stackoverflow.

what is "n-ary relations" please?

See Defining N-ary Relations on the Semantic Web.

And addressing newer comments from @mhedenus :

How can you distinguish a property from a relation ?

Can you please first explain what distinction you are trying to make between a "property" and a "relation"? AFAIK we do not have widely accepted standard definitions of those terms that clearly distinguish between them. If you could explain what distinction you are trying to make, it would be helpful.

Also, please explain what you mean by "entity", and why you think some things should be considered entities and some should not. When you wrote "they are entities i.e. they are things (resources) which have distinct existence" it sounds like you are using the term "entity" to mean what RDF calls a "resource". But then when you suggest that some things should be considered entities and some should not, that sounds different than the RDF notion of "resource", so I am confused. Can you explain what you mean by "entity" and how it is different from what RDF calls a "resource"?

mhedenus commented 3 years ago

Maybe I should clarify what this is all about. I have been an advocate of RDF since I learned about it 20 years ago (I have been programming since 1988 and doing Java development since 2000). I worked very hard to establish RDF as a technology in my company in the automotive industry. Currently, we use RDF primarily for data integration.

But can you use RDF for modelling, e.g. using RDFS or OWL? I think not.

The reality is: modelling is hard, especially because domain experts are normally not software developers. When you start talking about URIs, resources and the like, all they hear is blah blah blah. I believe the main reason why the adoption of RDF is so scarce (YES IT IS! There are still too many people out there who have never heard of it) is that people don't get it! RDF is extremely academic!

What people understand (even mechanical engineers) is ER modelling. They understand that there are things (entities or objects) which have properties (or attributes) and they have relations to other things.

Let's make a (over-)simplification here: there are two main graph modelling worlds:

  1. Property Graph == ER Modelling == Domain Modelling == {entities, relations, properties}
  2. RDF == {resources, predicates, literals}

Can these worlds be brought together? Yes. We have developed a graph model that is a Property Graph compatible with RDF. It is working and I believe that it can give benefits to the RDF world.

mhedenus commented 3 years ago

RDF talks about resources. Everything that can be identified with a URI is a resource. An email address is a resource. If you use an email address to denote a person, you write (as proposed by FOAF):

<mailto:alice@example.org> a <Person> ; <name> "Alice" .

So far so good. Now let's express the fact that "Alice has an email address":

<mailto:alice@example.org> <mbox> <mailto:alice@example.org> .

That is legal in RDF and it makes complete sense in RDF. But these statements have different meanings which are only obvious to human readers. In the first statement the mailto URI is an identifier for something we call Alice, in the second statement the same URI is a value that belongs to a property owned by Alice.

Do you agree?

dbooth-boston commented 3 years ago

You have used the same URI, <mailto:alice@example.org>, to denote both a person and a mailbox. That is a URI collision. According to the Web Architecture, you should have used different URIs to avoid the problem that you're raising. RDF itself does not stop you from doing that, but that doesn't mean it's good practice either.

But what does this have to do with implementing property graphs in RDF? I don't understand where you're going with this example.

mhedenus commented 3 years ago

Yes, this URI collision is not nice and should be avoided. This example was meant to demonstrate what I think is the stumbling block when you try to implement property graphs in RDF. When you try to map RDF to a property graph, you have to know whether the statement's object is another node (and therefore the predicate is a relation) or a property value (and therefore the predicate is a property type or key).

To say that all URIs are mapped to nodes in the property graph and ONLY statements with literals are properties would be an artificial restriction.

To solve this, some additional information is required that tells you which predicates are considered relations and which are considered properties.
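
For example, such a mapping model could itself be stated in RDF (the vocabulary here is invented, just to illustrate the idea):

<knows> a <Relation> .    # objects are other nodes in the property graph
<mbox>  a <Property> .    # objects are property values on the subject node
<name>  a <Property> .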

namedgraph commented 3 years ago

I still don't get why developers' unfamiliarity with a technology is being framed as a deficiency of the technology, and not of the developer. This seems to be a constant theme for EasierRDF.

Many more developers know Javascript than C++. Does that make C++ academic, and thereby somehow deficient? Should we have EasierC++?

If developers are familiar with ER or UML or whatever, then provide mappings/converters to OWL/RDF(S). But don't use that as an opportunity to knock RDF.

dbooth-boston commented 3 years ago

@namedgraph I think I disagree with you fairly fundamentally about this. I think lack of uptake can be an important indicator that a technology is too hard to use. It certainly is not an absolute determinant though.

If you look at market shares, RDF databases are getting clobbered by property graph databases. You can claim that RDF does more than what Property Graphs can do -- and I agree -- but it isn't a huge difference, and apparently it isn't a difference that matters to many common use cases.

I want to improve RDF, not knock it. And that means being honest about its strengths and weaknesses. IMO its biggest weakness is its difficulty of use. If we can make it as easy to use as Property Graphs -- at least for use cases that do not need functionality beyond Property Graphs -- then I think that would be very beneficial for RDF. But as I said before "my strong hope is that we adopt a mechanism for n-ary relations that subsumes property graphs as a special case, so that we do not need a separate mechanism".

namedgraph commented 3 years ago

@dbooth-boston we've been over this...

I'd like you to try the C++ analogy though. StackOverflow is full of questions "why is C++ so hard?" and yet some of the most critical software is written in it. How is this different from RDF?

dbooth-boston commented 3 years ago

This is a bit off topic, but I'll indulge your C++ analogy and try to answer. I think you are suggesting that, even though RDF is hard, it is still the right tool for the job sometimes, just as C++ is still the right tool for the job sometimes, even though it is hard. I definitely agree that RDF is sometimes the right tool for the job. (I would not have been involved with RDF for so many years if I didn't!)

But here is where I think the analogy breaks down. When C++ is chosen, almost invariably the overriding reason is performance. I don't believe anybody would choose C++ over Python (for example) if performance were not a key consideration. And the reason C++ is hard is because it is both a low-level C-compatible programming language and a high-level object-oriented programming language. When performance is critical, there is no getting around the need for a low-level language like C. One could of course use C instead of C++, but the higher-level features of C++ allow for more programmer productivity while still giving access to the low-level features of C. In other words, programmers put up with C++'s difficulty because they NEED the low-level features that it provides.

In contrast, I do not believe that RDF is chosen because developers really NEED the low-level features that it provides. I believe we can produce a higher-level successor to RDF, that retains the power that we need, while making it easier to use.

As a case in point, I do not believe that we really NEED explicit blank nodes in RDF, i.e., blank nodes like _:b42 that cannot be represented by square brackets [] in Turtle. We could solve the same use cases if RDF did not have them, even though we might have to create a few Skolem URIs instead sometimes. Yet that one little feature -- the ability to write an explicit blank node -- places a disproportionate complexity burden on RDF users. Not only does that feature cause endless confusion to new RDF users (because blank node labels are not stable identifiers), but it is precisely the reason why, after over 20 years, we still do not have a standard way to canonicalize RDF!
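
For example (the <street> property is made up; the .well-known/genid path is the form RDF 1.1 suggests for Skolem IRIs):

# with an explicit blank node label
_:b42 <street> "Main St" .

# the same data with a Skolem IRI instead
<http://example.com/.well-known/genid/b42> <street> "Main St" .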

In short, the low-level features of C++ are essential to its users, but the low-level features of RDF are not essential. They only continue to exist because we have not yet developed a higher-level, easier-to-use successor.

Unless we succeed in making RDF considerably easier to use, I think RDF will eventually get squeezed out of the picture entirely, in favor of other graph approaches that are easier to use, even though those other graph approaches are not quite as powerful.

mhedenus commented 3 years ago

Dear @dbooth-boston and @namedgraph, the discussion is going in the wrong direction. I never intended to knock anybody or to start a fruitless discussion about what is better. When I say that RDF is academic, that is not a bad thing (being an academic myself) and I do not mean it as a value judgement. It's a simple truth that being brilliant is not all you need to be successful.

This is a thread about Property Graphs and RDF, so let's get back on track to what I consider a valid conceptual question. The main difference between RDF and Property Graphs is not that it is hard in RDF to assign predicates to a predicate; this can be done with reification. The main difference lies in the core graph model itself.

I painted a quick picture. Let's assume you have a domain model consisting of two types of things, A and B. You can create a UML class diagram or an ER diagram. Both model the same situation.

Now consider an instance graph that contains the data modelled by the UML diagrams. The property graph is very straightforward. The RDF graph is more complex due to the atomic nature of its nodes. The type and member1 of the things in the model become nodes in the RDF graph.

Now imagine an implementation that has to map the model elements (ER, class) and instance graphs (PG, RDF). The mapping to RDF is more complicated: you must distinguish properties or attributes like the type and member1 from the relation or association relatesTo. They all become predicates in RDF. So mapping back from RDF to the model becomes somewhat ambiguous.

Now the question: if you are looking at the RDF graph, how can it be possible to see that member1 is meant to be a property/member/attribute and relatesTo is meant to be an association/relation?

Please don't answer: you don't do that, you use RDFS or OWL! This is a conceptual question, not a technical one.

One part of the answer surely is: if the subject of the RDF statement is a URI and the object of the RDF statement is a literal, then the predicate is meant to be a property/attribute.

But is this sufficient? What about rdf:type? Are there special cases? Is some special information required? If so, how it is provided? Some special model, ontology... ?

image

mhedenus commented 3 years ago

To put it more mathematically: The model-graphs and instance-property-graph are homomorphic, the RDF graph is not.

draggett commented 3 years ago

For the record, the W3C Cognitive AI Community Group is incubating a higher-level approach to knowledge graphs that is easier to work with than JSON-LD whilst retaining a mapping to RDF triples. It focuses on "chunks": collections of properties whose values are literals, references to other chunks, or a sequence thereof. Your example becomes:

A c1 {member1 value1}
B c2 {}
c1 relatesTo c2

Chunks map to one or more RDF triples with a shared subject node. Chunk types and properties can be easily mapped to RDF URIs in a similar manner to JSON-LD. Knowledge engineers need to decide when to model something as a property or an explicit link. However, it is easy to promote a property to a link when needed.
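
By way of illustration, the chunks above might map to RDF roughly as follows (the URIs stand in for whatever mapping is declared, much as a JSON-LD context would):

<c1> a <A> ;
     <member1> "value1" .
<c2> a <B> .
<c1> <relatesTo> <c2> .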

Chunks has a broader scope than either RDF or Property Graphs, as it seeks to support general purpose human-like AI, inspired by progress in the cognitive sciences and over 500 million years of neural evolution. Chunks are associated with sub-symbolic parameters that model human memory in respect to prior knowledge and past experience. The chunks rule language models the cortico-basal ganglia circuit in the brain. Sequential rule execution corresponds to the sequential nature of consciousness, and draws upon decades of work by John Anderson at CMU. By contrast, chunk databases support parallel execution of graph algorithms.

You would be right in thinking this work is still in its infancy, but general purpose AI will be hugely disruptive, and both RDF and Property Graphs will be affected by the rise of machine learning. To quote William Gibson: "The future is already here – it's just not evenly distributed."

mhedenus commented 3 years ago

That is very exciting, maybe an AI can recognize it.

Knowledge engineers need to decide when to model something as a property or an explicit link.

That gets to my point. How do you see in the RDF graph whether it is a property or a link?

dbooth-boston commented 3 years ago

@mhedenus and @draggett I agree with the general direction that you describe, which I see as the ability to manipulate small subgraphs of RDF as though they are single, indivisible objects (or "chunks" as @draggett calls them) -- making RDF a higher level language. I think three fundamental capabilities are needed to achieve this:

  1. Manipulate those chunks as indivisible objects;
  2. Compose a chunk from a subgraph -- i.e., "bless" it, to enable that subgraph to be manipulated as an indivisible chunk; and
  3. Decompose a chunk into its subgraph, i.e., access its parts.

In programming languages we do this routinely with data structures, but we don't (yet) have this capability in RDF.
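
A loose analogy in ordinary program code (illustrative only; the names are made up):

from dataclasses import dataclass

@dataclass(frozen=True)
class Price:                 # the "chunk": handled as one indivisible value
    value: int
    currency: str

def decompose(subject, price):
    # access the parts: expand the chunk back into individual triples
    return [(subject, "price/value", price.value),
            (subject, "price/currency", price.currency)]

p = Price(300, "EUR")        # compose ("bless") the parts into a chunk
print(decompose("Product1", p))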

amirouche commented 3 years ago

That. is. a. giant. insightful thread.

Property graphs have the advantage that they require no particular knowledge to be understood or used: they are easy to draw and easy to explain. In particular, the API is easy, without much diversity or even room for creativity. The only roadblock in my opinion is GQL, inspired by Cypher, but that might be a particular bias of mine because I prefer high-level and domain-specific languages. At intermediate levels, so far, I prefer procedural approaches.

Something that may seem natural to the seasoned software practitioner might not be for the newbie. I think of trees as an example of a deep topic in software, even though they have a natural representation. The opposite also exists: tables. Tables are not found in the natural environment, but given massive education and tooling they succeeded as a concept and a common tool.

It was already written in this thread using other words: a property graph as a programming interface is less powerful (weaker) than RDF. Another way to explain the same idea, all things considered: RDF can implement property graphs, what RDF allows to describe is a superset of what a property graph allows to describe. Whether one is better than the other as a solution for building a particular product is another problem (one must dive into performance, culture, existing knowledge, features).

Indeed the property graph model is easy on the mind, even more so thanks to UML, ER diagrams, etc. That does not necessarily mean I will use a property graph database for the implementation. A graph database is a tool for non-practitioners, and helps software practitioners deliver more quickly.

What matters most are the conceptual tools that can be invented, the new thoughts that can be had. In this regard, property graphs did not help me: they are a direct mapping of existing, almost natural knowledge. RDF, by contrast, provides, in my opinion and experience, tools and grounds to create and think about new ideas.

  1. Manipulate those chunks as indivisible objects;
  2. Compose a chunk from a subgraph -- i.e., "bless" it, to enable that subgraph to be manipulated as an indivisible chunk; and
  3. Decompose a chunk into its subgraph, i.e., access its parts.

That is exactly the description of a project I built, inspired by the atomspace. In my implementation atoms had key-value pairs (properties) and zero or more incoming and outgoing links (links have no label or properties). Subgraphs were reified with an atom and links toward the atoms composing the subgraph. Like I commented in the recursive / hierarchical RDF graph issue, that is very difficult to reason about. Maybe cogai chunks will make it practical.

dbooth-boston commented 3 years ago

Wow, atomspace has some really interesting ideas! I hadn't looked at it before.

mhedenus commented 3 years ago

RDF can implement property graphs, what RDF allows to describe is a superset of what a property graph allows to describe.

Well, I think that is currently not completely true, because of the issue I laid out: RDF does not provide the fundamental distinction between relation and property found in the common modelling schemes.

The solution we implemented is simple: when reading RDF you must specify a model that tells you how to interpret the predicates. The question remains what to do with predicates that are not specified in the model. The rules are:

* if the subject is a URI and the object is a URI the predicate is a relation unless specified otherwise

* if the object is a blank node the predicate is a property and the object is a complex property value (a structure)

* if the subject is a blank node the predicate is a (sub-)property of the complex property value (a structure)

Now you have a clean mapping to an ER/class model: the URIs which are subjects belong to entities (domain model objects), predicates between entity-URIs are always relations, and there is no relation from within a complex property that points to another entity. That means in terms of ER modelling: there is only a relationship between Alice and Bob but not between any part of them. One interesting thing is that you can decide post facto whether to interpret a predicate as a relation or a property.
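
Under those rules, for example (the URIs and values are invented):

<Alice> <knows> <Bob> .        # URI object: a relation between entities
<Alice> <name> "Alice" .       # literal object: a simple property
<Alice> <address> [            # blank node object: a complex property
    <street> "Main St" ;
    <city>   "Springfield"
] .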

This "blank-node-cluster" is like the "chunk" mentioned by @draggett ?

draggett commented 3 years ago

@dbooth-boston writes:

  1. Manipulate those chunks as indivisible objects;
  2. Compose a chunk from a subgraph -- i.e., "bless" it, to enable that subgraph to be manipulated as an indivisible chunk; and
  3. Decompose a chunk into its subgraph, i.e., access its parts.

These are all supported by my implementation of chunks with a scripting API as well as by the chunks rule engine. You can retrieve chunks directly from chunk graphs using the chunk ID, or associatively, using the chunk type and/or a subset of properties. More complex queries can be expressed as sub-graphs and interpreted by corresponding graph algorithms.

To mimic human memory, chunks are associated with a numeric strength, a timestamp and an estimate of their lasting utility. This is used to emulate the Ebbinghaus forgetting curve, the spacing effect, stochastic recall, and to balance short and long term demands on memories, e.g. working memory only needed for seconds or minutes, and long term memories that will be dormant over winter until next year's summer season.

You might ask why we would want to make computers forgetful, given how faithful they are at remembering. The answer is that in everyday situations you want to recall just what is most important based upon past experience. This is similar to web search engines, which seek to provide the results that are most likely to be relevant given the words in the search query.

The above behaviour can be fully controlled by the application developer as appropriate to the application requirements.

mhedenus commented 3 years ago

But we are mixing up different things now...

namedgraph commented 3 years ago

@mhedenus Entity–relationship model - Limitations:

ER models are readily used to represent relational database structures (after Codd and Date) but not so often to represent other kinds of data structure (data warehouses, document stores etc.)

mhedenus commented 3 years ago

@namedgraph: this is another, more philosophical discussion. The inventor of the ER model, Chen, regarded it as the fundamental model of everything (and I agree).

We use RDF for integrating data from very different datasources. To do so, each datasource must provide its data as RDF. To do so, the data must be transformed. To do so, you must do a model-model mapping, i.e. from the datasource's domain model to RDF.

Here is the crucial point, because that is not working well, for very different reasons. It is way simpler and more practical to map the datasource's domain model to ER and from there to RDF.

draggett commented 3 years ago

@mhedenus wrote:

This "blank-node-cluster" is like the "chunk" mentioned by @draggett ?

Yes.

RDF URIs are akin to words in English and other natural languages, in that they are important for semantic interoperability between communicating agents. Internally, agents need to be able to create IDs for chunks generated on the fly. In the human brain, chunks correspond to semantic pointers in noisy high dimensional spaces (the concurrent firing patterns across cortical columns). These are unique to each person. We are able to communicate because we have a shared understanding of concepts and their interrelationships, and are able to map semantic pointers to words.

RDF's blank nodes are equivalent to internal identifiers, which are clearly needed, but whose meaning is implicit in the graph structure.

afs commented 3 years ago

Another direction is "shapes" which describe the structure of the data. These are descriptions - they can be used for validation or they can be taken as definitions.

Both SHACL Compact Syntax and ShEx Compact Syntax provide a modelling view where relationship and attributes are more clearly identified.
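
For instance, in ShEx compact syntax the distinction is visible in the shape declaration itself (the shape and properties below are invented):

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

<PersonShape> {
  <name>  xsd:string ;         # attribute: a literal value
  <mbox>  IRI ;                # attribute-like: an IRI value
  <knows> @<PersonShape> *     # relationship: a link to another Person
}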

Another aspect is that there is a role for tools here to provide the view - it is not purely a data format issue.

amirouche commented 3 years ago

@mhedenus

RDF can implement property graphs, what RDF allows to describe is a superset of what a property graph allows to describe.

Well, I think that is currently not completely true, because of the issue I laid out: RDF does not provide the fundamental distinction between relation and property found in the common modelling schemes.

ref: https://github.com/w3c/EasierRDF/issues/45#issuecomment-823288745

What I wrote is that one can do with RDF everything that a GraphDB does: the answer is yes, it can even do more. I did not write how. We should consider GraphDB and Property Graph as two different things: a GraphDB is software, a Property Graph is a concept. With a GraphDB you cannot query by key-value pairs, such as: give me THE vertex with uid=42, unlike with my implementation of property graphs on top of RDF.

The solution we implemented is simple: when reading RDF you must specify a model that tells you how to interpret the predicates. The question remains what to do with predicates that are not specified in the model.

In my system there is no difference at the RDF level between the items of SPO; they can all take the same types of objects, and it is up to the user to choose the schema even at that level.

The rules are:

* if the subject is a URI and the object is a URI the predicate is a relation unless specified otherwise

* if the object is a blank node the predicate is a property and the object is a complex property value (a structure)

* if the subject is a blank node the predicate is a (sub-)property of the complex property value (a structure)

I do not understand the last point, what is a sub-property?

Here is my approach:

Now you have a clean mapping to an ER/class model: the URIs which are subjects belong to entities (domain model objects), predicates between entity-URIs are always relations, and there is no relation from within a complex property that points to another entity. That means in terms of ER modelling:

It is a misunderstanding of RDF to say "there is a clean ER/class model representation [in my upper layer on top of RDF]". RDF is built out of relations; its basic nature is a network between entities where a link is directed and has a label. There are many ways to add properties to Bob or Aziz, or even to add a relation to "Bob knows Aziz"; unlike in a GraphDB, you cannot relate to an edge without reifying the edge into a vertex and edges, hence introducing the metatype of hyperedge.

There is only a relationship between Alice and Bob but not between any part of them.

In your approach, yes, there may be only one edge between two vertices, unlike in my approach.

One interesting thing is that you can decide post facto whether to interpret a predicate as a relation or a property.

See above for an alternative.

mhedenus commented 3 years ago

@amirouche thank you for the reply. I haven't completely understood everything you've written, but I have a feeling that we are getting closer.

First, a clarification of what I meant: consider an RDF graph. Now you also have a (legacy) application that wants to import the data. Let's assume you have a Java application with a class domain model. Then you must do a model-model transformation. If you're lucky there is a UML description of the application's model. You must match some sub-graph of the RDF graph to Java objects.

This mapping process includes a specific interpretation of the RDF data: some predicates are interpreted to be members of the objects, some predicates are interpreted to be associations between objects (by the way: languages like Java and C++ also do not distinguish between association and membership, but this is another story).

I have been working on how to standardize this mapping process so that every (legacy) application can import/export data from/to RDF. This is a very important practical thing. I call RDF "academic" because it seems to me that the RDF/Semantic Web world somehow ignores the reality that RDF must interact with existing applications (and please don't give me the RDFa story!) in a way that the common developer can use it.

Again a picture. Objects or entity instances are identified by URI nodes. Some predicates therefore become associations/relations and others become properties/members. For properties you can draw an analogy to XML Schema. There are simple properties == properties that are coded as simple strings; I think we all agree that these are just RDF literals. There are also complex properties == structures like lists or maps. They are considered to be sub-graphs "starting" with a blank node. A restriction here is that loops in the complex-property sub-graphs are forbidden, and they must be trees. I call literals or other blank nodes attached to blank nodes "sub-properties". But the whole "blank-literal sub-graph" is considered to be a member of the object. URIs showing up in the complex-property subgraph are also interpreted as properties, not as relations.

image

mhedenus commented 3 years ago

@amirouche

Subjects are always blank nodes; I use uuid4 as blank nodes. There are three reserved predicate symbols: type, start and end. Given a subject, start and end can only be associated with a subject where there is a triple (subject, type, "edge"). Any other predicates are properties, where objects are any acceptable values.

The scheme you describe here seems to be a higher level of modelling, i.e. is this meant to be a graph "meta-model"?

mhedenus commented 3 years ago

To narrow it down (sorry for bothering you, but it is basically a very simple question), another example. Here is a domain model, the (naive) implementation and RDF graph data.

Yes, you can say: why not annotate the Java code, similar to JAXB (e.g. using RDFBeans)? Because this implies existing knowledge about the data, and it does not answer the question: how do you see in the RDF graph what is supposed to be a property and what a relation?

image

draggett commented 3 years ago

You may think of name, firstName and lastName as properties, but you could equally think of them as predicates. It doesn't make any real difference, and there's nothing to stop you classifying predicates as properties or links in an ontology.

amirouche commented 3 years ago

The scheme you describe here seems to be a higher level of modelling, i.e. is this meant to be a graph "meta-model"?

Yes.

I haven't completely understood everything you've written but I have a feeling that we are getting closer.

Thanks for the feedback. I understand better the problem with the following:

Consider an RDF graph. Now you also have a (legacy) application that wants to import the data. Let's assume you have a Java application with a class domain model. Then you must do a model-model transformation. If you're lucky there is a UML description of the application's model. You must match some sub-graph of the RDF graph to Java objects.

And the following:

I have been working on how to standardize this mapping process so that every (legacy) application can import/export data from/to RDF.

Sort of an Object-Relational Mapper (ORM) where, instead of an SQL database, there is an RDF database. In other words, map RDF concepts to Java concepts. In an ORM such as Hibernate, as of 2010, a Java class describes a table whose columns are described with annotations (IIRC); a row of the table is then represented as an object instance of that class, with getters and setters to access column values. IIRC, SQL is also built with Java code using method chaining. FWIW, most of my experience is with Python ORMs, and I also built an Object-Graph-Mapper.

I call RDF "academic" because it seems to me that the RDF/Semantic Web world somehow ignores the reality that RDF must interact with existing applications (and please don't give me the RDFa story!)

I am an outsider to RDF and the W3C. I came to RDF from TinkerPop / Neo4j. Part of the reason I came to RDF is the academic thing, which I prefer to describe as a lot of experience gathered in the same place with a lot of energy, an open process, and a system that is well studied along various aspects, with several independent industrial implementations. RDF can probably be perfected. Also, be warned that my system does not aim to be 100% compliant with RDF! I cherry-picked ideas (e.g. my system supports SPARQL queries and TinkerPop's Gremlin queries, and they can be mixed and matched).

in a way that the common developer can use it.

FWIW, I do not think I match that description (e.g. I prefer to avoid ORMs), so take what I write with a grain of salt.

Is your goal to standardize read and write access to an RDF database, as ORMs do with SQL databases - in other words, to build a framework to interoperate RDF databases with Java, re-using Java concepts?

If that is the case, I am not sure how it relates to this issue. Also quoting the other issue:

It turned out that RDF is not intelligible for the users and even not for engineers.

Who are the users? As far as I am concerned, catching up on 80% of what I know about RDF can be done with SPARQL and this tutorial: https://docs.data.world/tutorials/sparql/.

My recommendation is to create a new issue with a specific question, e.g.: How to map RDF concepts to Java concepts?

namedgraph commented 3 years ago

@mhedenus ER models come from the RDBMSs which came along in the 70s or so.

The web on the other hand appeared in the 90s, and then RDF was designed for data interchange on the web. That's why it has URIs, the Open world assumption (OWA) etc.

So there is an inherent mismatch between those models, and trying to shoehorn one into the other will leave you with the worst of both.

To take full advantage of RDF you have to go all in. Design your software around RDF, not the other way around. Throw out the ORMs and pretty much all of the object-oriented layer. Accept that there are only triples (or quads), and that they do not distinguish between properties and relations.

mhedenus commented 3 years ago

@draggett

You can create an ontology using OWL and define classes like Man and Woman. And you can make assertions about OWL properties like :hasWife or :hasAge (see the OWL 2 primer for exactly this example). In a PG model or a UML model, :hasWife would be an edge/link/association/relation and :hasAge would be a key/member/attribute. Can you define the difference in OWL?

mhedenus commented 3 years ago

@namedgraph

The line of argumentation is very odd and completely unrealistic. Saying that RDF is younger does not make the other things worthless. Also, the remark on OWA and CWA is out of scope; this is something completely different. By the way: SHACL showed up because the reality is that you cannot live without validation and the CWA. That's why Stardog introduced ICV!

So there is an inherent mismatch between those models, and trying to shoehorn one into the other will leave you with the worst of both.

The opposite: the best of both!

Design your software around RDF, not the way around. Throw out the ORMs and pretty much all of the object-oriented layer. Accept that there are only triples (or quads), and they do not distinguish between properties or relations.

I want to live in your world, it seems like paradise! ;D

Do you drive a VW or an Audi? Then it is very likely that the software in your car's engine has been developed here in Regensburg. Please come and tell these engineers to forget what they learned about UML and that they shall restart with OWL. Tell the Java developers to forget what they learned about model-driven development and data-binding. They shall learn RDF and work with RDF4J, not this old-fashioned Hibernate stuff!

namedgraph commented 3 years ago

@mhedenus you haven't disputed that there's an inherent mismatch between the models. There's an impedance mismatch even between the relational and object-oriented models, that's why the ORMs have all kinds of edge cases.

I'm not telling anyone how to work, my concern is pushing the web to its full potential and making it data-driven by using RDF and declarative technologies. I am just sharing my experiences. We have explained them in more detail in our blog.

If you don't have time for that, at least take a look at a specification which enables generic REST APIs and makes web applications data-driven, or more specifically ontology-driven: https://atomgraph.github.io/Linked-Data-Templates/ It allowed us to get rid of the domain object model completely.

mhedenus commented 3 years ago

@amirouche Developing a Java-graph mapping would be the final result of what I want to discuss here. Before that, conceptual questions must be answered. I used this thread because I consider my obviously weird question about relations/properties to be the key issue. If you can unify PG and RDF conceptually, then a Java-graph mapper can be the realization!

mhedenus commented 3 years ago

@namedgraph Looking at the links you provided that all looks great!

One thing should not be forgotten: we all want to advocate RDF! However, my experience is that there are many obstacles: technical, epistemic and cultural. We (my team) are struggling hard to explain to the management what data-centric means. You cannot believe how hard that is.

namedgraph commented 3 years ago

Amen! Do you know this book series BTW? Software Wasteland and The Data-Centric Revolution.

If you want to see our approach working in practice, drop me a line :) martynas [at] atomgraph.com

mhedenus commented 3 years ago

@namedgraph Thank you very much for the hints. The books look very interesting.