orientechnologies / orientdb

OrientDB is the most versatile DBMS supporting Graph, Document, Reactive, Full-Text and Geospatial models in one Multi-Model product. OrientDB can run distributed (Multi-Master), supports SQL, ACID Transactions, Full-Text indexing and Reactive Queries.
https://orientdb.dev
Apache License 2.0

Edge with bidirectional labels (roles) for IN and OUT links #5568

Closed healiseu closed 5 years ago

healiseu commented 8 years ago

Hi, I am in the process of designing and testing both the Document and Graph data models of OrientDB. It is clear that if one wants bidirectional links in the Document model, one has to write the code to manage them. Notice also that you have to define two properties of type Link, one in each Document, to give you a name handle for accessing the linked Document. In your Graph data model there is the (Lightweight) Edge structure that automatically handles the linking of Document instances.

Nevertheless, I think one can easily fall into the trap of thinking that an Edge is unidirectional and treating it as such. This is the case with the RDF Subject-Predicate-Object triplet. But in our case an Edge is BIDIRECTIONAL by default: it is linked to both the outgoing and the incoming Vertex.

The unidirectional illusion arises because of the labeling of links, i.e. OUT and IN. Of course you could also have

But all this notation signifies simply the direction of navigation, e.g. we start traversing from TAIL to HEAD. You could also say that it is related to the Domain and Target of the master relationship, or Target and Domain for the Inverse relation.

Because we deal with bidirectional linking, we could easily reverse navigation, e.g.

Now let us define a pair of binary predicates e.g. isActorOf, hasActor and form the relationships

This way it is much clearer that when I view Document records of ACTOR I can immediately see that this OUT link is the (isActorOf) part of the ACTORMOVIE Edge, and when I view the MOVIE records I can see that this IN link is the (hasActor) part of the ACTORMOVIE Edge. Needless to say, I think that those who define two Edges to handle such a relationship are wasting resources and making things complicated.

Therefore I think this enhancement, i.e. customized labels for IN and OUT links, will greatly improve comprehension of data model design and use. In Topic Maps associative modeling these are the roles that objects, i.e. instances, play in the association (relationship).
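The request can be pictured as a single edge record that carries one role label per side. The following is a minimal Python sketch of that idea only; `RoleEdge`, `view_from` and the sample vertices are hypothetical names used for illustration and are not part of OrientDB.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RoleEdge:
    """One edge record holding both endpoints plus a role label per side."""
    edge_class: str   # e.g. "ACTORMOVIE"
    out_vertex: str   # tail: the vertex the edge leaves
    in_vertex: str    # head: the vertex the edge arrives at
    out_role: str     # label shown on the OUT side, e.g. "isActorOf"
    in_role: str      # label shown on the IN side, e.g. "hasActor"

    def view_from(self, vertex: str) -> Tuple[str, str]:
        """Return (role label, other vertex) as seen from `vertex`."""
        if vertex == self.out_vertex:
            return self.out_role, self.in_vertex
        if vertex == self.in_vertex:
            return self.in_role, self.out_vertex
        raise ValueError(f"{vertex!r} is not an endpoint of this edge")


# Illustrative data: one edge, two readings.
edge = RoleEdge("ACTORMOVIE", "Hanks", "Catch me if you can",
                "isActorOf", "hasActor")
print(edge.view_from("Hanks"))
print(edge.view_from("Catch me if you can"))
```

The same stored edge is read as isActorOf from the actor and as hasActor from the movie, so no second edge is needed for the inverse reading.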

I am open and free for a good discussion and comments on this. If you decide to make this enhancement part of OrientDB please let me know your plans on its release.

smolinari commented 8 years ago

The names of the relationships are the names of the link lists in the documents. For instance, here is a view of the vertices in the Grateful Dead Concert database. You can see the "followed_by", "written_by" and "sung_by" link lists.

[Image: view of the vertices in the Grateful Dead Concert database]

The names of the link lists are customizable.

Scott

healiseu commented 8 years ago

Sorry, this is not what I have been asking. Names of relationships are one thing; the role names, i.e. the IN and OUT labels of the participants in the relationship, are another.

smolinari commented 8 years ago

But, if you have multiple relationships like in the example above, how would you name the groups of incoming and outgoing links any differently? The link set groups have to be named something very general and "out" and "in" are pretty good and simple. It would make no sense whatsoever, if they were to be named for any specific relationship. That simply wouldn't work.

Also, the names of the relationships are the names of the roles too. You could have an "owns" relationship and add a constraint so you get a 1 to n relationship.

I agree the abstraction of how graphs work could be made a bit simpler. But, ODB does graphs pretty well.

Scott

healiseu commented 8 years ago

@lvca sorry, I do not understand the bug label here. Is it possible to have additional names for OUT and IN? If not, could you please think of an alternative solution that will allow me to view an EDGE with different labels, i.e. master and inverse, depending on which vertex I start navigating from?

Once more, to state my request succinctly :

As an example, think of the classic isTypeOf, hasType relationship:

May I also add that I am extremely interested in your Lightweight Edges. I could implement something similar using bidirectional links on Documents, but I would lose all the functionality of your Graph layer.

healiseu commented 8 years ago

@lvca, @luigidellaquila, my question on how to assign different labels to an edge, or different labels to OUT and IN, is very important for the framework I am trying to build on top of your database graph model. If it cannot be done now, I would like you to add this as a feature in the 2.2 release. Could you please confirm whether that is possible?

Thank you

healiseu commented 8 years ago

Further to my question, this is a typical response I am getting from your RESTful service

{
    "result": [
        {
            "@type": "d",
            "@version": 0,
            "@class": "TypeInstance",
            "in": "#23:1",
            "out": "#18:1",
            "@fieldTypes": "in=x,out=x"
        }
    ]
}

This is what I get when I execute a CREATE EDGE command with a predefined Lightweight Edge that has the short name TypeInstance.

It looks like internally you are using these "in" and "out" labels. So would it be possible to let the user define and view hasType, isTypeOf instead?

That would also make perfect sense when you browse records in Studio, where you would see hasType in the OUT vertex and isTypeOf in the IN vertex.
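Until something like this exists in the database, the relabeling can be approximated on the client after the REST response arrives. Below is a minimal Python sketch under stated assumptions: `ROLE_LABELS` and `relabel` are hypothetical helpers, and which of the two labels replaces "out" and which replaces "in" is an assumption made here for illustration. The `@fieldTypes` value is passed through unchanged.

```python
import json

# Hypothetical client-side mapping: edge class -> replacement labels.
# Which label goes on which side is an assumption of this sketch.
ROLE_LABELS = {"TypeInstance": {"out": "hasType", "in": "isTypeOf"}}


def relabel(record):
    """Rename the "in"/"out" keys of an edge record to its role labels."""
    labels = ROLE_LABELS.get(record.get("@class"), {})
    # Keys without a configured label (e.g. "@type") keep their name.
    return {labels.get(key, key): value for key, value in record.items()}


# The response quoted above, parsed from JSON.
response = json.loads("""
{
    "result": [
        {
            "@type": "d",
            "@version": 0,
            "@class": "TypeInstance",
            "in": "#23:1",
            "out": "#18:1",
            "@fieldTypes": "in=x,out=x"
        }
    ]
}
""")

relabeled = [relabel(record) for record in response["result"]]
print(json.dumps(relabeled[0], indent=4))
```

This only changes how the record is displayed; the stored edge still uses "in" and "out", which is the convention the traversal functions rely on.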

@lvca and @luigidellaquila I have not got a reply from you on this issue. I hope you are giving it some thought ;-)

Kind regards, Athanassios

luigidellaquila commented 8 years ago

Hi @healiseu

I just removed the bug label and replaced it with an enhancement one.

I think it's hard to introduce this change in 2.2: we are about to release the first beta and the list of new features is closed. Moreover, OrientDB now relies on conventions on property names to traverse the graph, so changing it would have quite a significant impact.

We will take it into consideration, but I think we should open a discussion about this. IMHO property names in this case are just an internal detail, while edge class names should be used for design and application concepts.

Thanks

Luigi

smolinari commented 8 years ago

I'd just like to mention my suggestion. Although I disagree with any fiddling with ODB's internal mechanics, I do agree abstraction is missing and in the end, I don't think we are too far off with the general conclusion as a request.

Please make building and using graphs in ODB more intuitive.

https://github.com/orientechnologies/orientdb/issues/5260

Scott

healiseu commented 8 years ago

Hi @luigidellaquila I agree with you, generally speaking the less you use names and semantics the better. Yes, it is a great idea to open a group discussion on data modeling issues, EXCLUSIVELY !

Hi @smolinari I am working on a prototype of abstraction according to the principles of my R3DM/S3DM high-level framework on data modeling. Hopefully I will soon make the design and the schema in OrientDB public, stay tuned !

healiseu commented 8 years ago

@smolinari, @luigidellaquila, @lvca regarding bidirectional edges and the labels for roles: notice that I have already commented that you do not need to define two edges for the direct and inverse association. Following the example of @smolinari in #5260, you have two classes, Person and Tool, and one association, OwnsBelongs.

If you create edges from Person to Tool then you have Person----(OUT)--------Owns-Belongs----------(IN)-------->Tool

If you traverse from Person, then it becomes Person -----OUT(Owns)------>Tool

If you traverse from Tool, then it becomes Tool <------ IN (Belongs)------- Person

Therefore Owns and Belongs are Roles as I have pointed out in previous communication. It is more or less semantics jargon here. Roles give more meaning to the OUT and IN labels of the edge and do not have to be defined in the internals of OrientDB. Unless they become an important part of a different query system...... Such an example is the Freebase Graph Query Language, MQL !

smolinari commented 8 years ago

@healiseu - Now I understand your suggestion better. Thanks for the extra clarification.

I would have said, to do the above, one would need to make two normal unidirectional edges (one2many) with two different classes (owns and belongs), one for each direction, since the "roles", as you call them, are different. A normal bidirectional edge should be something like "friendOf" or "follows", where two vertexes could cross/ use the same edge class.

Scott

healiseu commented 8 years ago

@smolinari an EDGE in OrientDB is always bidirectional, I think you know that. What you are probably suggesting is two unidirectional LINKS, but then you lose OrientDB Graph functionality. If this is not what you are suggesting, read the bottom line. Notice that here we are talking about the direction of traversing, i.e. navigating; in my opinion this is what this whole discussion is about. That is why I insist on allowing OUT and IN to be displayed with different labels when you browse with STUDIO or on the Graph canvas. Semantically, when you present this to someone, they will be able to read the path you are following much more easily. Let me try to illustrate this with another example:

Normal Direction (Hanks) --- isActorOf ---> (Catch me if you can) <--- hasDirector --- (Spielberg)

Reverse Direction (Spielberg) ---- isDirectorOf ---> (Catch me if you can) <--- hasActor --- (Hanks)

Please notice carefully that the direction of arrows remains unchanged. There are only two edges here. You may lose the arrows now to avoid the confusion !

(Hanks) --- isActorOf --- (Catch me if you can) --- hasDirector --- (Spielberg)

(Spielberg) ---- isDirectorOf --- (Catch me if you can) --- hasActor --- (Hanks)

Bottom line: simply creating a pair of bidirectional edges to indicate direction, i.e. to represent a binary directed association, is wrong design in my opinion, because you add unnecessary complexity to your model.
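The two readings above can be reproduced from just two stored edges if each hop is labeled with the role of the side being left. A minimal Python sketch, assuming each edge is a plain tuple; `EDGES`, `role_between` and `describe` are hypothetical names for illustration, not OrientDB functionality.

```python
# Each stored edge: (out_vertex, in_vertex, out_role, in_role).
EDGES = [
    ("Hanks", "Catch me if you can", "isActorOf", "hasActor"),
    ("Spielberg", "Catch me if you can", "isDirectorOf", "hasDirector"),
]


def role_between(src, dst):
    """Label of the hop src -> dst, read from the side we are leaving."""
    for out_v, in_v, out_role, in_role in EDGES:
        if (src, dst) == (out_v, in_v):
            return out_role   # leaving through the OUT side
        if (src, dst) == (in_v, out_v):
            return in_role    # leaving through the IN side
    raise KeyError(f"no edge between {src!r} and {dst!r}")


def describe(path):
    """Render a vertex path as a readable chain of role labels."""
    parts = [f"({path[0]})"]
    for src, dst in zip(path, path[1:]):
        parts.append(f"--- {role_between(src, dst)} ---")
        parts.append(f"({dst})")
    return " ".join(parts)


path = ["Hanks", "Catch me if you can", "Spielberg"]
print(describe(path))        # normal direction
print(describe(path[::-1]))  # reverse direction, same two edges
```

Reversing the path changes only the labels that are read, not the stored edges, which is the point being argued.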

smolinari commented 8 years ago

Yes, a standard edge in ODB is bidirectional, but you can construct edges to be unidirectional. So for our example, you can do the following.

orientdb> CREATE CLASS Person EXTENDS V

orientdb> CREATE CLASS Tool EXTENDS V

orientdb> CREATE CLASS Owns EXTENDS E

orientdb> CREATE CLASS Belongs EXTENDS E

orientdb> CREATE PROPERTY Owns.out LINK Person

orientdb> CREATE PROPERTY Owns.in LINK Tool

orientdb> CREATE PROPERTY Belongs.out LINK Tool

orientdb> CREATE PROPERTY Belongs.in LINK Person

So, if you want to query for all tools belonging to a certain person, "Bob":

SELECT * FROM ( SELECT EXPAND( OUT('Owns') ) FROM Person
          WHERE name='Bob' )

You can also add a constraining index, so that only one person can own a given tool at a time or, put the other way, so that a tool can belong to only one person at any time.
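The effect of such a constraint can be pictured as a lookup keyed on the tool that refuses a second owner. A minimal Python sketch, assuming an in-memory dictionary stands in for the index; `OwnershipIndex` and its methods are hypothetical names and not OrientDB API.

```python
class OwnershipIndex:
    """Toy stand-in for a unique index keyed on the tool (the edge's head)."""

    def __init__(self):
        self._owner_of = {}   # tool -> person

    def add_owns_edge(self, person, tool):
        """Record that `person` owns `tool`, rejecting a second owner."""
        if tool in self._owner_of:
            raise ValueError(
                f"{tool!r} already belongs to {self._owner_of[tool]!r}")
        self._owner_of[tool] = person

    def owner(self, tool):
        """Return the owner of `tool`, or None if it has no owner."""
        return self._owner_of.get(tool)


idx = OwnershipIndex()
idx.add_owns_edge("Bob", "Pliers")
try:
    idx.add_owns_edge("Alice", "Pliers")   # second owner is rejected
    rejected = False
except ValueError:
    rejected = True
print(idx.owner("Pliers"), rejected)
```

One person may still own many tools here; only the tool side is constrained, which gives the one-to-many shape described above.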

So the ability is there in ODB to do multiplicity and direction as needed and it clearly works and makes some sense. It is just a matter of making it simpler. For instance, why can't the above query be,

SELECT *  FROM Tool, Person WHERE Person.name = 'Bob' ON EDGE 'Owns'

In the other direction it could be...

SELECT *  FROM Person, Tool WHERE Tool.name = 'Pliers' ON EDGE 'Belongs'

Or something to that effect? I just made those up and have no idea if they are feasible or make proper sense to anyone else. But, they seem a lot simpler to me. The fact I'd have to "EXPAND( OUT() )" or "EXPAND( IN() )" to use the right edge and direction is cryptic to me and I would never have known it, without looking deep into the docs. (and I am still unsure about how to use those methods tbh).
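For readers who find OUT() and IN() cryptic, it may help to see what the two directions mean over a tiny in-memory graph. A minimal Python sketch, assuming edges are stored once and indexed from both ends; `TinyGraph` and the sample data are hypothetical and only model the idea, not OrientDB's storage.

```python
from collections import defaultdict


class TinyGraph:
    """Edges are stored once but indexed from both endpoints."""

    def __init__(self):
        self._out = defaultdict(list)   # (tail, edge_class) -> heads
        self._in = defaultdict(list)    # (head, edge_class) -> tails

    def add_edge(self, edge_class, tail, head):
        self._out[(tail, edge_class)].append(head)
        self._in[(head, edge_class)].append(tail)

    def out(self, vertex, edge_class):
        """Vertices reached by following `edge_class` edges leaving `vertex`."""
        return list(self._out.get((vertex, edge_class), []))

    def in_(self, vertex, edge_class):
        """Vertices reached by following `edge_class` edges backwards."""
        return list(self._in.get((vertex, edge_class), []))


g = TinyGraph()
g.add_edge("Owns", "Bob", "Pliers")
g.add_edge("Owns", "Bob", "Hammer")
g.add_edge("Owns", "Alice", "Saw")

print(g.out("Bob", "Owns"))      # tools Bob owns
print(g.in_("Pliers", "Owns"))   # who the pliers belong to
```

In this model the "belongs" question is answered by reading the same Owns edges from the tool's side, without a separate Belongs edge class.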

So, I think we want the same things, just that I don't believe in trying to come up with the solution within the internals of ODB. That is for the ODB team to decide. I just want to let them know, and I think that is your goal too, that their userland-facing SQL could be improved. And to be brutally honest, I might even be wrong on that too. As one other person said to me, ODB is flexible due to how it is built. But, would we be reducing flexibility by improving the syntax? I don't think so.

But I do know, I am still having a hard time swallowing how to set up and query graphs in ODB and I can't imagine I am the only person. In fact, I know I am not the only person, because I see a lot of questions asked about how to properly query ODB for graph relationships.

So, a goal of simplification of ODB's SQL would be welcome, at least from my perspective. Again, no expert here, and I'd be open for any discussion on the contrary for sure.

Scott

healiseu commented 8 years ago

@smolinari please read my bottom line above ;-) But you touched a very, very thorny issue here :

"So, a goal of simplification of ODB's SQL would be welcome"

Have you or any of our readers here heard about Associative Technology? Have you played with QlikView? Have you seen a demo of AtomicDB or SentencesDB ?

But let's get back to your example and generalize a bit. If you only want to find what things from a class are related to things from another distant class, then the perfect simplification would be to specify only the Start and End class and filter accordingly with the items you want to view from these classes. The hidden power of the associative graph and the engine behind it should automatically handle paths and relationships for you !

Welcome to the future, this is called associative technology, but the exact details of how each vendor implements it are hidden in proprietary and patented stuff !

So @lvca, having said this, what can the future of OrientDB be ? In my opinion you have to embrace this technology and possibly make it an integral part of your system. But if you are really interested in that direction we can discuss it in private sometime ;-)

smolinari commented 8 years ago

We have QlikView where I work, but I haven't gotten a license yet. Haven't looked at AtomicDB yet.

What if I have more than one relationship between the two classes? I would also have to include the name of the relationship(s) in some way, in order to get the right results, wouldn't I? It seems to me that, with only a start and end class declaration, ODB wouldn't know which relationship I actually want.

Thinking about multiple relationships, maybe this could be possible too.

SELECT * FROM ClassA, ClassB WHERE some.property = 'something' ON EDGES 'EdgeX', 'EdgeY'
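The ambiguity raised here is easy to reproduce with a toy edge list. A minimal Python sketch, assuming a second, hypothetical "Borrowed" relationship between Person and Tool; `related` and the sample data are illustrative only.

```python
# (edge_class, tail, head); "Borrowed" is a hypothetical second relationship.
EDGES = [
    ("Owns", "Bob", "Pliers"),
    ("Borrowed", "Bob", "Hammer"),
    ("Owns", "Alice", "Hammer"),
]
VERTEX_CLASS = {"Bob": "Person", "Alice": "Person",
                "Pliers": "Tool", "Hammer": "Tool"}


def related(start, end_class, edge_classes=None):
    """Vertices of `end_class` reachable from `start` in one hop.

    With edge_classes=None every relationship is followed, so the caller
    cannot tell owned tools from borrowed ones without the edge class.
    """
    return [(cls, head) for cls, tail, head in EDGES
            if tail == start
            and VERTEX_CLASS[head] == end_class
            and (edge_classes is None or cls in edge_classes)]


print(related("Bob", "Tool"))             # both relationships mixed together
print(related("Bob", "Tool", {"Owns"}))   # disambiguated by edge class
```

Naming only the start and end class returns every connection between them; an edge-class list like the one in the proposed ON EDGES clause is what narrows the result.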

Scott

healiseu commented 8 years ago

You do not need a license for the personal edition of QlikView. It is free for personal use, with some limitations on importing, but you can fully test anything. As for what you asked, consider that you already have names for the classes and names for the properties. But most developers and architects do not take advantage of these. One of the problems in data modelling design is that they do not realize that the properties have to live outside the classes. This is an appetizer. Read my LinkedIn posts please ;-) As you will see in QlikView there are NO LABELS connecting the various classes, i.e. entities. Nevertheless you get everything from anything !

smolinari commented 8 years ago

This is true in the RDBMS world, as there are no "entities" for relationships. Relationships are made simply by "joining" tables through foreign keys. In a graph world, the relationships, the edges, are entities. So, you must have some sort of label to point out the entities(relationships) that are actually wanted.

But, if you want, please give some examples of SQL querying for two vertex classes without using the names of the edges.

Scott

healiseu commented 8 years ago

You are provoking me my friend ;-) Associative technology is based on Graph to start with. QlikView's internal engine, and associative technology in general, is NOT using joins as in RDBMS. They are using common attributes (properties) that are connected to entities. You may think of all of them as classes. All of these classes are associated, i.e. linked bidirectionally, and for the functionality we mentioned above you only need a SINGLE edge. Therefore, to answer your question, you only want to see that your classes are connected with that single edge. I leave the rest for you to search....

smolinari commented 8 years ago

Not provoking. Wanting to learn myself. I would gladly say a query with only two named classes is better than a query with two classes and an edge. I just can't imagine how that could work.

QlikView is a data-warehouse/ BI and analytics solution and as such, it has a different way of collecting data. After reading this article

http://www.dbms2.com/2010/06/12/the-underlying-technology-of-qlikview/

I don't think you can bend ODB to do what QlikView does and we shouldn't want to either. From what I am reading and understanding, the technology behind QlikView is most definitely not graph based. As the article says here:

With that out of the way, let’s turn to some highlights of QlikView’s underlying technology. For the most part, QlikView’s in-memory data structures are quite simple. In particular:

  • QlikView data is stored in a straightforward tabular format.
  • QlikView data is compressed via what QlikTech calls a “symbol table,” but I generally call “dictionary” or “token” compression.
  • QlikView typically gets at its data via scans. There is very little in the way of precomputed aggregates, indexes, and the like. Of course, if the selection happens to be in line with the order in which the records are sorted, you can get great selectivity in a scan.
  • One advantage of doing token compression is that all the fields in a column wind up being the same length. Thus, QlikView holds its data in nice arrays, so the addresses of individual rows can often be easily calculated.
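The dictionary ("token") compression described in the quoted list can be sketched in a few lines. This is a generic Python illustration of the technique, not QlikView's actual implementation; the function names and the sample column are hypothetical.

```python
from array import array


def dictionary_encode(column):
    """Replace each value with a small integer token.

    Returns (symbols, tokens): symbols[token] is the original value,
    and tokens has one entry per row.
    """
    symbols = []    # symbol table: list index is the token
    token_of = {}   # reverse lookup: value -> token
    tokens = []
    for value in column:
        if value not in token_of:
            token_of[value] = len(symbols)
            symbols.append(value)
        tokens.append(token_of[value])
    return symbols, tokens


def decode(symbols, tokens):
    """Rebuild the original column from the symbol table and tokens."""
    return [symbols[t] for t in tokens]


column = ["Athens", "Berlin", "Athens", "Athens", "Berlin", "Cairo"]
symbols, tokens = dictionary_encode(column)

# Fixed-width storage: every field is one byte, so row i sits at offset i.
packed = array("B", tokens)

# A selection becomes a scan over the packed tokens.
wanted = symbols.index("Athens")
rows = [i for i, t in enumerate(packed) if t == wanted]
print(symbols, list(packed), rows)
```

Because every token has the same width, a row's position follows directly from its index, which is the array-addressing advantage the last bullet mentions.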

Again, I'd love to be told I am wrong. But, I need more meat to bite off and chew than you are giving.

Scott

healiseu commented 8 years ago

"I don't think you can bend ODB to do what Qlikview does and we shouldn't want to either"

You shouldn't what? Can you explain that ? How does that affect you, the user, anyway ? This is all about an abstraction layer on top of OrientDB. Abstraction following the principles of R3DM and taking advantage of associative technology breakthrough concepts, such as .....

smolinari commented 8 years ago

Because QlikView has an in-memory only database with token storage, which is built mainly for read-only operations. It is built much more for OLAP and not OLTP, which ODB is built for.

Getting back to simplifying SQL. Here is a perfect example of what I mean, found in another thread, where the user was "stuck" looking for the right SQL to get the right results.

select expand(out('BELONGS_TO')) 
from (
    select expand(out('follows')) from User where @rid = #13:0 
) WHERE
OUT("FALLS_INTO") in [#32:0, #32:1] 

How about this?

SELECT * FROM User, Post, Category 
WHERE User.@rid = #13:0 AND Category.@rid in [#32:0, #32:1] 
ON EDGES 'BELONGS_TO', 'FOLLOWS', 'FALLS_INTO'

Scott

healiseu commented 8 years ago

"It is built much more for OLAP and not OLTP, which ODB is built for"

Exactly, and this is the jump on the curve point, i.e. marketing a product that runs at the same time as OLAP and OLTP. This is what AtomicDB achieved. I have researched, studied and tested their database and model. I have ported their API to Mathematica and I posted it on LinkedIn. In principle it works; they have several problems to solve, but it works. As an engine it can do more in terms of functionality than QlikView and, most importantly, they read/write on disk !

Thank you for your queries, I will comment on them as my work progresses.....

smolinari commented 8 years ago

This is now what the query in the thread I mentioned above ended up as.

select from (
    select expand(out('follows').in('BELONGS_TO').asSet()) from User where @rid = #13:0 
) WHERE OUT("FALLS_INTO") in [#24:1, #32:1]

Holy moly Batman! How long would someone (like myself LOL! :smile:) need to study OSQL to get to the point, where that query makes sense and more importantly, can come up with it on their own quickly without reviewing the docs for reference? I still don't know what the user actually wanted from the discussion and I couldn't tell from reading that query for sure. It could just as well be Chinese. And I would never have found out about asSet() because it isn't even in the docs.

Let me take a shot at possibly simplifying this query, despite me not even understanding it really. Maybe I might understand it after this exercise.

SELECT * FROM
    (SELECT * FROM User
      WHERE @rid = #13:0 
      ON EDGES 'follows', 'BELONGS_TO' 
      AS SET) 
WHERE ON EDGE 'FALLS_INTO' in  [#24:1, #32:1]

Does that make any sense? Is an "AS SET" really necessary? I don't know, because there are no docs on what its purpose is. I can imagine its purpose, but if I query for a group of results in a subselect, I would assume the results are a set to begin with. LOL!

Correction! I found asSet() in the docs under methods, not functions. So I would guess it is necessary, as a way to make sure the items in the results are all unique? Why can't it simply be a part of the language and not a method? I mean, the method doesn't require any arguments, right?
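One way to read the chained query is as three steps: follow 'follows' outwards, follow 'BELONGS_TO' backwards, remove duplicates, then keep only the results whose 'FALLS_INTO' target is in the wanted list. The Python sketch below models that reading only. The vertex ids, the guess that posts belong to followed users, and the "any target matches" interpretation of the WHERE clause are all assumptions of this illustration.

```python
# (edge_class, out_vertex, in_vertex); ids are illustrative stand-ins.
EDGES = [
    ("follows", "user:13:0", "user:ann"),
    ("follows", "user:13:0", "user:bob"),
    ("BELONGS_TO", "post:1", "user:ann"),
    ("BELONGS_TO", "post:1", "user:bob"),   # same post reachable twice
    ("BELONGS_TO", "post:2", "user:bob"),
    ("FALLS_INTO", "post:1", "cat:24:1"),
    ("FALLS_INTO", "post:2", "cat:99:9"),
]


def out(vertex, edge_class):
    """Heads of `edge_class` edges leaving `vertex`."""
    return [h for c, t, h in EDGES if c == edge_class and t == vertex]


def in_(vertex, edge_class):
    """Tails of `edge_class` edges arriving at `vertex`."""
    return [t for c, t, h in EDGES if c == edge_class and h == vertex]


followed = out("user:13:0", "follows")
candidates = [p for f in followed for p in in_(f, "BELONGS_TO")]
unique = list(dict.fromkeys(candidates))   # dedupe, keeping first-seen order
wanted = {"cat:24:1", "cat:32:1"}
result = [p for p in unique
          if any(c in wanted for c in out(p, "FALLS_INTO"))]
print(candidates, unique, result)
```

Without the deduplication step, post:1 would appear twice because it is reachable through two followed vertices, which matches the guess above about why asSet() is in the query.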

I hope you can tell, I am just poking a little fun at OSQL (or myself). It is early in the morning and I don't have much coffee in my brain. It may be, no, probably is, my own incompetence that makes OSQL look so cryptic to me. I'd just like to know if I am alone on this. So, if anyone could chime in to defend OSQL's methods (of madness :smile: ) or agree with my perspective that OSQL could be improved, I'd appreciate it. :smile:

The bigger question I have is, how can ODB change its OSQL and not blow up everyone's programming and not bloat up ODB unnecessarily.

Scott

healiseu commented 8 years ago

@smolinari, @luigidellaquila, @lvca I urge you to read my posts and see the demo of the AtomicDB API on my LinkedIn page. I ported and tested their API on Mathematica. It works: it filters the result set returned from their Associative Graph by setting parameters on the API commands, same commands, same parameters all the time. Are you still looking at simplifying the query language? Why not embrace this technology ? I believe these guys at QlikTech are doing a similar thing behind the scenes.

What I have experienced is that it is all based on Hypergraphs, i.e. hyperedges connecting multiple nodes that have single instance values. They are playing with and moving the pointers, i.e. links, not the values (instances). Entities (classes) share common attributes (classes) among them and use them as bridges for navigation and query, i.e. filtering, purposes. This is THE BIG answer to your question, although I am not suggesting changes to OSQL. Implement a function with standard parameters instead, for super fast filtering on the leaf nodes of the graph.
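The "shared attributes as bridges" idea can be illustrated with a value index in which each attribute value is stored once and points at every record that holds it. This Python sketch is a generic illustration under that assumption; it does not reproduce AtomicDB's or QlikView's engines, and every name and data item in it is hypothetical.

```python
from collections import defaultdict

# Two entity classes that share the attribute "city" (illustrative data).
CUSTOMERS = {"c1": {"name": "Ann", "city": "Athens"},
             "c2": {"name": "Bo", "city": "Berlin"}}
SUPPLIERS = {"s1": {"company": "Acme", "city": "Athens"},
             "s2": {"company": "Bolt", "city": "Berlin"}}


def build_value_index(*entity_sets):
    """Map each (attribute, value) pair to the ids of all records holding it.

    Each distinct value is a single key, so records are linked through
    pointers to the value rather than through copies of it.
    """
    index = defaultdict(set)
    for records in entity_sets:
        for rid, fields in records.items():
            for attr, value in fields.items():
                index[(attr, value)].add(rid)
    return index


def associated(index, records_by_id, selected_ids):
    """Ids reachable from the selection through any shared attribute value."""
    reached = set()
    for rid in selected_ids:
        for attr, value in records_by_id[rid].items():
            reached |= index[(attr, value)]
    return reached - set(selected_ids)


all_records = {**CUSTOMERS, **SUPPLIERS}
index = build_value_index(CUSTOMERS, SUPPLIERS)
print(associated(index, all_records, {"c1"}))
```

Selecting customer c1 reaches supplier s1 through the shared city value alone, with no named edge between the two classes.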

Think about the huge market impact you can make with this great feature on your system. Stay tuned ....

smolinari commented 8 years ago

The big answer to whose question? Which question?

Scott

healiseu commented 8 years ago

"The bigger question I have is, how can ODB change its OSQL"

smolinari commented 8 years ago

Where are some docs for AtomicDB's language or API?

Until I can see the actual API or language AtomicDB uses and how to query, and more importantly, how to manipulate data with it, I don't think we should be making any kind of comparisons. From what I can tell, AtomicDB is purely made for analytics. OrientDB is made for storing relational state. There is a big difference.

No doubt, and I'd say we agree on it, that ODB's language could be simplified. Making ODB work completely differently isn't the right option to get that goal done though. The associative technology AtomicDB uses isn't really graph technology. It isn't a graph database. There are triples involved, but not in the same sense as a graph with developer-defined vertices and edges. I wouldn't call "the bridges" an edge either. All the reading I've found is talking about ETL and data warehouse usage and although ODB has extras for ETL, it isn't purely built for analytics.

In other words, you are asking an apple to be an orange.

Scott