Support of MetaSchema in OrientDB

PhantomYdn commented 8 years ago

Guys,

There is a proposal to implement support of MetaSchema in OrientDB. Please take a look at the following document: https://drive.google.com/file/d/0BxzvQixnNXlccExLRmtGZUxQbms/view?usp=sharing

Short description: MetaSchema is the ability to define a schema for schema entities: classes and properties. Several use cases are listed in the presentation. MetaSchema also assumes support for MetaLinks: links from ordinary documents to classes or properties.

Impact: As far as I know, classes and properties are already stored as documents, but those documents don't have a class defined. So this request will require a redesign of schema storage, which shouldn't be huge. All the other functionality is new, so there is no impact on existing behavior.

What do you think? If you agree, please help with the following 2 questions:

1) What additional use cases do you see?
2) What extra impact on existing OrientDB functionality might there be?

P.S. By the way, it's related to my very first issue here for OrientDB: #2521 :)

almibe commented 8 years ago

Adding first-class support for querying a database's schema (use case 5 in the linked document) goes a long way toward making it simpler to support situations like an application with a plugin system that shares a single database.

smolinari commented 8 years ago

@PhantomYdn - can you better explain the metalink concept? I am not getting that one part myself and would like to understand it very much.

I am also very interested in the deeper use of metadata. It is basically essential within an extensible multi-tenant solution, similar to what @almibe said.

However, I don't see an issue with using ODB itself for our own metadata use cases. Yes, that means we have to do the work. But, I would imagine our solution wouldn't necessarily be useful for anyone else or rather, it would be very specific for our needs and thus it would stay proprietary. There are multiple ways to do multi-tenancy. I am not sure ODB could come up with a system generic enough to be useful in all use cases and yet extensible enough for our own needs. And if they could and we need to extend a generic metaschema system anyway, we are back to application level work again. So, I say, just use ODB itself for metadata tracking and querying at application level.

In other words, schema changes aren't high-occurrence activities on a database (well, they really shouldn't be). So, it wouldn't be too difficult to implement a parallel system to track schema manipulations at the time schema additions and changes are made at application level.
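A minimal sketch of what such application-level tracking could look like, kept in memory for illustration. The names `SchemaChange` and `SchemaChangeLog` are hypothetical and not part of any ODB API; in a real application the entries would be stored as ordinary documents and written wherever the application issues its schema changes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class SchemaChange:
    """One recorded schema manipulation (hypothetical record shape)."""
    action: str                      # e.g. "create_class", "create_property"
    class_name: str
    property_name: Optional[str] = None
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class SchemaChangeLog:
    """Application-level log kept alongside the real schema calls."""

    def __init__(self) -> None:
        self.changes: List[SchemaChange] = []

    def record(self, action: str, class_name: str,
               property_name: Optional[str] = None) -> SchemaChange:
        change = SchemaChange(action, class_name, property_name)
        self.changes.append(change)
        return change

    def history(self, class_name: str) -> List[SchemaChange]:
        """All recorded changes touching one class, oldest first."""
        return [c for c in self.changes if c.class_name == class_name]


log = SchemaChangeLog()
log.record("create_class", "BlogPost")
log.record("create_property", "BlogPost", "title")
log.record("create_class", "Comment")
```

Because schema changes are rare compared to content changes, the extra write per change should cost little; the trade-off is that the log only sees changes made through the application.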

Question to the ODB team: could a schema tracking/metaschema system be implemented as an ODB plugin?

Scott

PhantomYdn commented 8 years ago

@smolinari, a MetaLink is a link (or LINKSET or LINKLIST) to a class or property. There is a variety of use cases for MetaLinks in particular; let me mention a few:

1) Example of a MetaLink to a class. You are building a CMS, and you have a class called "UIControl" to store instances of "buttons" in your UI. One type of button is for creating "some documents", so your control instance should have a link to another class to show what type of documents should be created.
2) Example of a MetaLink to a property. You are building an Extract-Transform-Load system (a sub-part of a data warehouse suite), so you need the ability to define a mapping between source fields and target fields (properties) in the system.

MetaLink here is not the multi-tenant case where you have DB links to other DBs (though that is also a cool feature from the Oracle world that could be implemented).

And I definitely argue against your statement that "schema changes aren't high-occurrence activities". That statement comes from the RDBMS world, where the schema should be normalized and predefined. But if we look closely, we will see tons of applications that build their own "meta-model" to work around that approach and move the ability to change the schema to the application layer. ODB already has very cool features on the DB layer: multi-paradigm, many-to-many relationships (graphs) and security/users (99% of apps that use an RDBMS have just 1 user to access the DB, and their users are "represented" on the app level and not in the DB). And I think that MetaSchema/MetaLink functionality is the next step for OrientDB to build a cool and solid stack for everyday use.

I'm from the world where schema changes are a common operation. The most common use cases are listed in that presentation. Please pay attention to the first use case: that's the main scenario. And I think that @almibe's case is an additional example for UC#1.
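The two examples can be sketched in a few lines of plain Python. This is only an illustration of the idea, not OrientDB code: `ClassRef` and `PropertyRef` are hypothetical stand-ins for MetaLinks, and the toy `schema` dictionary stands in for the real schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class ClassRef:
    """Stand-in for a MetaLink to a class (hypothetical)."""
    class_name: str


@dataclass(frozen=True)
class PropertyRef:
    """Stand-in for a MetaLink to a property (hypothetical)."""
    class_name: str
    property_name: str


# Toy schema: class name -> property names.
schema: Dict[str, List[str]] = {
    "Invoice": ["number", "total"],
    "SourceRow": ["inv_no", "amount"],
}


def resolves(ref: Union[ClassRef, PropertyRef], schema: Dict[str, List[str]]) -> bool:
    """True if the reference points at something that exists in the schema."""
    if isinstance(ref, ClassRef):
        return ref.class_name in schema
    return ref.property_name in schema.get(ref.class_name, [])


# Use case 1: a UI control document that records which class it creates.
create_button = {"@class": "UIControl", "label": "New invoice",
                 "creates": ClassRef("Invoice")}

# Use case 2: an ETL mapping from source properties to target properties.
etl_mapping = [
    (PropertyRef("SourceRow", "inv_no"), PropertyRef("Invoice", "number")),
    (PropertyRef("SourceRow", "amount"), PropertyRef("Invoice", "total")),
]
```

The point of a database-level MetaLink would be that a check like `resolves` is enforced by the database, the same way ordinary links are, instead of by application code.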

smolinari commented 8 years ago

Don't get me wrong. What I mean by high-occurrence activity is something like dynamic content creation, like comments on blogs. You don't have users changing CMS buttons with every page request of an app, for example, or users creating new fields or new classes with each usage of the app. These things are done much less frequently.

I do agree, ODB's features allow for a very flexible application. That is why we have decided to use ODB too. We really need flexible schema and we need relationships. ODB is a dream come true in that respect.

I'd be so bold as to say a metaschema system can be programmed at application level relatively easily. I question the necessity at DB level, or rather whether the DB could have the necessary features and still be practical and useful for most users. Most users won't need it or require it.

In the end, it is the same kind of question often posed for programming language frameworks: should it be part of the framework or something made in "userland"? I am on the fence as to the answer, to be honest. I understand the want for metaschema. I just don't think there is enough need for it, and that drives feature design decisions.

If I think about it, ODB has enough to straighten out with the features it has. So, if this ever became something the dev team would tackle, it should only be done once ODB's current feature set is absolutely rock solid and extensive. One thing I'd rather see more attention on first is the Lucene indexing, for example.

Scott

healiseu commented 8 years ago

Ilia, I have already started implementing such a schema in OrientDB! I will avoid the meta-terminology definitions because in my opinion they create confusion. And the same goes for ontology. Let me explain my case using RDBMS jargon.

We have Entities, Attributes and Values. If you represent Attributes with OrientDB Properties (schema-full) or Fields (schemaless) constructs, then you end up handling records and relations, i.e. sets of records, and your fundamental unit of processing is the record. You have Entities, i.e. OrientDB classes that are tables, and you end up with the process of "joining" tables. Because of the Graph layer and bidirectional linking with edges, technically speaking it is not the same as joining, but conceptually it is the same because you are managing relations, sets of records. That approach cannot cope with the present need to aggregate various data sources and at the same time stay flexible about whatever view or perspective the user wants to have on the hub of data.

The real power of Graph databases is revealed when you change the fundamental unit of processing from record to value, i.e. the cell. To achieve that you need a model that is close to triplets, i.e. SPO, EAV or, as it is also known, the RDF graph data model. But this is not the model I am speaking about. The data model I have in mind is that of an associative database.
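To make the shift from record to value concrete, here is a small sketch of the triple (EAV) decomposition mentioned above. It illustrates only the triple model, not the associative model, and the helper names `to_triples` and `entities_with` are made up for this example.

```python
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, Any]  # (entity id, attribute, value)


def to_triples(entity_id: str, record: Dict[str, Any]) -> List[Triple]:
    """Split one record into one (entity, attribute, value) triple per field."""
    return [(entity_id, attr, value) for attr, value in record.items()]


def entities_with(triples: List[Triple], attr: str, value: Any) -> List[str]:
    """Find entities by a single cell, without knowing their class or table."""
    return [e for e, a, v in triples if a == attr and v == value]


# Two records from different "tables" end up in the same flat set of cells.
triples = (
    to_triples("person:1", {"name": "Ada", "city": "London"})
    + to_triples("shop:7", {"title": "Books & Co", "city": "London"})
)
```

Once every cell is addressable on its own, a lookup such as `entities_with(triples, "city", "London")` cuts across classes without any join between record sets.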

I will not go deeper in this post; I am reserving my time to write a LinkedIn article on this and analyse it fully there. All I can say at this moment is that what you call Metaschema is, the way I see it in my associative model, a conceptual layer: a conceptual framework of generic concepts, such as Entity, Attribute and Value, and more specific ones like Person, Price and Integer, that is used for:

- Consolidating Entities, Attributes and Values from other data sources
- Retrospection
- Multiple type inheritance at the instance level (Freebase case)
- Others

And yes, you are absolutely right, it does make a huge difference and impact to work with such a schema already predefined and ready for your users.

Keep in touch, I think we have a lot in common to share on this.

PhantomYdn commented 8 years ago

@smolinari, I fully agree with you that it seems to be "not a common case". But on the other hand, it might be "not common" just because there is no solid solution for it in place ;) For example: before SQL it was hard to imagine that somebody would need a special query language instead of programming each query algorithmically for every case. I see the same here: I'm from the enterprise world and I do see how the absence of this feature limits various cool solutions. But monsters like SAP, Oracle, Amdocs and NEC are brave enough to invest a lot in building a metaschema layer on top of an RDBMS, which allows them to build lots of cool features on top of that: BPM, ETL, xRM, etc. I'm working on Orienteer, which is intended to be an Open Source SAP :)

Let me also mention that I don't see a huge impact for ODB: almost everything is already there. ODB already stores classes and properties as documents. So just add classes to those documents and add generic "metaclasses" for classes and properties which just reflect the existing fields on those documents. All other functionality is an "add-on", so there is no impact on existing functionality.

@healiseu, thanks! I will wait for your article. But please help me understand which option you recommend:
1) Implement the metaschema on the DB layer
2) Or implement it on the application layer (the classic approach for RDBMSs)
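A rough sketch of that idea in plain Python, under stated assumptions: the document shape below is assumed for illustration, and `MetaClass` and `MetaProperty` are hypothetical names for the generic metaclasses, not existing OrientDB classes.

```python
from typing import Any, Dict, List

# Assumed shape of the documents that describe a class and its properties.
person_class_doc: Dict[str, Any] = {
    "name": "Person",
    "properties": [
        {"name": "firstName", "type": "STRING"},
        {"name": "age", "type": "INTEGER"},
    ],
}

# Hypothetical metaclasses that simply mirror the fields already present.
METACLASSES: Dict[str, List[str]] = {
    "MetaClass": ["name", "properties"],
    "MetaProperty": ["name", "type"],
}


def attach_metaclasses(class_doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the class document with a class set on every level."""
    props = [{**p, "@class": "MetaProperty"} for p in class_doc["properties"]]
    return {**class_doc, "properties": props, "@class": "MetaClass"}


def conforms(doc: Dict[str, Any]) -> bool:
    """Check that a document carries exactly the fields of its metaclass."""
    expected = set(METACLASSES[doc["@class"]])
    return expected == {k for k in doc if k != "@class"}


typed = attach_metaclasses(person_class_doc)
```

Nothing is added to the documents except the class marker, which is why the change is described as having no impact on existing functionality.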

healiseu commented 8 years ago

@PhantomYdn, the way I see this is that you use the DBMS logic layer, i.e. classes, or the Vertices and Edges of your DBMS, to build that conceptual framework on top of it. Needless to say, you can transfer this design to some other DBMS provided it suits your needs, e.g. a NoSQL graph database. There are two ways to implement this:

1) Bootstrap, i.e. generalize from Core classes. See Freebase and SentencesDB, where they start with only one or two classes and generate the schema as instances of these classes. For example, in SentencesDB you have Entities and Associations tables; in Freebase you have the GraphD primitive tuples in some table. In that sense, the "metaschema" is generated and described from this core schema.
2) Create Abstract Classes and other classes necessary to implement some upper-level ontology (schema) that is generic enough, i.e. I am thinking of something perhaps similar to schema.org.

In the long run (1) might be more flexible, but as a start it is probably more difficult to manage. I have started with (2).

PhantomYdn commented 8 years ago

@healiseu, I see. Thanks for the links; I will take a more serious look at SentencesDB, it sounds interesting! Yes, there are a few approaches to building a meta-model on top of a DB, but for relatively serious cases it becomes a real investment in developing a meta-model that will actually be needed just by you: it can't be reused. But if it is implemented on the DB layer (point 1 and the intent of the proposal), it will be, you are right, a long run, but a big game-changing feature...

healiseu commented 8 years ago

@PhantomYdn, where can I contact you for a chat? ;-)

PhantomYdn commented 8 years ago

@healiseu , my skype is <deleted> :)

PhantomYdn commented 8 years ago

@tglman , @laa , @lvca , what do you think about this proposal?

tglman commented 8 years ago

Hi @all

We usually use some queries around select from metadata:schema to solve this problem. I can give a few examples.

All the classes that link to a given one:

select from (
  select expand(classes) from (select from metadata:schema)
) where properties.linkedClass in :targetClass

Or, simpler, all the classes:

select expand(classes) from (select from metadata:schema)

Or all the properties of a class:

select expand(properties) from (
  select expand(classes) from (select from metadata:schema)
) where name = :className

So it is probably not necessary to define a new structure from scratch; maybe just define some aliases for those queries, or just a dictionary of this kind of query.
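For readers who want to see what these three queries compute, here is a plain Python sketch over a dictionary shaped like the schema document. The shape is an assumption inferred from the field names used in the queries (`classes`, `properties`, `name`, `linkedClass`), and the helper names are made up; named helpers like these could play the role of the aliases.

```python
from typing import Any, Dict, List

# Assumed shape of the single schema document (illustrative data only).
schema_doc: Dict[str, Any] = {
    "classes": [
        {"name": "Person", "properties": [
            {"name": "name", "linkedClass": None},
            {"name": "employer", "linkedClass": "Company"},
        ]},
        {"name": "Company", "properties": [
            {"name": "title", "linkedClass": None},
        ]},
    ]
}


def all_classes(doc: Dict[str, Any]) -> List[str]:
    """Names of all classes in the schema document."""
    return [c["name"] for c in doc["classes"]]


def classes_linking_to(doc: Dict[str, Any], target: str) -> List[str]:
    """Names of classes with at least one property linked to `target`."""
    return [c["name"] for c in doc["classes"]
            if any(p.get("linkedClass") == target for p in c["properties"])]


def properties_of(doc: Dict[str, Any], class_name: str) -> List[str]:
    """Property names of the class called `class_name`."""
    return [p["name"]
            for c in doc["classes"] if c["name"] == class_name
            for p in c["properties"]]
```

Each helper walks the whole `classes` list, which mirrors how the queries scan the one schema document.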

PhantomYdn commented 8 years ago

@tglman, queries are just one small use case among the benefits of this proposal. Please check the other use cases in the presentation.

smolinari commented 8 years ago

To me, the presentation needs to define more clearly how such a metaschema system would be beneficial. For instance, the use cases could include example queries that would be available if such a system existed.

Scott

a-unite commented 8 years ago

We had to implement all metaschema caching and browsing through our own (specific) API too. But I agree that thinking in terms of metaschema (classes as entities) might be quite important for further development.

Just as another reason (besides what was discussed in https://github.com/orientechnologies/orientdb/issues/2521, I mean consistency and ease of use): queries to metadata:schema are not so fast, by the way, when you have thousands of classes, since the whole schema is stored as one document and indexes can't be applied.
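A minimal sketch of the kind of application-side cache this forces, assuming the schema document has already been loaded into a dictionary with `classes`, `properties`, `name` and `linkedClass` fields. The function names are hypothetical and this is not the API mentioned above.

```python
from typing import Any, Dict, List


def build_class_index(schema_doc: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """One pass over the schema document; afterwards lookups by name are O(1)."""
    return {c["name"]: c for c in schema_doc["classes"]}


def build_link_index(schema_doc: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map target class name -> names of classes with a property linking to it."""
    index: Dict[str, List[str]] = {}
    for c in schema_doc["classes"]:
        for p in c.get("properties", []):
            target = p.get("linkedClass")
            if target is not None and c["name"] not in index.setdefault(target, []):
                index[target].append(c["name"])
    return index


# Synthetic schema with thousands of classes, each linking to the next one.
schema_doc = {
    "classes": [
        {"name": f"C{i}",
         "properties": [{"name": "next", "linkedClass": f"C{i + 1}"}]}
        for i in range(5000)
    ]
}

by_name = build_class_index(schema_doc)
linked_from = build_link_index(schema_doc)
```

The cache has to be rebuilt or patched whenever the schema changes, which is exactly the bookkeeping that schema entities stored as individually indexed documents would make unnecessary.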

orientechnologies / orientdb

Support of MetaSchema in OrientDB #5642