lphuberdeau / Neo4j-PHP-OGM

A doctrine2 style library to access neo4j graphs

Polyglot persistence support #90

Closed: improved-broccoli closed this issue 9 years ago

improved-broccoli commented 9 years ago

I am currently setting up polyglot persistence between an RDBMS (backed by Doctrine) and Neo4j (backed by this OGM), but I face a problem: when I persist an entity to Neo4j, the node id is different from the row id in the RDBMS, so the association across the two DBMSs is broken. One solution would be to add a field to the entity class that holds the RDBMS row id, but I think that is poor design. So how can I maintain the association between entities in my RDBMS and my graph DB? I think this could be an interesting feature; Spring already provides it: http://docs.spring.io/spring-data/neo4j/docs/3.1.5.RELEASE/reference/html/reference_cross-store.html

lphuberdeau commented 9 years ago

This is not an issue with the library.

I don't think there is a way to avoid this extra field.

improved-broccoli commented 9 years ago

I think I explained myself poorly. I opened this issue to start a discussion about using this OGM in a polyglot persistence context, and I gave my problem as a starting point. I have actually found a solution, but I felt the OGM didn't help me at all. For example, there are too many methods with private visibility in the EntityManager class, and that prevents me from implementing my own EntityManager based on the OGM one. I would like to know whether anybody else has encountered the same issues or has the same feeling.

ikwattro commented 9 years ago

@jbenoit2011 Personally I don't think that an OGM should act as a PolyglotPersistence Consistency Manager. The primary goal of an OGM is to map to objects.

One point though: as @lphuberdeau mentioned, you cannot avoid having the mysqlId on the Neo4j node as a property. You need a reference somewhere, and you can't force Neo4j internal ids.
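To make the "reference somewhere" concrete, here is a minimal sketch of building a Cypher upsert keyed on the relational row id. It is not part of this library: the function name `build_upsert` is hypothetical, Python is used only to keep the sketch short, and the `$param` placeholder syntax assumes a reasonably recent Neo4j server (older servers used `{param}`).

```python
import re

# Labels cannot be sent as query parameters, so they are validated
# before being interpolated into the query text.
_LABEL_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_upsert(label, mysql_id, props):
    """Return (cypher, params) that upserts one node keyed on its RDBMS row id.

    Hypothetical helper for illustration; `mysqlId` is the property name
    used in the discussion above.
    """
    if not _LABEL_RE.match(label):
        raise ValueError(f"unsafe label: {label!r}")
    cypher = (
        f"MERGE (n:{label} {{mysqlId: $mysqlId}}) "
        "SET n += $props"
    )
    return cypher, {"mysqlId": mysql_id, "props": dict(props)}


query, params = build_upsert("User", 42, {"name": "alice"})
```

Because the node is matched on `mysqlId` rather than on the internal node id, running the same statement twice updates the existing node instead of creating a second one.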

I have done a lot of integration between Neo4j and other DBMSs (primarily Oracle and MongoDB), and there is far too much logic to implement that should not be part of an OGM.

You'll have more flexibility if you issue Cypher queries yourself to keep your state in sync between MySQL and Neo4j. A common approach is to hook into the postCommit of your Doctrine events. After that, it all depends on how you want to assert the state between the two databases. Here are a few points you may want to take into account:

Only four questions, but they show that you need to build a robust system to maintain the state between the databases. OGMs and ORMs are cool, but not magic, and such a system is outside the scope of an OGM.

My 0,02 euros ;-)

improved-broccoli commented 9 years ago

I think you're right, @ikwattro. When I opened the issue I had difficulty breaking the problem down.

I have now found a satisfactory solution by using Doctrine as the master and the Neo4j OGM as the slave. All changes in MySQL are reflected in Neo4j, and if something goes wrong EVERYTHING is rolled back to avoid consistency issues. This works for the moment; it is still not production-ready, but it does the job.
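The master/slave flow described above can be sketched roughly as follows. This is not my actual code, just an illustration under loud assumptions: Python's `sqlite3` stands in for the Doctrine-managed RDBMS, and `FakeGraph` is a hypothetical in-memory stand-in for the graph store.

```python
import sqlite3


class FakeGraph:
    """In-memory stand-in for the graph store (hypothetical, illustration only)."""

    def __init__(self, fail=False):
        self.nodes = {}
        self.fail = fail

    def upsert(self, row_id, props):
        if self.fail:
            raise RuntimeError("graph write failed")
        self.nodes[row_id] = dict(props)

    def delete(self, row_id):
        self.nodes.pop(row_id, None)


def save_user(conn, graph, name):
    """Write to the relational store first, mirror to the graph, commit last."""
    cur = conn.cursor()
    try:
        cur.execute("INSERT INTO users (name) VALUES (?)", (name,))
        row_id = cur.lastrowid
        graph.upsert(row_id, {"name": name})
    except Exception:
        # Either write failed before commit: undo the relational change.
        conn.rollback()
        raise
    try:
        conn.commit()
    except Exception:
        # The commit itself failed: compensate by removing the mirrored node.
        graph.delete(row_id)
        raise
    return row_id
```

The relational commit is deliberately the last step, so a failure in the graph write can still be undone with a plain rollback; only a failed commit needs a compensating delete on the graph side.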

That aside, there are still problems with the OGM. As I mentioned, too many methods are private in EntityManager, and even in a classic context (no polyglot persistence) that limits extensibility.

Another issue is that the OGM relies on Neo4j internal node ids to identify nodes. But as you may know, these ids are volatile. So why not use a dedicated id stored inside the node instead?

Do you agree with that?

ikwattro commented 9 years ago

As for the id question, this is a common issue in all programming languages. People use different strategies depending on their needs.

At GraphAware we use a uuid plugin for Neo4j for production applications. Of course this forces the use of the plugin, but it removes the burden of some logic: if the node has a uuid, then we know that the state of the node is persisted.

On the other hand, for generic libraries that target the open-source community, you cannot force the use of a plugin. What you can do is let people use either the Neo4j internal id or an option to pass a uuid that will be persisted as an indexed property on the entity. Here again, you need to make sure that this uuid is removed from the entity in case of rollback, as the entity state will not be persisted.
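The "assign a uuid, remove it on rollback" rule can be sketched like this. It is only an illustration, not this library's API: `Entity`, `persist`, and the `write` callback are hypothetical names, and Python is used just to keep it short.

```python
import uuid


class Entity:
    """Hypothetical entity; a uuid of None means 'not persisted yet'."""

    def __init__(self, name):
        self.name = name
        self.uuid = None


def persist(entity, write):
    """Assign a uuid before writing and withdraw it if the write fails.

    `write` is any callable that stores the entity and raises on failure.
    """
    assigned_here = entity.uuid is None
    if assigned_here:
        entity.uuid = str(uuid.uuid4())
    try:
        write(entity)
    except Exception:
        # The write was rolled back, so the entity must not look persisted.
        if assigned_here:
            entity.uuid = None
        raise
    return entity.uuid
```

Only a uuid assigned during the failed attempt is cleared; an entity that already carried a uuid from an earlier successful persist keeps it.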

Now, saying that these ids are volatile should not stop you from using them. In fact they are less volatile than a MySQL id: in Neo4j the ids will never change during a transaction, while in MySQL that can happen.

The problem with the ids is that the ids of deleted nodes can be reused during the database lifecycle. In your current situation this should not be a problem, as you are not relying on these ids in MySQL but the inverse. You do need to make sure that references to the Neo4j ids are removed from your external reference systems (Elastic, for example).

Please note that NeoTechnology is aware of this problem and is working on an internal solution.

Now, I don't see your point about the EntityManager: having private properties and providing well-defined extension points is not bad design IMO. What additional extensibility do you need? Maybe you can raise a pull request with a proof of concept if this can be generic enough to help other people?

improved-broccoli commented 9 years ago

> At GraphAware we use a uuid plugin for Neo4j for production applications. Of course this forces the use of the plugin, but it removes the burden of some logic: if the node has a uuid, then we know that the state of the node is persisted.

I agree. Additional reading: http://blog.armbruster-it.de/2013/08/assigning-uuids-to-neo4j-nodes-and-relationships/ Maybe we should port the neo4j-uuid plugin to PHP?

> On the other hand, for generic libraries that target the open-source community, you cannot force the use of a plugin. What you can do is let people use either the Neo4j internal id or an option to pass a uuid that will be persisted as an indexed property on the entity. Here again, you need to make sure that this uuid is removed from the entity in case of rollback, as the entity state will not be persisted.

I agree.

> Now, saying that these ids are volatile should not stop you from using them. In fact they are less volatile than a MySQL id: in Neo4j the ids will never change during a transaction, while in MySQL that can happen.

What exactly do you mean by "less volatile than a MySQL id"? Do you have any further reading on that?

> The problem with the ids is that the ids of deleted nodes can be reused during the database lifecycle. In your current situation this should not be a problem, as you are not relying on these ids in MySQL but the inverse. You do need to make sure that references to the Neo4j ids are removed from your external reference systems (Elastic, for example).

Actually, I need to retrieve an entity with the same id across all databases. That's why I'm considering a UUID.

> Please note that NeoTechnology is aware of this problem and is working on an internal solution.

Glad to hear that :)

> Now, I don't see your point about the EntityManager: having private properties and providing well-defined extension points is not bad design IMO. What additional extensibility do you need? Maybe you can raise a pull request with a proof of concept if this can be generic enough to help other people?

What I'm saying is that there are too many private methods (22/50) in the EntityManager class, and that prevents writing a custom EM. What I suggest is to set the visibility to public where possible, and to protected where it's not.

ikwattro commented 9 years ago

Porting the UUID plugin to PHP cannot work. You can't hook into the Neo4j transaction from PHP and get access to the TransactionData context.

For a library like this one, you can easily use a uuid generator and hook into the prePersist method. Unfortunately, from what I see, there is no persistException event or the like.

For the reading, you may want to look at this kind of article: http://sqlperformance.com/2014/04/t-sql-queries/the-read-committed-isolation-level

> Actually, I need to retrieve an entity with the same id across all databases. That's why I'm considering a UUID.

You can still use a constrained uuid property on your entity if you use the library.

> What I suggest is to set the visibility to public where possible, and to protected where it's not.

http://fabien.potencier.org/article/47/pragmatism-over-theory-protected-vs-private

I think that if you need a truly custom EM, you'll end up just writing your own and registering it as the EM in place of the one in the library.

improved-broccoli commented 9 years ago

> I think that if you need a truly custom EM, you'll end up just writing your own and registering it as the EM in place of the one in the library.

I agree with the article written by Fabien Potencier. But writing my own EM has the drawback of violating DRY and creating an extra burden. And I am a lazy developer :)