Open remie opened 13 years ago
This has been kicking around in my mind for some time, but never surfaced because the requirement has yet to arise on a project. I haven't considered fully how this might work, but I think it could be a case of swapping the database connection during the lifecycle of a page.
When operating in the Symphony backend you would want all connections to be made to the master database. When submitting content via an event on the frontend you also want direct interaction with the master. Read operations (data sources), however, would read from a slave.
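To make the idea a bit more concrete, here is a minimal sketch of that routing rule. It is written in Python with in-memory SQLite standing in for MySQL, and `ConnectionRouter`, `is_backend` and `is_event` are names invented for the illustration, not anything that exists in Symphony.

```python
import sqlite3


class ConnectionRouter:
    """Picks a connection per request (hypothetical helper, not a Symphony API)."""

    def __init__(self, master, replica):
        self.master = master
        self.replica = replica

    def for_request(self, is_backend, is_event):
        # Backend pages and frontend events write data, so they go to the master.
        if is_backend or is_event:
            return self.master
        # Plain data source reads can be served by the slave.
        return self.replica


# In-memory SQLite connections stand in for the master and slave databases.
master = sqlite3.connect(":memory:")
replica = sqlite3.connect(":memory:")
router = ConnectionRouter(master, replica)
```

The decision is made once per request, which matches the idea of swapping the connection during the page lifecycle rather than per query.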
Sounds scarily complex to me.
Replication is always scary :) I've worked on projects that used Hippo (Apache Jackrabbit) or Umbraco, and there always came a point where you had to wipe the entire slave instance and rebuild the datastore from revision 0 (or at least the last base revision).
So in order to implement replication, it is imperative that it is repeatable: you should be able to restore a database, start the replication from the slave instances and reprocess all steps.
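A rough sketch of what "repeatable" could mean in practice: keep an ordered log of statements and replay it against an empty database. This is only an illustration in Python with SQLite; the `CHANGE_LOG` contents, the `entries` table and the function names are made up, and the real CDI log format may look nothing like this.

```python
import sqlite3

# Hypothetical ordered change log: (revision, SQL statement).
CHANGE_LOG = [
    (1, "CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT)"),
    (2, "INSERT INTO entries (id, title) VALUES (1, 'Hello')"),
    (3, "UPDATE entries SET title = 'Hello world' WHERE id = 1"),
]


def replay(conn, log, from_revision=0):
    """Apply every statement newer than from_revision, in revision order.

    Returns the last revision that was applied.
    """
    last = from_revision
    for revision, statement in sorted(log):
        if revision <= from_revision:
            continue  # already applied on this instance
        conn.execute(statement)
        last = revision
    conn.commit()
    return last


def rebuild_slave(log):
    """Wipe-and-rebuild: start from an empty database and reprocess all steps."""
    conn = sqlite3.connect(":memory:")
    replay(conn, log)
    return conn
```

Because the replay is driven purely by the revision number, a slave restored from a base revision can pick up from there instead of starting at 0.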
The downside of PHP is that it is hard to use replication intervals. AFAIK there is no in-memory scheduler like Quartz which can either push (master) or pull (slave) changes. So replication has to happen at the moment the change is made, while the visitor is waiting for it, which makes it even scarier...
I guess this is what my implementation of replication would look like in Symphony:
In case of an error on the Slave, perform automatic disable (maintenance mode / remove from loadbalancer) and notify the administrator. No sticky session required to be configured on the load-balancer.
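As a sketch of the error handling described above (Python and SQLite again, purely illustrative): a failing slave is taken out of the active set and the administrator is notified through a callback. `SlavePool` and `notify` are hypothetical names, and actually removing a node from a load balancer is outside the scope of this snippet.

```python
import sqlite3


class SlavePool:
    """Tracks active slaves and disables one when a statement fails (hypothetical)."""

    def __init__(self, slaves, notify):
        self.active = dict(slaves)  # name -> connection
        self.disabled = {}
        self.notify = notify        # callback used to inform the administrator

    def execute_on(self, name, statement):
        conn = self.active[name]
        try:
            conn.execute(statement)
            conn.commit()
            return True
        except sqlite3.Error as exc:
            # Automatic disable: stop sending traffic and changes to this slave.
            self.disabled[name] = self.active.pop(name)
            self.notify(f"slave {name} disabled: {exc}")
            return False
```

A disabled slave would then be a candidate for a full rebuild before it is put back into rotation.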
In case you wish to enable User contribution from the front-end (forum, comments) it becomes a bit more tricky:
The best approach would be to enhance your Master instance with a site-specific API. All front-end DATA events should then be posted to this API so that the Master instance processes them. That way you still have only one database on which changes occur. The change is then propagated to the Slave instances using one-way replication. In case of an error you can inform the visitor immediately.
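Roughly, a slave instance would forward the submitted fields to the Master instead of writing to its own database. The sketch below (Python standard library only) shows the shape of that; the URL `https://master.example.com/api/events/`, the JSON payload format and the function names are all assumptions for illustration, since no such API exists yet.

```python
import json
import urllib.request

# Hypothetical endpoint of the site-specific API on the Master instance.
MASTER_API_URL = "https://master.example.com/api/events/"


def build_event_request(event_name, fields, base_url=MASTER_API_URL):
    """Build the POST request that forwards a front-end event to the Master."""
    body = json.dumps({"event": event_name, "fields": fields}).encode("utf-8")
    return urllib.request.Request(
        base_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def forward_event(event_name, fields):
    """Send the event to the Master; an exception here can be shown to the visitor."""
    request = build_event_request(event_name, fields)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Since the call is synchronous, a failure on the Master surfaces right away, which is what allows the visitor to be informed immediately.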
If that is not an option, there is the conceptual implementation of Queued database synchronization:
The challenges with this approach:
Just some thoughts on a random Friday :)
Oops... wrong button :)
/me sobs.
Because you hoped that I had already solved it?
Heh, not so much, more because I hadn't considered it in this much detail before. It does sound rather like too much effort for very little gain. What is the probability of you using Symphony in an environment like this? Fingers crossed we've never had the need for master/slave databases, even on some pretty hefty operations.
Let's hope it never becomes a requirement. That's why I've put it on the 2.0.0 release without a due date :) Nevertheless, I do think it could be a selling point for enterprise-level customers. I've worked with clients that did not even want to consider a CMS without replication.
But the main point of working on this feature is because it is interesting to do :D
the main point of working on this feature is because it is interesting to do
Isn't that the reason to put any feature in? :-)
Thank you for your thoughts on this. I still haven't given CDI a good once-over, but it's creeping up my todo list. Great work so far.
The idea behind this is that, when dealing with high-volume websites, you might want to be able to load-balance your CMS. To do so, you need active replication.
The CDI extension should be smart enough to do this. It should even allow you to have a Master CDI instance for your structural changes (development environment) and a Master replication instance in your cluster. The Master replication instance accepts structural changes from the Master CDI instance and propagates them to its connected slave instances.
Replication would simply mean that you will have the ability to list multiple MySQL databases to which every SQL statement is executed.
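In its simplest form that could look like the sketch below: a thin wrapper that holds a list of connections and runs each statement on all of them. It uses Python and in-memory SQLite so it is runnable as-is; `FanOutConnection` is a made-up name, and this naive version has no error handling or transactions across databases.

```python
import sqlite3


class FanOutConnection:
    """Executes every SQL statement on all configured databases (hypothetical)."""

    def __init__(self, connections):
        self.connections = list(connections)

    def execute(self, statement, params=()):
        # Same statement, same parameters, on each database in turn.
        for conn in self.connections:
            conn.execute(statement, params)
            conn.commit()


# Two in-memory databases stand in for the listed MySQL databases.
databases = [sqlite3.connect(":memory:"), sqlite3.connect(":memory:")]
fan_out = FanOutConnection(databases)
fan_out.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT)")
fan_out.execute("INSERT INTO entries (title) VALUES (?)", ("Hello",))
```

Note that the visitor waits for all databases in this setup, so the cost of a write grows with the number of slaves.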
As expected, there is a tricky part: you need to make sure that all DATA is stored in a single master instance, from where it is replicated to the slave instances. This is easy for CMS content that is created by editors: simply allow them to work only on the Master backend. However, if your site has user-generated content, you need to make sure that all data is posted to the master instance. This might be quite a challenge!