kvesteri / sqlalchemy-continuum

Versioning extension for SQLAlchemy.
BSD 3-Clause "New" or "Revised" License
568 stars 128 forks source link

Support for multi-master database setups #313

Open gertjanvg opened 1 year ago

gertjanvg commented 1 year ago

Hi all!

For a project we have tried to integrate this package for versioning our database data. During development things were fine, but it turns out that in production we run into a variety of issues. We narrowed this down to the fact that we are using a multi-master MariaDB Galera database, which has a lot of impact on this package in particular.

SQLAlchemy-Continuum uses a Transaction table to record all transactions and refers to this table in the _version tables. At first I thought that the id of the Transaction uses the AUTO_INCREMENT value, but instead it uses the transaction_id_seq Sequence which should generate sequential ID values as primary key. By default a MariaDB sequence uses an increment of 1, which breaks on a multi-master Galera cluster -- we got a lot of duplicate primary keys when serving multiple requests at the same time. It is also possible to set this increment value to 0, in which case it will interleave the generated values, just like it does for AUTO_INCREMENT values.

I might be wrong on this, but some testing on a simple 2-node Galera cluster showed me that these generated values are unique, but are not sequential: we no longer have the guarantee that the highest value is the latest generated value. This in turns breaks the next/previous version queries in Continuum, essentially rendering the plugin useless for our purposes.

Theoretically it would be possible to use exclusive locks in our application code to not work on the same database row simultaneously, but that opens up a whole other can of worms called deadlocks. As such we would like to avoid this approach as much as possible.

Would it be possible to start adding support for this, or is there something else we are overlooking that would resolve the issues?

marksteward commented 1 year ago

Does autoincrement work fine for Galera? I've been considering getting rid of named sequences as it is.

gertjanvg commented 1 year ago

It works fine as in it will not have duplicates, but there are some caveats in the particular application within SQLAlchemy-Continuum. The generated values are unique, but not guaranteed to be sequential.

From https://mariadb.com/kb/en/tips-on-converting-to-galera/#auto_increment:

Bottom line: There may be gaps in AUTO_INCREMENT values. Consecutive rows, even on one connection, will not have consecutive ids.

Given the implementation of getting the next/previous version this would break the plugin as it is.

marksteward commented 1 year ago

Why is having gaps in transaction IDs a problem? The current version code shouldn't care, it's only important that they're increasing.

gertjanvg commented 1 year ago

The gaps are not really the issue, true. However, since it's possible that two cluster members are trying to add entries at the same time it could occur that the latest added entry (time-wise) does not have the highest ID.

Say we have two Galera nodes, node A generates values 1, 3, 5, 7... and B values 2, 4, 6, 8... If we start a transaction on A, insert 2 rows they will get the IDs 1 and 3. Then, with the transaction on A still opened, we add a row on B which gets ID 2. The transaction on A commits, and shortly after the one on B commits. We now have IDs 1, 2, 3, but row 2 being 'newer' than row 3.

This breaks the assumption that IDs are increasing, as you'd go from version 1 -> 3 -> 2.

marksteward commented 1 year ago

Without some sort of synchronisation to provide a stable transaction order, that will always be a problem.