skeyby / mysqlfs

MySQLfs is a FUSE (Filesystem in Userspace) module that stores its data in a MySQL database
http://andrea.brancatelli.it/category/tech/mysqlfs-tech/

Combining MySQLfs with multi-master replication to implement a clustered file system #12

Open vlasky opened 1 year ago

vlasky commented 1 year ago

I wonder if it's possible to combine MySQLfs with a Galera/Percona XtraDB Cluster (MySQL synchronous multi-master replication).

If it works, this could implement a clustered file system with guaranteed synchronous updates to all members of the cluster.

I haven't had time to try this yet, but I think it could be useful to others. To work reliably in a multi-master configuration, MySQLfs might need tweaks to prevent caching of reads, or to ensure that writes made to the MySQLfs database tables on other cluster members trigger cache invalidation.

skeyby commented 1 year ago

Hi Vlasky.

I haven't tried Galera/Percona XtraDB directly, but I don't expect major problems with what you are proposing.

We actually had a setup with a filesystem shared among several machines using standard replication, provided that each host wrote only to its own directory (while any host could read anything).

If you want to try it, just make sure that the lag introduced by real-time replication doesn't hit the INSERTs/UPDATEs that MySQLfs performs too hard, especially because a single (big) file can result in multiple INSERTs, since the file is split into chunks.

Let me know how it works out, and if you run into any major problems we can try to work out how to tweak things.

vlasky commented 1 year ago

@skeyby thanks for your reply. When running Galera/Percona XtraDB Cluster, each transaction needs to be synchronously acknowledged by all members of the cluster. The maximum number of transactions per second is limited by the highest round-trip latency between the cluster member that initiated the transaction and the other cluster members.

To avoid a write performance hit, MySQLfs would need to batch multiple INSERTs/UPDATEs together and execute them within a single transaction. Otherwise each INSERT/UPDATE counts as a separate transaction and incurs a delay while it waits to be acknowledged individually by the other cluster members.

skeyby commented 1 year ago

@vlasky Well, file writing is already wrapped in a single transaction within a BEGIN/COMMIT block

(if you are curious, see https://github.com/skeyby/mysqlfs/blob/70773a0526d19cce3caf9c68df9d23901cee16b9/src/query.c#L952C5-L952C13)

What is not within the same transaction is the update of the stats. The reason is that, in general, stats updates are treated as best-effort: an error on the statistics side should not roll back the write of the file.

So, back to the point, writing a 50 MB file (...) will generate two Galera syncs. You can probably live with that; I guess it depends on the usage scenario. A few big files = little impact; millions of small files = huge impact.

Let me know.