maxpert / marmot

A distributed SQLite replicator built on top of NATS
https://maxpert.github.io/marmot/
MIT License
1.71k stars, 42 forks

Detect transaction boundaries #25

Closed (gedw99 closed 1 year ago)

gedw99 commented 1 year ago

LiteFS (https://github.com/superfly/litefs) does it.

I would have thought it would be useful to solve the linearisation design issues, so that we don’t end up replicating partial data because of a transaction rollback on the initiator db.

maxpert commented 1 year ago

Yep, I wonder if people have read through https://fly.io/docs/litefs/how-it-works/ and the limitations around how LiteFS plays transactions. I think there is some room for improvement, and I will be spending some time reading through the SQLite codebase and LiteFS to make sure I integrate a full transaction-capturing mechanism that has some sort of IDs within transactions.

maxpert commented 1 year ago

So after some deeper investigation, it turns out there is so far no clean way to detect transaction boundaries externally, even from the log files, unless they checkpoint. I might, however, start experimenting with some sort of locking mechanism built on top of NATS, using the KV store it provides.

maxpert commented 1 year ago

I've been experimenting with a couple of things lately, and I am not sure I like any of them so far: WAL change detection, and generating a UUID to attach to each change log entry. The problem with this approach is that while I can group the changes made inside a transaction, OS scheduling can also cause non-transactional changes to be included in the same group. There is no clean way to do it at the log level. I even tried the SQLite forums.
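
To make the grouping problem concrete, here is a minimal sketch. It is not Marmot's code: the `Change` type, the `true_txn` field, and `tag_batch` are hypothetical names made up for illustration. It assumes an observer that stamps every change seen in one detection pass with the same UUID.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Change:
    table: str
    row_id: int
    # Ground truth for the demo only; an external observer cannot see this.
    true_txn: Optional[str]
    group_id: Optional[str] = None


def tag_batch(batch: List[Change]) -> str:
    """Stamp every change seen in one detection pass with the same UUID."""
    gid = str(uuid.uuid4())
    for change in batch:
        change.group_id = gid
    return gid


# One detection pass: two changes from transaction "A", plus an unrelated
# autocommit write that the OS happened to schedule in between them.
batch = [
    Change("accounts", 1, "A"),
    Change("audit", 7, None),
    Change("accounts", 2, "A"),
]
gid = tag_batch(batch)

# The group now mixes transactional and non-transactional changes.
mixed = {c.true_txn for c in batch if c.group_id == gid}
```

All three changes end up with the same `group_id`, so a consumer of the change log would treat the unrelated `audit` write as part of transaction "A".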

This led me to explore NATS Key Value or stream options to have one writer writing changes at a time, or embedding something like etcd to acquire locks to transact.
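
A rough sketch of the single-writer idea, assuming a key-value store that offers a create-if-absent operation. `InMemoryKV` is a stand-in written for this example, not the NATS or etcd API, and `try_write` and the `writer-lock` key are hypothetical names.

```python
import threading
from typing import Callable, Dict


class InMemoryKV:
    """Stand-in for a KV store with create-if-absent semantics (an assumption)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._mu = threading.Lock()

    def create(self, key: str, value: str) -> bool:
        """Store value only if key is absent; return True on success."""
        with self._mu:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def delete_if_owner(self, key: str, value: str) -> bool:
        """Delete key only if it still holds the caller's value."""
        with self._mu:
            if self._data.get(key) == value:
                del self._data[key]
                return True
            return False


def try_write(kv: InMemoryKV, node_id: str, apply_changes: Callable[[], None]) -> bool:
    """Apply changes only while holding the cluster-wide writer lock."""
    if not kv.create("writer-lock", node_id):
        return False  # another node is currently the writer
    try:
        apply_changes()
    finally:
        kv.delete_if_owner("writer-lock", node_id)
    return True
```

A real deployment would also need a lease or TTL on the lock so that a crashed writer does not block everyone else; that is left out here.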

While doing this exercise I realized this is fundamentally against the eventually consistent nature of Marmot. I am trying to put in a locking mechanism, when not having one was the first thing on my mind while I was building Marmot. I want Marmot to be closer to Cassandra than to a strongly consistent store. Last writer wins is the de facto standard for a lot of data stores.
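
For reference, last writer wins can be sketched as follows. This is an illustration of the general technique, not Marmot's actual conflict resolution; `lww_merge` and the `(timestamp, node_id, value)` tuple layout are assumptions made for the example.

```python
from typing import Dict, Tuple

# key -> (timestamp, node_id, value); node_id breaks ties between equal timestamps
Versioned = Dict[str, Tuple[int, str, str]]


def lww_merge(local: Versioned, incoming: Versioned) -> Versioned:
    """Keep, for every key, the entry with the highest (timestamp, node_id)."""
    merged = dict(local)
    for key, candidate in incoming.items():
        current = merged.get(key)
        if current is None or candidate[:2] > current[:2]:
            merged[key] = candidate
    return merged


node_a = {"row:1": (10, "a", "x"), "row:2": (5, "a", "p")}
node_b = {"row:1": (12, "b", "y")}

# Both nodes converge on the same state regardless of merge order.
state_a = lww_merge(node_a, node_b)
state_b = lww_merge(node_b, node_a)
```

No lock is needed: each node applies whatever it receives, and the replicas converge once they have seen the same set of changes.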

So after months' worth of exploration, and coming back to where I started, I have decided for now not to include transaction-level boundaries in the change logs.