szabba opened this issue 2 years ago
There are two solutions I can think of right now. One could be a special column that people add to their tables to receive updates on. The other could be a table where I put these updates. I am very interested in hearing which one people would usually opt into.
So the nature of the problem is federated transactions across all the nodes? Marmot then needs to undo a transaction because of this. Is this a kind of two-phase commit pattern?
What would the event loop / flow sequence look like in a 3-node setup, for example?
@maxpert I think, as a client, I would prefer:
Since this is solving an issue that is pretty Marmot-specific by nature, I would be fine with the application having a connection to the locally running Marmot process. I imagine this connection only being used by clients that need this, and only for notifications from Marmot - not for any write operations.
Another concern is whether Marmot is undoing entire transactions or changes to individual rows. In most of the use cases that come to my mind, I would want to know that either all the changes I made in a transaction were accepted or rejected by the cluster. If Marmot can provide that, my life is easier. If it cannot or does not want to, that is fine too - but I, as an application author, need to know that to take it into account when I make updates to the data.
Fair enough. I think NATS fits the bill for notifications pretty well, so once a node commits an entry to the change log stream, it can also publish a message on NATS which you can easily subscribe to. This will help you with any websocket use-cases as well.
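For illustration, here is a minimal Go sketch of the subscribing side using the nats.go client. The subject name `marmot.changes.>` is an assumption made up for this example, not something Marmot defines:

```go
// Sketch only: subscribing to hypothetical Marmot change notifications
// over NATS. The subject "marmot.changes.>" is invented for illustration.
package main

import (
	"fmt"
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// Wildcard subscription: receive every change notification,
	// whatever per-table sub-subject the publisher might use.
	sub, err := nc.Subscribe("marmot.changes.>", func(m *nats.Msg) {
		fmt.Printf("change on %s: %s\n", m.Subject, string(m.Data))
	})
	if err != nil {
		log.Fatal(err)
	}
	defer sub.Unsubscribe()

	select {} // block forever, handling messages as they arrive
}
```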
Another concern is whether Marmot is undoing entire transactions or changes to individual rows. In most of the use cases that come to my mind, I would want to know that either all the changes I made in a transaction were accepted or rejected by the cluster. If Marmot can provide that, my life is easier. If it cannot or does not want to, that is fine too - but I, as an application author, need to know that to take it into account when I make updates to the data.
Right now SQLite provides no cross-process mechanism to tell whether a row change is part of a transaction or not. Without that we would only be guessing. So individual rows are added to the change logs via triggers, but we don't know whether they happened as part of a batch transaction. A quick Google search tells me statement-level trigger support doesn't exist:
At this time SQLite supports only FOR EACH ROW triggers, not FOR EACH STATEMENT triggers. Hence explicitly specifying FOR EACH ROW is optional. FOR EACH ROW implies that the SQL statements specified in the trigger may be executed (depending on the WHEN clause) for each database row being inserted, updated or deleted by the statement causing the trigger to fire.
I might have to look a little further to see if there is a way to detect that (I am guessing the WAL should have something). That will mean digging deeper into the SQLite code, which is time-consuming. Right now (for v1) I am keeping it focused on individual rows, because quite a large percentage of use-cases can be covered by single-row changes (non-transactional cases like CMS, cache, etc.).
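For anyone following along, here is a minimal Go sketch (with made-up table names, using the mattn/go-sqlite3 driver) of the limitation being described: the per-row trigger records each change individually, with no marker tying rows to the transaction that wrote them.

```go
// Sketch: SQLite triggers fire FOR EACH ROW, so a trigger-based change
// log gets one entry per row and no transaction boundary at all.
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	db, err := sql.Open("sqlite3", ":memory:")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if _, err := db.Exec(`
		CREATE TABLE docs(id INTEGER PRIMARY KEY, body TEXT);
		CREATE TABLE change_log(doc_id INTEGER, op TEXT);
		-- Only FOR EACH ROW exists; there is no FOR EACH STATEMENT
		-- trigger, and no trigger for BEGIN/COMMIT either.
		CREATE TRIGGER docs_insert AFTER INSERT ON docs
		FOR EACH ROW BEGIN
			INSERT INTO change_log VALUES (NEW.id, 'insert');
		END;
	`); err != nil {
		log.Fatal(err)
	}

	// Two rows inserted in one transaction... (error handling elided)
	tx, _ := db.Begin()
	tx.Exec(`INSERT INTO docs(body) VALUES ('a')`)
	tx.Exec(`INSERT INTO docs(body) VALUES ('b')`)
	tx.Commit()

	// ...produce two independent change_log entries: nothing here says
	// they committed together.
	rows, err := db.Query(`SELECT doc_id, op FROM change_log`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()
	for rows.Next() {
		var id int
		var op string
		rows.Scan(&id, &op)
		fmt.Printf("logged: doc %d, op %s\n", id, op)
	}
}
```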
Right now (for v1) I am keeping it focused on individual rows, because quite a large percentage of use-cases can be covered by single-row changes [...]
That is completely fair enough - and it's worth documenting explicitly.
I think NATS fits the bill for notifications pretty well, so once a node commits an entry to the change log stream, it can also publish a message on NATS which you can easily subscribe to. This will help you with any websocket use-cases as well.
Supporting NATS might be a good idea. But requiring NATS might pull in more than is needed for this use case.
I'm not really sure: I don't have experience with NATS. My mental model, from reading up on it, is that it's a messaging service with at-most-once semantics by default.
This is a little bit pertinent, in that it has loads of examples of what people might build on top of Marmot.
https://github.com/ThreeDotsLabs/watermill
Its API is maybe a useful reference too.
It does CDC / WAL tailing on SQL servers.
The other thing is that it's reentrant / recursive: an event is published out of DB A, which is consumed by Service B, which makes some mutation to its state (maybe a DB, or really anything), causing it to publish an event in turn.
It’s turtles all the way up in this case.
---
The issue about linearizability (that's hard to write on a mobile!) and row-level versus transaction-level changes is covered by similar systems like wal-g.
https://github.com/wal-g/wal-g/tree/master/internal/walparser
Their architecture is quite well established, and it can do what Marmot does as well as PITR, etc. Marmot and wal-g are complementary. wal-g also does WAL parsing, but it doesn't publish for others higher up the stack to consume.
That is completely fair enough - and it's worth documenting explicitly.
@szabba absolutely!
I'm not really sure: I don't have experience with NATS. My mental model, from reading up on it, is that it's a messaging service with at-most-once semantics by default.
I hear you! The good part is NATS supports WebSockets out of the box! But I see your point very clearly now. If I am hearing you correctly, the ideal solution in your head would be to block until your changes are actually replicated to the cluster. I can tell you rqlite easily fits that bill, because it won't return the call until changes are applied to a quorum. Right now I don't have a clear answer, because I am trying to work on the serializability and transaction parts of the problem. But exposing some sort of HTTP interface on top of Marmot could be one possible option.
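To make that last idea concrete, here is a purely hypothetical Go sketch of what a client of such an HTTP interface might look like. The endpoint, port, transaction ID, and response shape are all invented for illustration; nothing like this exists in Marmot today.

```go
// Hypothetical client for an imagined Marmot HTTP status endpoint:
// block until a given transaction is replicated to the cluster or
// rejected. Endpoint, port, and JSON shape are made up.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"time"
)

type txnStatus struct {
	Replicated bool `json:"replicated"`
	Rejected   bool `json:"rejected"`
}

// waitForReplication polls the imagined endpoint until the transaction
// reaches a terminal state.
func waitForReplication(txnID string) (txnStatus, error) {
	for {
		resp, err := http.Get("http://localhost:8090/txn/" + txnID)
		if err != nil {
			return txnStatus{}, err
		}
		var st txnStatus
		err = json.NewDecoder(resp.Body).Decode(&st)
		resp.Body.Close()
		if err != nil {
			return txnStatus{}, err
		}
		if st.Replicated || st.Rejected {
			return st, nil
		}
		time.Sleep(100 * time.Millisecond)
	}
}

func main() {
	st, err := waitForReplication("example-txn-id")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("replicated=%v rejected=%v\n", st.Replicated, st.Rejected)
}
```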
Something to consider on the subject of distributed writes, eventual consistency, etc. might be the approach taken by CouchDB. It does require the addition of a '_rev' column in their model, but I think users who felt like they needed that extra level might be able to opt in without it becoming a requirement of Marmot - just an option. See https://guide.couchdb.org/draft/consistency.html#locking
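A minimal sketch of what that opt-in could look like at the application level, assuming a plain SQLite table with an application-managed revision column (names invented; this is not a Marmot feature):

```go
// Sketch of CouchDB-style optimistic concurrency: an update only
// succeeds if the caller still holds the revision it last read, so
// conflicting writers are detected instead of silently overwritten.
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	db, err := sql.Open("sqlite3", ":memory:")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if _, err := db.Exec(`
		CREATE TABLE docs(id INTEGER PRIMARY KEY, body TEXT,
		                  _rev INTEGER NOT NULL DEFAULT 0);
		INSERT INTO docs(id, body) VALUES (1, 'first draft');
	`); err != nil {
		log.Fatal(err)
	}

	// The WHERE clause checks the revision we last read (0 here);
	// zero rows affected means another writer got there first.
	res, err := db.Exec(
		`UPDATE docs SET body = ?, _rev = _rev + 1 WHERE id = ? AND _rev = ?`,
		"second draft", 1, 0,
	)
	if err != nil {
		log.Fatal(err)
	}
	if n, _ := res.RowsAffected(); n == 0 {
		fmt.Println("conflict: row was changed by someone else")
	} else {
		fmt.Println("update accepted")
	}
}
```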
Marmot can undo a locally committed write if it conflicts with a write that was already replicated to the other nodes.
An application might prefer to only respond - possibly only for some of its writes - once the write has been successfully replicated. For that, Marmot would need to provide some mechanism for the application to detect when the write in a particular transaction has been successfully replicated or rejected.
I imagine the application choosing to do that would have to be prepared to handle at least the following cases: