Open arthurschreiber opened 1 year ago
Have you explored using transactional reads? We do support that.
I'm not sure this would solve the problem we're facing in a satisfactory way. I might be wrong on this, so here's what I think the problems with transaction reads would be.
First, it's unclear what parts of our codebase need to be wrapped in transactions, and with a codebase as large as ours, it'd be a herculean effort to figure that out.
We could wrap every web request that comes in with transaction, but that will lead to gravely reducing the multiplexing done on the vttablet
. We'd hit the vttablet
connection limits that we've set very quickly, as connections would be "locked" for the duration of that web request. This would be made even worse with scatter or other multi shard queries.
It's also hard for me to estimate what the change in semantics would mean for our application, and which isolation level would be the correct one to use.
Overall, it feels like a much heavier solution to this problem than what we would require. 🤔
vtgate
performs random tablet selection (with preference for local cell tablets) when deciding which tablet to use for serving a query. On larger MySQL setups, where there might be a high variance in replication lag between replica nodes, this behaviour can lead to data inconsistencies that is hard to understand and work around.Here's an example:
check_suites
andcheck_runs
.check_suites
row has many associatedcheck_runs
rows.check_runs
can only be created after creating the parentcheck_suites
row.check_suites
.Based on this description, an application that runs against a "regular" MySQL primary/replica setup, where connections are load balanced using e.g.
haproxy
can rely on the fact that it can read fromcheck_runs
, and every row will have a reference to acheck_suites
row that is guaranteed to exist.The application will have a "time consistent" view of the data, where a read query will always see data at the same point of a previous query, or a later point in time.
When sending requests through
vtgate
, that expectation is broken if queries are distributed across more than one replica. 😔The first query that is reading from
check_runs
might hit replicaA
, which has almost no replication lag. The second query, reading fromcheck_suites
, might hit replicaB
, which has over a second of replication lag. This can lead to the second query not being able to find any matching row in thecheck_suites
table, which in turn can lead to errors or other weird behaviour in our application.The application no longer has a "time consistent" view of the data, instead read queries see data at newer and older points in time.