shatteredsilicon / ssm-submodules

GNU Affero General Public License v3.0
0 stars 2 forks source link

For replication slaves, harvest binlog data and show in QAN #286

Open gordan-bobic opened 1 month ago

gordan-bobic commented 1 month ago

Sometimes interactions between reads and writes can cause locking problems, both with reads slowing down or blocking replication and with replication slowing down or blocking reads.

Extend QAN to harvest the binlog if log_bin is enabled.

Detect whether log_bin is enabled at ssm-admin add mysql:queries, and if it is, enable binlog harvesting for QAN. If it isn't enabled, disable it. This should be configurable via the .conf file.

If binlog_format = STATEMENT, we have the full statement, we can put it in as is. If binlog_format = MIXED, we'll have most events as statement, but some will be row based. If binlog_format = ROW, everything will be row based.

ROW formatted events are not going to only be convertible into pseudo-SQL on a row-by-row basis of what single-row-affecting statements would achieve the change. See the putput of following command on a row based binlog for examples:

mysqlbinlog --base64-output=decode-rows -vvvvv <binlog file>

In QAN, show replication queries (or pseudo-queries for row based events) prefixed with R:, i.e. if there was an INSERT into table foo replicated down, show those events as:

R: INSERT foo

in the Query Abstract column.

The query example should contain either SQL or pseudo-SQL, depending on the event. SQL, whichever we have for the event (some such INSERTs could be SQL, and some pseudo-SQL, in binlog_format=MIXED case. They will be different queries and should be tracked as such. In binlog_format=ROW all R:INSERT foo queries will match the same template, and should be tracked as different instances of the same query.

Thinking bout it, instead of the R prefix, maybe the replication indicator would be good to have as a separate column in the table of queries by GTT, so that it is togglable by clicking with a tri-state option, i.e. by default it is included, the other two options being to exclude it, and to ONLY include replicated queries in the displayed list.

This may require writing a new, separate parser for binlogs. mysqlbinlog could potentially be used for parsing the relay log, and it's output absorbed, but mysqlbinlog is single-threaded and can be quite slow - we may need to make performance improvements to it.

The additional difficult could be that relay logs get rotated out as soon as they are processed, so toward the end of relay log we could end up with a gap. We would also need to not process relays from the start of the file, but from the last processed event position. Options I can think of for avoiding the telemetry dropout at the end of each relay file are: 1) Live with the gap (mysqlbinlog doesn't have a tail -f type option to keep reading the file 2) tail -f -n +1 relay.log | mysqlbinlog --base64-output=decode-rows -vvvv - - in this case we need to detect when the file handle is closed (maybe set an inotify/fanotify hook?), or watch for next relay log creation to take the hint that we need to terminate and open the next relay log instead. 3) Detect when the file is rotated and fetch any potential missing rows by connecting to the master and fetching the rest of the binlog, then resuming back from the local relay. I don't particularly like this because this involves duplicating some of the network bandwidth and processing.

I'm slightly leaning toward option 2).

Additionally, binlog contains timestamps for the events. If lag is 0, then this should be reasonably accurate. But - when lag is not 0, the question is how do we display this in terms of when queries were executed? As timestamp in the log + lag at the time based on the lag we are recording? Do we log the timestamp as being the time of harvesting rather than the timestamp in the log? That may be the cleanest possible solution but it seems like a shame to discard the hard data that is the timestamp in the log in favour of the more poorly defined "harvest time" option. Do we add another column, or just stick with harvest batch time for the QAN purposes? The thing I like about harvest time logging (even though it makes it inconsistent with how we will be logging slow log queries' timestamps), is that it is simple.

The big problem is what happens when we have replication lag - for slow log queries we know when they executed. For relay log queries we don't, server could be lagging by days, but we want to see the time when the relay is applied, not when it was supposed to be applied, which gets us back to the "inject replication lag from the other source" option. This would also evade the anomaly with batch time logging at the start of the log where we might ingest a GB of data and it would all be logged as happening in that batch even though the data could have been trickling in over hours.

All this also needs to work for multi-source replication in both MySQL and MariaDB (i.e. there could be multiple sets of replay logs, with corresponding replication lag to account for..