It was pointed out that these binlog streams will die off as soon as the server tries to send an event down them (e.g. when there is an update/insert/delete on the upstream MySQL instance). So if the underlying MySQL instance is receiving regular updates, you might not notice this at all.
After some experimentation, I found that AWS Aurora still "leaks" binlog streams even if you reuse the server_id. So, unless the upstream instance is actively receiving writes, you could eventually exhaust all the DB connections. It looks like the only approach that is guaranteed to work is the one followed by the binlog streamer in go-mysql, which is to open a separate connection and kill the binlog dump session using `KILL xxx`.
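A minimal sketch of that kill-based cleanup, run from a second connection (the thread id `1234` is hypothetical; `information_schema.processlist` and the `Binlog Dump`/`Binlog Dump GTID` command names are standard MySQL):

```sql
-- From a separate connection: find the orphaned binlog dump thread(s)...
SELECT id, user, host, command
  FROM information_schema.processlist
 WHERE command LIKE 'Binlog Dump%';

-- ...and kill the dump session explicitly (thread id is hypothetical):
KILL 1234;
```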
How does a real mysql replica tell the server that it's going away? Or does it not, and this orphan problem exists for real replicas too when no transactions are flowing?
It re-uses the server_id. Things in AWS Aurora seem to work differently, I suspect because under the covers they're using a completely different binlog "server" implementation.
I'm going to close this for now, as so much in this area has changed in the last 3.5 years, and I don't see a test case here to know if it's still an issue.
If you believe that this is still an issue then please provide additional details and we can reopen the issue at any time. Thanks!
On current master (d6c9ddf71cefdaa13d0e57fb1353e347523dd713).
Scenario (a sketch of the commands for each step follows the list):

1. Set up the local examples with unsharded `commerce` and `customer` keyspaces, including the `commerce.corder` table.
2. Start a `MoveTables` workflow to copy `commerce.corder` to `customer.corder`.
3. Validate that the workflow is running; it should complete almost immediately, since there is so little data.
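A rough sketch of those steps, assuming the standard `examples/local` scripts and the v2 `vtctlclient` syntax (the workflow name `commerce2customer` and the exact flags are assumptions and vary by Vitess version):

```sh
# 1. Bring up the cluster with the commerce keyspace, then add the
#    customer keyspace (script names from the standard local examples):
cd examples/local
./101_initial_cluster.sh
./201_customer_tablets.sh

# 2. Start the MoveTables workflow copying commerce.corder to customer:
vtctlclient MoveTables -- --source commerce --tables 'corder' Create customer.commerce2customer

# 3. Validate that the workflow is running / the copy has completed:
vtctlclient Workflow customer.commerce2customer show
```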
Now, we want to run VDiff on this workflow. But first, log into the REPLICA tablet in the `commerce` keyspace that VDiff will be running against, and do a `show full processlist;` to note the baseline. Now run the VDiff:
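Roughly, reusing the assumptions above (the replica tablet's MySQL socket path assumes the standard local example layout, tablet 101):

```sh
# Baseline: note the binlog dump threads on the commerce REPLICA tablet:
mysql -u root -S "${VTDATAROOT}/vt_0000000101/mysql.sock" -e 'SHOW FULL PROCESSLIST;'

# Run the VDiff for the workflow (v1 VDiff syntax; workflow name assumed):
vtctlclient VDiff customer.commerce2customer
```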
Now, wait a few seconds and run the VDiff again, then again and again. After running the VDiff a number of times, do a `show full processlist;` again. You will notice that the number of binlog streaming connections keeps increasing, and that they are not being closed. They will, in fact, only time out when the MySQL `wait_timeout` is reached (8 hours by default). After a large number of VDiff runs you can end up with dozens of orphaned binlog dump threads, and if you continue running VDiffs, you will eventually hit the max connections for the MySQL server.
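One way to watch the leak directly is to count the orphaned dump threads; a sketch, reusing the assumed socket path from above (`Binlog Dump`/`Binlog Dump GTID` are the standard MySQL thread command names):

```sh
# Count leaked binlog streaming connections; the number grows with each
# VDiff run and only drops once wait_timeout (default 8h) reaps them:
mysql -u root -S "${VTDATAROOT}/vt_0000000101/mysql.sock" \
  -e "SELECT COUNT(*) FROM information_schema.processlist WHERE command LIKE 'Binlog Dump%';"
```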
If you consult the vttablet log for the source vttablet (tablet `zone1-0000000101` in the examples case), you will see something like this on every VDiff run: