Open aquarapid opened 2 years ago
Solving this in general is high effort (we would have to track each backend instance individually). As a workaround, the following has been suggested:

- Effectively disable the Vitess-side timeouts (`-queryserver-config-query-timeout` and `-queryserver-config-transaction-timeout`), and control the timeouts via MySQL-controlled directives instead (`max_execution_time` or the equivalent MySQL query comment directive). This handles the case in which Vitess kills queries during normal operation.
- Set `-shutdown_grace_period` to be larger than the timeouts you are setting on queries via the MySQL means above. This ensures that on shutdown, Vitess waits long enough for these queries to time out and terminate, and does not need to actively try to kill queries.

This is not ideal, but should limit the potential negative consequences of this issue.
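For illustration, here is a minimal sketch of the MySQL-side half of the workaround. It assumes the application builds its own `SELECT` statements and that the optimizer hint reaches MySQL unchanged. The helper names (`with_max_execution_time`, `grace_period_covers`) and the sample query are hypothetical and not part of Vitess. It also assumes the grace period is expressed in seconds, so check the unit your Vitess version expects. Note that MySQL's `MAX_EXECUTION_TIME` hint takes milliseconds and only applies to `SELECT` statements.

```python
from typing import Iterable


def with_max_execution_time(select_body: str, limit_ms: int) -> str:
    """Prefix a SELECT body with MySQL's MAX_EXECUTION_TIME optimizer hint.

    Hypothetical helper: limit_ms is in milliseconds, as the hint requires.
    """
    if limit_ms <= 0:
        raise ValueError("limit_ms must be positive")
    return f"SELECT /*+ MAX_EXECUTION_TIME({limit_ms}) */ {select_body}"


def grace_period_covers(grace_period_s: float, mysql_timeouts_ms: Iterable[int]) -> bool:
    """Return True if the shutdown grace period (assumed to be in seconds)
    is strictly longer than every MySQL-side timeout (in milliseconds)."""
    return all(grace_period_s * 1000 > t for t in mysql_timeouts_ms)


if __name__ == "__main__":
    # Hypothetical table and columns, for illustration only.
    print(with_max_execution_time("id, name FROM customer WHERE id = 1", 30_000))
    # A 60 s grace period outlasts 30 s and 45 s query timeouts.
    print(grace_period_covers(60, [30_000, 45_000]))
```

A check like `grace_period_covers` could run wherever the timeouts are configured, so that a later change to one value does not silently bring back the situation where Vitess has to kill queries on shutdown.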
I realized this while reading through the code:

- `vttablet` maintains a `dbaPool` alongside the main pool.
- Queries are killed by getting a connection from `dbaPool`, and then `KILL`-ing the thread id for the relevant query/queries.

This all works well as long as all connections are going to the same MySQL backend. However, this might not always be the case:
- If you point `vttablet` at an Aurora read-only endpoint, and you have multiple reader instances, you may have db connections going to any of the backend databases, since the DNS -> IP resolution for a connection is made at the time we `Connect()` to the database, not when `vttablet` starts, or when the pool gets created.
- The connection obtained from `dbaPool` may well not be connected to the same backend as the running query you are trying to kill. So the query killing for deadlines and on `vttablet` shutdown may not be effective.
- Because `vttablet` just blindly issues a `KILL` with the thread id, you may accidentally kill a thread with the same ID on a different instance that does not belong to `vttablet` or is doing something else on behalf of `vttablet`.

Fixing this isn't trivial. Some possible options:
`dbaPool`. Two issues with this would be: