Open leoxlin opened 5 years ago
This is intended behavior. It was actually specifically requested by HubSpot :). DBA connections were introduced because they were deemed too critical to be killed.
We can change this behavior, but doing so would defeat the purpose of the feature.
@acharis is aware of this.
Reflecting the Slack conversation here. The behavior is expected. Our main pain point is that the master is demoted while waiting for the tx_pool to drain, which causes a prolonged period of downtime on the shard until the time bomb fails the reparent.
The other issue is the leak problem, which makes this hard to act on if we don't know which transaction it is.
We are looking to fix this by
Background
At HubSpot we use the dba workload on Vitess for migrations.
We discovered that certain shards sometimes hang during a planned reparent and eventually fail.
Debugging
We were able to isolate the hanging behavior to a draining tx_pool on vttablet.
Issues
Reproducing
With any vtgate client, set the workload in the execution options to DBA, then issue a Begin. Kill the client process without issuing a Rollback or Commit, then reparent the shard you issued the transaction against.