When the target keyspace already has tablet controls in place, MoveTables SwitchTraffic fails with an error that requires manual cleanup before reads and writes can resume. This occurs when the TabletControls contain a denied tables list that doesn't match the tables in the currently running workflow: if the workflow's tables don't match the TabletControls one for one, SwitchTraffic returns an error.
Any traffic sent after this point continues to produce application errors until the TabletControls are removed and the shard state is refreshed.
Related Issue: #13998
Reproduction Steps
Do a MoveTables with 6 sbtest tables, then SwitchTraffic, ReverseTraffic, and finally cancel the workflow. This leaves an environment with Tablet Controls in place on the target and no running workflow.
Add two new sbtest tables on your source and start a new workflow. NOTE: the new workflow matches tables sbtest1-8, but the tablet controls only cover sbtest1-6.
$ vtctlclient --server :15999 MoveTables SwitchTraffic fane_import_sharded.import-shard-80
E0915 22:10:10.097662 696 main.go:96] E0915 22:10:10.097104 traffic_switcher.go:625] allowTargetWrites failed: Code: INVALID_ARGUMENT
cannot remove tables since one or more do not exist in the denylist
E0915 22:10:10.114269 696 main.go:96] E0915 22:10:10.113676 vtctl.go:2215]
cannot remove tables since one or more do not exist in the denylist
The following vreplication streams exist for workflow fane_import_sharded.import-shard-80:
id=6 on -80/aws_useast1a_6-3337899395: Status: Stopped. VStream Lag: 0s.
MoveTables Error: rpc error: code = Unknown desc = cannot remove tables since one or more do not exist in the denylist
E0915 22:10:10.216399 696 main.go:105] remote error: rpc error: code = Unknown desc = cannot remove tables since one or more do not exist in the denylist
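The mismatch behind this error can be checked before running SwitchTraffic. The sketch below is illustrative only: the two table lists are hard-coded from the scenario above, and `missing_from_denylist` is a hypothetical helper, not part of Vitess. In a real environment the denied tables would have to be read from the shard record (for example from the output of `vtctlclient GetShard`).

```shell
#!/bin/sh
# Table lists copied by hand from the reproduction scenario (assumption:
# in practice these come from the workflow definition and the shard record).
WORKFLOW_TABLES="sbtest1 sbtest2 sbtest3 sbtest4 sbtest5 sbtest6 sbtest7 sbtest8"
DENIED_TABLES="sbtest1 sbtest2 sbtest3 sbtest4 sbtest5 sbtest6"

# Hypothetical helper: print every table in list $1 that is absent from list $2.
missing_from_denylist() {
  for t in $1; do
    case " $2 " in
      *" $t "*) ;;      # table is present in the denylist
      *) echo "$t" ;;   # table is in the workflow but not in the denylist
    esac
  done
}

missing_from_denylist "$WORKFLOW_TABLES" "$DENIED_TABLES"
```

Any table printed here is in the workflow but not in the existing denylist, which is the condition the "one or more do not exist in the denylist" error describes.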
Any writes from the application to the keyspace during this time result in an error:
$ sysbench --db-driver=mysql --threads=1 --events=0 --time=0 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-db=fane_import_sharded /usr/share/sysbench/oltp_insert.lua --tables=5 run
WARNING: Both event and time limits are disabled, running an endless test
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)
Running the test with following options:
Number of threads: 1
Initializing random number generator from current time
Initializing worker threads...
Threads started!
FATAL: mysql_drv_query() returned error 1105 (target: fane_import_sharded_source.-80.primary: vttablet: rpc error: code = FailedPrecondition desc = disallowed due to rule: enforce denied tables (CallerID: admin)) for query 'INSERT INTO sbtest4 (id, k, c, pad) VALUES (0, 4098, '09169823527-14773847787-63328771402-43563606289-98835554319-17838113855-09276254645-46412092895-40264640011-92712584350', '67793249909-86081288100-12979568721-26815841297-77951231372')'
FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_insert.lua:61: SQL error, errno = 1105, state = 'HY000': target: fane_import_sharded_source.-80.primary: vttablet: rpc error: code = FailedPrecondition desc = disallowed due to rule: enforce denied tables (CallerID: admin)
Recovery Steps
(recovery step) The way to recover here is to remove the tablet controls and refresh the shard state on the SOURCE:
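The exact commands used are not included in this report. As a rough sketch, the two operations map onto the `SetShardTabletControl` and `RefreshStateByShard` vtctl commands. The source keyspace/shard (`fane_import_sharded_source/-80`) and the `primary` tablet type are assumptions read off the error messages above, and the `run` wrapper with its `DRY_RUN` switch is a hypothetical convenience so the commands are only printed by default.

```shell
#!/bin/sh
# Assumptions: server address from the SwitchTraffic command above,
# keyspace/shard from the "target: fane_import_sharded_source.-80.primary" error.
VTCTLD_SERVER="${VTCTLD_SERVER:-:15999}"
SOURCE_SHARD="${SOURCE_SHARD:-fane_import_sharded_source/-80}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, anything else = execute them

# Hypothetical wrapper: echo the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

recover_source_shard() {
  # Remove the tablet control record for the primary tablet type on the source shard.
  run vtctlclient --server "$VTCTLD_SERVER" SetShardTabletControl -- --remove "$SOURCE_SHARD" primary
  # Ask the tablets in the shard to reload shard state so the change takes effect.
  run vtctlclient --server "$VTCTLD_SERVER" RefreshStateByShard "$SOURCE_SHARD"
}

recover_source_shard
```

If other tablet types on the shard also carry denied tables, the first command would presumably need to be repeated for each of them.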
(recovery step) Now any writes from the application will continue to run.
$ sysbench --db-driver=mysql --threads=1 --events=0 --time=0 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-db=fane_import_sharded /usr/share/sysbench/oltp_insert.lua --tables=5 run
WARNING: Both event and time limits are disabled, running an endless test
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)
Running the test with following options:
Number of threads: 1
Initializing random number generator from current time
Initializing worker threads...
Threads started!
Reproduction Steps
See Issue: #13998
Recovery Steps
Binary Version
Operating System and Environment details
Log Fragments