lizztheblizz opened 11 months ago
We've run into this issue as well. It caused reverse VReplication to break when we ran SwitchTraffic
during a reshard from 1 to 4 shards because the application has a workflow which re-associates a unique identifier (in our case an email address) from one record to another. Slack thread here: https://vitess.slack.com/archives/C0PQY0PTK/p1689620260468139
`unique_checks=0` does not mean we can have rows which break the uniqueness constraint: https://bugs.mysql.com/bug.php?id=44180
I'm not seeing any good Vitess option for fixing this. An external workaround seems to be to drop the unique keys explicitly and add them back after a `SwitchTraffic`.
Likely the beginning of a broader feature request, but I wanted to add a quick response here:

Is there a world in which a single failing vstream, knowing it is being managed by a broader workflow, could choose to skip an event that generates a unique constraint error like this? I believe it ought to prompt a warning in the logs, and perhaps ideally also generate a log entry in `_vt.vreplication_log` (or maybe even in a separate, dedicated table for this), then fail the stream only after N events have failed consecutively for the same reason?
Tools like VDiff can help us confirm after the fact what, if any, damage was actually done by this "optimistic merging". And in the meantime, our `MoveTables` workflows would become perhaps a bit less fragile.
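The skip-then-fail idea above could be sketched roughly as follows. This is hypothetical Go, not existing Vitess code: the `dupKeySkipper` type, its method names, and the logging hook are all assumptions made for illustration.

```go
package main

import "fmt"

// dupKeySkipper sketches the proposal: skip events that hit a unique
// constraint error, log each skip, and only fail the stream after
// maxConsecutive events fail in a row for the same reason.
type dupKeySkipper struct {
	maxConsecutive int // the "N" from the proposal
	consecutive    int
	skipped        int
}

// onDupKey is called when an event fails with ER_DUP_KEY. It reports whether
// the stream should skip the event (true) or give up and error out (false).
func (s *dupKeySkipper) onDupKey(event string) bool {
	s.consecutive++
	if s.consecutive > s.maxConsecutive {
		return false // too many consecutive failures: fail the stream
	}
	s.skipped++
	// A real implementation would also write a warning to the logs and a
	// row to _vt.vreplication_log (or a dedicated table), per the proposal.
	fmt.Printf("warning: skipped event %q after unique constraint error\n", event)
	return true
}

// onApplied resets the counter whenever an event applies cleanly.
func (s *dupKeySkipper) onApplied() { s.consecutive = 0 }

func main() {
	s := &dupKeySkipper{maxConsecutive: 3}
	fmt.Println(s.onDupKey("insert #1")) // skipped
	s.onApplied()                        // a clean event resets the counter
	for i := 0; i < 4; i++ {
		fmt.Println(s.onDupKey(fmt.Sprintf("insert #%d", i+2)))
	}
}
```

VDiff would then be the safety net that surfaces whatever drift the skipped events actually caused.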
My fear is that this would be TOO optimistic: waiting N days to run a VDiff only to find that there's real drift.
We could e.g. turn an INSERT into an UPDATE and vice-versa so that we try and make the write idempotent. There's still the general issue of no total order and the good chance that at the end we aren't eventually consistent, but as noted, VDiff can help there so that you're at least aware of the differences. You can then try and manually fix things or retry the migration.
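A rough sketch of that idea, in hypothetical Go: try the INSERT first and, on a duplicate-key error, retry the same row event as an UPDATE keyed on the primary key, making the write idempotent. `applyRowEvent`, the fake executor, and the sentinel error are assumptions for illustration, not Vitess APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// errDupKey stands in for MySQL error 1062 (ER_DUP_ENTRY).
var errDupKey = errors.New("ER_DUP_ENTRY (1062)")

// applyRowEvent tries the INSERT form of a row event and, if the row already
// exists, falls back to the equivalent UPDATE so the write is idempotent.
// exec stands in for the vplayer's statement executor.
func applyRowEvent(exec func(q string) error, insertQ, updateQ string) error {
	err := exec(insertQ)
	if errors.Is(err, errDupKey) {
		return exec(updateQ) // retry as an idempotent UPDATE
	}
	return err
}

func main() {
	var applied []string
	// Fake executor: the first INSERT collides with an existing row.
	exec := func(q string) error {
		applied = append(applied, q)
		if len(applied) == 1 {
			return errDupKey
		}
		return nil
	}
	err := applyRowEvent(exec,
		"insert into admins(team_id, email) values (1, 'b@example.com')",
		"update admins set email = 'b@example.com' where team_id = 1")
	fmt.Println(err, len(applied))
}
```

As noted, this only papers over ordering: without a total order the streams may still not converge, which is where VDiff comes in.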
Repeating much of what was said above, for illustrative purposes ...
The problem seen here is that we don’t have global event ordering.
For illustration, let us consider a reverse workflow merging shards of the `target` keyspace to one in the source keyspace. The table `admins (team_id, email)` maps admin emails to teams, and `email` has a unique key.
The app executes the following to achieve the switch:

1. `update admins set email = null where team_id = 2` on Shard 2
2. `update admins set email = 'b@example.com' where team_id = 1` on Shard 1
3. `update admins set email = 'a@example.com' where team_id = 2` on Shard 2
On the source we start with:

`(1, a@example.com)`
`(2, b@example.com)`
Let's say Shard 2's vplayer is lagging behind Shard 1's, so `update admins set email = 'b@example.com' where team_id = 1` is the first of these three queries executed on the source shard. This breaks the unique key constraint, resulting in an `ER_DUP_KEY` error.
One approach is to delay the failing stream optimistically, so that the lagging stream has time to catch up. On getting an `ER_DUP_KEY` for a merge workflow (defined by 2 or more `_vt.vreplication` rows with the same `workflow_name`), we trigger an optimistic wait loop.
We could wait for the `pos` of the other workflow to show progress before retrying, but that won't scale if we have a lot of target shards. The easier approach might be a short exponential backoff, starting with a small wait (100 ms?) up to, say, 5 seconds (or longer, since the alternative is a catastrophic failure; or base it on the vreplication lag of the other streams ...).
Of course, there will be situations where this might not work: for example, if there are lots of reassignments in a short time for a single team, as in the example above. But hopefully we would catch a significant number of such issues, allowing more workflows to proceed than they do now.
Overview of the Issue
When using `MoveTables` to consolidate shards, either by running `SwitchTraffic` while moving from 2 to 4 shards (where it affects the `_reverse` workflow), or during the normal operation of a `MoveTables` command moving from a 2-shard to a 4-shard keyspace, it is possible for vreplication to break because of unique keys, making it near impossible to recover.

Scenario:
- a table with a `consistent_lookup_unique` vindex on `uid`
- 4 shards (`-40`, `40-80`, `80-c0`, `c0-`) as source
- 2 shards (`-80`, `80-`) as target

writer(-40):
`insert into table1(uid) values (25);` --> inserts row with values id=1, uid=25
`delete from table1 where id=1;`

writer(40-80):
`insert into table1(uid) values (25);` --> inserts row with values id=2, uid=25
`delete from table1 where id=2;`

writer(-40):
`insert into table1(uid) values (25);` --> inserts row with values id=3, uid=25
`delete from table1 where id=3;`

writer(40-80):
`insert into table1(uid) values (25);` --> inserts row with values id=4, uid=25
`delete from table1 where id=4;`
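To make the interleaving concrete, here is a toy replay (plain Go, not Vitess code; the `event` type and `apply` function are illustrative assumptions) of the writer events above: applying both shard streams in source commit order succeeds, but if the `-40` stream's delete lags behind the `40-80` stream's insert, the target hits a duplicate key on `uid=25`.

```go
package main

import (
	"errors"
	"fmt"
)

// A row event from one shard's binlog: an insert of (id, uid) or a delete by id.
type event struct {
	insert  bool
	id, uid int
}

// apply mimics the target MySQL enforcing the unique key on uid.
// rows maps uid -> id.
func apply(rows map[int]int, ev event) error {
	if ev.insert {
		if _, exists := rows[ev.uid]; exists {
			return errors.New("ER_DUP_ENTRY for uid")
		}
		rows[ev.uid] = ev.id
		return nil
	}
	for uid, id := range rows {
		if id == ev.id {
			delete(rows, uid)
		}
	}
	return nil
}

func main() {
	streamA := []event{{true, 1, 25}, {false, 1, 0}} // writer(-40): insert id=1, delete id=1
	streamB := []event{{true, 2, 25}, {false, 2, 0}} // writer(40-80): insert id=2, delete id=2

	// In source commit order every event applies cleanly.
	rows := map[int]int{}
	for _, ev := range append(append([]event{}, streamA...), streamB...) {
		if err := apply(rows, ev); err != nil {
			fmt.Println("unexpected:", err)
		}
	}
	fmt.Println("source order: ok")

	// On the target the two vstreams are independent: if stream A lags,
	// B's insert of uid=25 arrives before A's delete has removed it.
	rows = map[int]int{}
	_ = apply(rows, streamA[0])          // id=1, uid=25 applied
	fmt.Println(apply(rows, streamB[0])) // id=2, uid=25 -> duplicate key
}
```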
Because the vstreams for `-40` and `40-80` are executing independently from each other, when consolidating these streams back into `-80`, it is very easy for one of those vstreams to fall behind the other, and fail perpetually thereafter. The error we observe is a unique constraint error on the MySQL side.

A workaround for the above scenario could be to run the vstream sessions with `SET unique_checks=0;`. This should be a safe operation in this case, because we still have our secondary lookup vindex table enforcing uniqueness across shards on the source side. I'm unclear whether it would be considered safe in all `unique` constraint scenarios.

Reproduction Steps
`vschema`:

Assume a sequence table exists in an unsharded keyspace to help populate `id`.