Closed jeremycole closed 1 year ago
The bug in KeyRangeAdd is reproducible with the following simple test case, which fails on the current main:
func TestKeyRangeAdd1(t *testing.T) {
	keyRange, ok := KeyRangeAdd(stringToKeyRange("000280-000300"), stringToKeyRange("0003-"))
	assert.Equal(t, stringToKeyRange("000280-"), keyRange)
	assert.Equal(t, true, ok)
}
Overview of the Issue
Keyspaces with adjacent shard definitions that use a differing number of bytes to represent the keyspace IDs in that shard do not initialize their tablets correctly, resulting in non-serving tablets.
Reproduction Steps
We created a brand new keyspace with the following shards:
-0001
000100-000180
000180-000200
000200-000280
000280-000300
0003-
All shards came up correctly except for 000280-000300, whose tablet started up but did not mark itself as IsPrimaryServing, so there was no serving tablet for the shard. We were able to get past this manually by marking the tablet as serving explicitly.
This was due to an error in key.KeyRangeAdd, which is used by topotools.ValidateForReshard via combineKeyRanges here:
https://github.com/vitessio/vitess/blob/47611bca3951ecdf442dda5c8fc12f4eb9cff29c/go/vt/topotools/split.go#L67
https://github.com/vitessio/vitess/blob/47611bca3951ecdf442dda5c8fc12f4eb9cff29c/go/vt/key/key.go#L125-L136
The KeyRangeAdd function uses (cf.) bytes.Equal(first.End, second.Start) to compare the End and Start values without normalizing them in any way. For the corresponding shards, the End value []byte{0x00, 0x03, 0x00} and the Start value []byte{0x00, 0x03} therefore do not match, so KeyRangeAdd returns nil, false instead of the properly combined range, and validation of the shard topology via ValidateForReshard fails.
Binary Version
Operating System and Environment details
Log Fragments
No response