Closed shlomi-noach closed 11 months ago
Addressed by https://github.com/vitessio/vitess/pull/13555, which adds a series of endtoend tests that reproduce the error scenario (and now pass, given the fix in the PR), along with a number of unit tests. The fix itself is a single line.
Overview of the Issue
In a cluster with multiple REPLICA/RDONLY tablets, it's possible to create a situation where vtctlclient -- Backup --incremental_from_pos=auto fails to take the backup.
The gist of the scenario: one of the tablets is restored from backup (which wipes out its binary logs, setting gtid_purged) and then takes an incremental backup (which runs fine). An attempt is then made to take an incremental backup on the other tablet, and that attempt fails.
Reproduction Steps
Use examples/local. Assume:
PRIMARY tablet is zone1-0000000101
REPLICA is zone1-0000000100
RDONLY is zone1-0000000102
Run the following sequence. Note that the interleaved ApplySchema commands are there just to generate sufficient changelog in between the operations.
The last --incremental_from_pos=auto zone1-0000000100 command yields something similar to:
The last successful incremental backup on 102 is:
The issue is that we do not calculate gtid_purged correctly.
Binary Version
Operating System and Environment details
Log Fragments
No response