Bug Report: incremental backup & restore: failure to take incremental backups in a multi tablet scenario

Overview of the Issue

In a cluster with multiple REPLICA/RDONLY tablets, it's possible to create a situation where vtctlclient -- Backup --incremental_from_pos=auto fails to take the backup.

The gist of the scenario: one of the tablets is restored from backup (which wipes out its binary logs and sets gtid_purged), then takes an incremental backup (which runs fine), and then an attempt is made to take an incremental backup on the other tablet. That last attempt fails.

Reproduction Steps

Use examples/local. Assume:

PRIMARY tablet is zone1-0000000101
REPLICA is zone1-0000000100
RDONLY is zone1-0000000102

Run the following sequence. Note that the interleaved ApplySchema commands are there just to generate sufficient changelog in between the operations.

vtctlclient -- Backup zone1-0000000102
vtctldclient ApplySchema --ddl-strategy="vitess" --sql "alter table corder force" commerce && sleep 2
vtctlclient -- Backup --incremental_from_pos=auto zone1-0000000102
vtctldclient ApplySchema --ddl-strategy="vitess" --sql "alter table corder force" commerce && sleep 2
vtctldclient RestoreFromBackup zone1-0000000102
vtctldclient ApplySchema --ddl-strategy="vitess" --sql "alter table corder force" commerce && sleep 2
vtctlclient -- Backup --incremental_from_pos=auto zone1-0000000102
vtctldclient ApplySchema --ddl-strategy="vitess" --sql "alter table corder force" commerce && sleep 2
vtctlclient -- Backup --incremental_from_pos=auto zone1-0000000100

The last command, Backup --incremental_from_pos=auto zone1-0000000100, fails with output similar to:

I0717 07:47:43.728526 2090851 main.go:96] I0717 07:47:43.728145 backup.go:110] I0717 07:47:43.727878 builtinbackupengine.go:202] Executing Backup at 2023-07-17 07:47:43.727768003 +0000 UTC m=+217.129511829 for keyspace/shard commerce/0 on tablet zone1-0000000100, concurrency: 4, compress: true, incrementalFromPos: auto
I0717 07:47:43.741621 2090851 main.go:96] I0717 07:47:43.741426 backup.go:110] I0717 07:47:43.741189 builtinbackupengine.go:260] auto evaluating incremental_from_pos
I0717 07:47:43.742018 2090851 main.go:96] I0717 07:47:43.741901 backup.go:110] I0717 07:47:43.741720 builtinbackupengine.go:279] auto evaluated incremental_from_pos: MySQL56/b696e26a-2475-11ee-9d38-0a43f95f28a3:563-571
E0717 07:47:43.765510 2090851 main.go:96] E0717 07:47:43.765311 backup.go:110] E0717 07:47:43.765064 backup.go:163] backup is not usable, aborting it: [Code: FAILED_PRECONDITION
Mismatching GTID entries. Requested backup pos has entries not found in the binary logs, and binary logs have entries not found in the requested backup pos. Neither fully contains the other. Requested pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:563-571, binlog pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:1-269

cannot get binary logs to backup in incremental backup]
Backup Error: rpc error: code = Unknown desc = TabletManager.Backup on zone1-0000000100 error: cannot get binary logs to backup in incremental backup: Mismatching GTID entries. Requested backup pos has entries not found in the binary logs, and binary logs have entries not found in the requested backup pos. Neither fully contains the other. Requested pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:563-571, binlog pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:1-269: cannot get binary logs to backup in incremental backup: Mismatching GTID entries. Requested backup pos has entries not found in the binary logs, and binary logs have entries not found in the requested backup pos. Neither fully contains the other. Requested pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:563-571, binlog pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:1-269
E0717 07:47:43.790505 2090851 main.go:105] remote error: rpc error: code = Unknown desc = TabletManager.Backup on zone1-0000000100 error: cannot get binary logs to backup in incremental backup: Mismatching GTID entries. Requested backup pos has entries not found in the binary logs, and binary logs have entries not found in the requested backup pos. Neither fully contains the other. Requested pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:563-571, binlog pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:1-269: cannot get binary logs to backup in incremental backup: Mismatching GTID entries. Requested backup pos has entries not found in the binary logs, and binary logs have entries not found in the requested backup pos. Neither fully contains the other. Requested pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:563-571, binlog pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:1-269
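
To make the "neither fully contains the other" condition concrete, here is a minimal Python sketch that parses the two GTID sets from the error message and checks containment in both directions. The helpers `parse_gtid_set` and `contains` are hypothetical names written for this illustration; they are not Vitess functions and only approximate the comparison Vitess performs.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7-9,uuid2:3' into {uuid: [(start, end), ...]}."""
    # Drop an optional flavor prefix such as "MySQL56/".
    if "/" in text:
        text = text.split("/", 1)[1]
    result = {}
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        intervals = result.setdefault(uuid, [])
        for r in ranges:
            start, _, end = r.partition("-")
            intervals.append((int(start), int(end or start)))
    return result


def contains(outer, inner):
    """True if every interval of `inner` lies inside one interval of `outer`.

    Assumes the intervals in `outer` are already merged, which is how
    MySQL prints GTID sets.
    """
    for uuid, intervals in inner.items():
        for start, end in intervals:
            if not any(s <= start and end <= e for s, e in outer.get(uuid, [])):
                return False
    return True


# Values copied from the error message above.
requested = parse_gtid_set("MySQL56/b696e26a-2475-11ee-9d38-0a43f95f28a3:563-571")
binlogs = parse_gtid_set("b696e26a-2475-11ee-9d38-0a43f95f28a3:1-269")

print(contains(binlogs, requested))  # False: 563-571 is not in the binary logs
print(contains(requested, binlogs))  # False: 1-269 is not in the requested pos
```

Both checks come out false, which matches the "Mismatching GTID entries" error: the two sets are disjoint, so no incremental backup can be computed from them.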

The last successful incremental backup on 102 is:

{
  "BackupMethod": "builtin",
  "Position": "MySQL56/b696e26a-2475-11ee-9d38-0a43f95f28a3:563-571",
  "PurgedPosition": "MySQL56/b696e26a-2475-11ee-9d38-0a43f95f28a3:1-562",
  "FromPosition": "MySQL56/b696e26a-2475-11ee-9d38-0a43f95f28a3:1-562",
  "Incremental": true,
  "BackupTime": "2023-07-17T07:47:43Z",
  "FinishedTime": "2023-07-17T07:47:43Z",
  "ServerUUID": "34bb1d4c-2476-11ee-85a9-0a43f95f28a3",
  "TabletAlias": "zone1-0000000102",
  "Keyspace": "commerce",
  "Shard": "0",
  "MySQLVersion": "/home/shlomi/opt/mysql/8.0.23/bin/mysqld  Ver 8.0.23 for Linux on x86_64 (Source distribution)\n",
  "UpgradeSafe": false,
  "CompressionEngine": "pargzip",
  "FileEntries": [
    {
      "Base": "BinLog",
      "Name": "vt-0000000102-bin.000001",
      "Hash": "4925e8df",
      "ParentPath": ""
    }
  ],
  "SkipCompress": false,
  "ExternalDecompressor": ""
}

The issue is that we do not calculate gtid_purged correctly.
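
One way to read the manifest above, offered as an assumption rather than as a description of the actual fix: the recorded Position (563-571) omits the purged range (1-562). The Python sketch below uses hypothetical helpers `merge` and `covers` over plain sequence-number intervals for the single source UUID b696e26a-2475-11ee-9d38-0a43f95f28a3, and shows that the mismatch condition no longer holds once the purged range is unioned into the position.

```python
def merge(intervals):
    """Sort intervals and merge those that overlap or are adjacent."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covers(outer, inner):
    """True if every interval in `inner` lies inside the merged `outer`."""
    outer = merge(outer)
    return all(
        any(s <= start and end <= e for s, e in outer)
        for start, end in inner
    )


# Values from the manifest of the last incremental backup on 102.
manifest_position = [(563, 571)]
manifest_purged = [(1, 562)]
# Binary log contents reported for tablet 100 in the error message.
tablet_100_binlogs = [(1, 269)]

# Assumption: the "full" position is the recorded position plus the purged range.
full_position = merge(manifest_position + manifest_purged)

print(full_position)                                  # [(1, 571)]
print(covers(manifest_position, tablet_100_binlogs))  # False
print(covers(full_position, tablet_100_binlogs))      # True
```

This only demonstrates the set arithmetic. Whether the backup on zone1-0000000100 would then succeed depends on how Vitess selects binary logs, which this sketch does not model.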

Binary Version

v17, v18

Operating System and Environment details

Log Fragments

No response
