pingcap / tiflash

The analytical engine for TiDB and TiDB Cloud. Try free: https://tidbcloud.com/free-trial
https://docs.pingcap.com/tidb/stable/tiflash-overview
Apache License 2.0
946 stars 409 forks source link

Potential data loss after changing the number of TiFlash replicas #9438

Closed JaySon-Huang closed 1 month ago

JaySon-Huang commented 1 month ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

  1. Create a table with tiflash replica and load some data
  2. Alter tiflash replica to 0
  3. Wait gc_lifetime and set tiflash replica to 1

2. What did you expect to see? (Required)

The TiFlash replica is built correctly

3. What did you see instead (Required)

There is a chance that data loss happened after step 3

4. What is your TiFlash version? (Required)

v8.1.1

JaySon-Huang commented 1 month ago

In v8.1.0 and v8.1.1, if the tiflash replica num is set to 0, applyDropTable(database_id, table_id, "SetTiFlashReplica-0") will be executed and add a tombstone_ts to the IStorage instance. https://github.com/pingcap/tiflash/blob/v8.1.1/dbms/src/TiDB/Schema/SchemaBuilder.cpp#L392-L407

If all the regions are removed from the tiflash instance, and the tombstone_ts exceeds the gc_safepoint, then we will generate a InterpreterDropQuery to physically drop the IStorage instance. https://github.com/pingcap/tiflash/blob/v8.1.1/dbms/src/TiDB/Schema/SchemaSyncService.cpp#L304-L354

However, there could be a chance that data loss due to a concurrent issue:

  1. In SchemaSyncService::gcImpl, a table is judge as both "tombstone_ts exceed the gc_safepoint" and "no region peer exists". So InterpreterDropQuery is generated
  2. User set tiflash replica to be K where K > 0, and new region snapshot is applied to tiflash before InterpreterDropQuery get executed.
  3. InterpreterDropQuery get executed, and all the data in the StorageDeltaMerge get physically removed. But the region is still exist in the raft-layer. And the query result after that will meet data loss.

Note: The mechanism of "if the tiflash replica num is set to 0, applyDropTable(database_id, table_id, "SetTiFlashReplica-0") will be executed" is trying to remove the empty segment and .sql schema file from tiflash instance after the tiflash replica num is set to 0. But seems there is no easy way to fix such a concurrent issue.

JaySon-Huang commented 1 month ago

For LTS, only v8.1.0 and v8.1.1 are affected.

release-7.5 is not affected because we comment out the "drop table" action when alter tiflash replica to 0 https://github.com/pingcap/tiflash/blob/release-7.5/dbms/src/TiDB/Schema/SchemaBuilder.cpp#L391-L408

Older versions are also not affected because there is no such mechanism.