pingcap / tidb

TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
https://pingcap.com
Apache License 2.0
37.28k stars 5.84k forks source link

Consider disabling --checksum for BR backup #43007

Open kennytm opened 1 year ago

kennytm commented 1 year ago

Feature Request

Is your feature request related to a problem? Please describe:

Currently BR backup performs checksum twice, once during SST generation in TiKV, and once again after a table is backed up using TiKV Coprocessor. These two checksums are compared to verify correctness.

However, these checksums are calculated from the exact same source: the online TiKV database. The actual uploaded SST file could be corrupt but this won't be detected by the checksum.

The only way they differ is if the RocksDB snapshot mechanism break down or the checksum algorithm is changed. These are not something the users really care about. That is to say, running with --checksum=1 only wasted CPU on TiKV to prove something that can never be false.

Describe the feature you'd like:

  1. Given that TiKV always recorded the checksum into backupmeta regardless of the --checksum flag, we can safely change the flag's default value to false.
  2. Enhance the br debug checksum command to actually read the SST files to calculate that file's CRC64XOR etc. Currently it only compares the SHA256 hash.

In the documentation recommend users to run br debug checksum after performing br backup to ensure the integrity of the uploaded backup archive.

We are only talking about BR backup here. The default --checksum setting for BR restore should still and always be true.

Describe alternatives you've considered:

A. Change the --checksum flag to mean actually performing br debug checksum. Note that br debug checksum will need to download the SST files again so it will double the I/O cost of the target cloud storage.

B. Don't calculate CRC64XOR, keeping the current implementation of br debug checksum. Checking SHA256 should be sufficient to ensure the file content is intact.

Teachability, Documentation, Adoption, Migration Strategy:

For the public API since we are only changing the default value of a flag this is fully compatible.

The br debug command is currently hidden and br debug checksum doubly so. They need to be revealed and documented.

YuJuncen commented 2 months ago

The only way they differ is if the RocksDB snapshot mechanism break down or the checksum algorithm is changed. These are not something the users really care about. That is to say, running with --checksum=1 only wasted CPU on TiKV to prove something that can never be false.

Also notice that the client implementation may make them differ. Say, some regions were omitted during backup.

(BR uses a range tree and batched style to query TiKV, while checksumming queries TiKV region by region).

kennytm commented 2 months ago

@YuJuncen

Also notice that the client implementation may make them differ. Say, some regions were omitted during backup.

Well if the local checksum and remote checksum are operating on different set of ranges, it is an implementation bug of BR which again the user cannot control besides upgrading BR to a fixed version.

3pointer commented 1 month ago

In the early design, checksum were used to verify:

In practice, backup checksum mismatches have rarely occurred. To my recollection, there were only a few times issues were caused by physical imports during the backup.

Therefore, we plan to gradually remove this logic and disable the checksum after the table is backed up by default. we already done it on Cloud. I think we can remove it on Premise for next release.

kennytm commented 1 month ago

Currently (v8.1) BR backup will only write the schema checksum ("TXNKV_CHECKSUM_ORIGIN") when we enable backup --checksum=1, causing restore to ignore the E2E checksum validation even if we restore --checksum=1.

Will we change the restore checksum to consider the sum of file checksum ("TXNKV_CHECKSUM_SST") when the schema checksum is missing from the backupmeta? The two should be equivalent.