pingcap / br

A command-line tool for distributed backup and restoration of the TiDB cluster data
https://pingcap.com/docs/dev/how-to/maintain/backup-and-restore/br/
Apache License 2.0

Statistics should be saved outside of `backupmeta` #691

Closed by kennytm 3 years ago

kennytm commented 3 years ago

Since 4.0.9, we store the stats JSON directly inside backupmeta. This works well for small clusters, but fails catastrophically when the database size and table count become large.

The biggest¹ issue is the CMSketch field of the stats, which by default consists of 5 × 2048 = 10240 integers, and every column and index has its own CMSketch. Serialized as JSON (roughly two bytes per integer at minimum, digits plus separators), this occupies at least 20 KB per (column + index) in the backupmeta file. In a large cluster with thousands of tables this makes the file too big to reliably transmit through cloud storage, and also risks an OOM error when it is loaded into memory.

Therefore we need to revisit our encoding scheme for stats.

¹: pun intended

kennytm commented 3 years ago
  1. External stats should be saved as gzip-compressed JSON files containing

    {
        "version": "1",
        "stats": [ /* ... []*handle.JSONTable ... */ ]
    }
  2. The Stats field in backup.Schema is changed to a oneof containing the original member as inlined JSON stats, plus a new member holding the file name of the external stats.

    message Schema {
        ...
        oneof stats {
            bytes inlined = 7;
            string external = 8;
        }
    }

    When restoring, if external is filled, we read that external .json.gz file, locate the []*handle.JSONTable entry with the matching database_name & table_name, and then call LoadStatsFromJSON (see the sketch after this list). Otherwise, if inlined is filled, we deserialize it into *handle.JSONTable directly. Otherwise, we follow #679.

    This scheme is both backwards- and forwards-compatible:

    • 4.0.8 backup → new restore: both fields are missing, so we follow #679 and do ANALYZE TABLE.
    • 4.0.9 backup → new restore: inlined is filled, so we load the inlined JSON stats.
    • new backup → new restore: external is filled, so we load from the external JSON file.
    • new backup → 4.0.9 restore: inlined (original stats in 4.0.9) is missing, so stats will not be restored.
    • new backup → 4.0.8 restore: the extra fields are just ignored.
  3. How the .json.gz files are populated is not yet designed. We could just have one file per table, though this would create thousands of tiny files. So we could collect "enough" tables into a single .json.gz file (which is why stats is an array). But then restoring may need to keep reopening the same file, increasing the cost of restore. So we need to do caching while keeping RAM usage low, and bam, we hit one of the two Hard Things™ in Computer Science. Anyway, perhaps we should just start with thousands of tiny files and optimize later.
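
A minimal sketch of the restore-side lookup from item 2, assuming the external file layout from item 1. The `statsFile`/`jsonTable` structs below are simplified stand-ins for the real `[]*handle.JSONTable` payload, `loadExternalStats` is a hypothetical helper, and the file name is made up; the actual BR code would hand the matching entry to `LoadStatsFromJSON`.

    package main

    import (
        "compress/gzip"
        "encoding/json"
        "fmt"
        "os"
    )

    // jsonTable is a simplified stand-in for handle.JSONTable; only the
    // fields needed for the lookup are declared, everything else in the
    // payload is ignored by the decoder.
    type jsonTable struct {
        DatabaseName string `json:"database_name"`
        TableName    string `json:"table_name"`
    }

    // statsFile mirrors the proposed external layout:
    // {"version": "1", "stats": [ ... ]}.
    type statsFile struct {
        Version string       `json:"version"`
        Stats   []*jsonTable `json:"stats"`
    }

    // loadExternalStats opens one .json.gz file and returns the entry that
    // matches the given database and table name, or nil if none does.
    func loadExternalStats(path, db, table string) (*jsonTable, error) {
        f, err := os.Open(path)
        if err != nil {
            return nil, err
        }
        defer f.Close()

        gz, err := gzip.NewReader(f)
        if err != nil {
            return nil, err
        }
        defer gz.Close()

        var sf statsFile
        if err := json.NewDecoder(gz).Decode(&sf); err != nil {
            return nil, err
        }
        for _, t := range sf.Stats {
            if t.DatabaseName == db && t.TableName == table {
                return t, nil
            }
        }
        return nil, nil
    }

    func main() {
        // Hypothetical file name; the naming scheme is part of item 3 and
        // not yet decided.
        tbl, err := loadExternalStats("stats-00001.json.gz", "test", "t1")
        if err != nil {
            panic(err)
        }
        fmt.Println(tbl != nil)
    }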

overvenus commented 3 years ago
> How the .json.gz files are populated is not yet designed. We could just have one file per table, though this would create thousands of tiny files. So we could collect "enough" tables into a single .json.gz file (which is why stats is an array). But then restoring may need to keep reopening the same file, increasing the cost of restore. So we need to do caching while keeping RAM usage low, and bam, we hit one of the two Hard Things™ in Computer Science. Anyway, perhaps we should just start with thousands of tiny files and optimize later.

Tiny files may slow down backup; I prefer writing all stats to one file. The JSON format is optional; what we need is a file format that supports append (for writing) and seek (for reading).
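
To make that concrete, here is one hypothetical shape for such a format (not part of any agreed design): append one individually gzip-compressed stats blob per table, record its (offset, length) in a small index, and let restore seek straight to the table it needs. The helper names are made up for this sketch.

    package main

    import (
        "bytes"
        "compress/gzip"
        "fmt"
        "io"
        "os"
    )

    // appendBlob gzips one table's stats JSON, appends it to the data file,
    // and returns the (offset, length) pair that an index would record so
    // restore can later seek directly to this table.
    func appendBlob(f *os.File, statsJSON []byte) (offset, length int64, err error) {
        offset, err = f.Seek(0, io.SeekEnd)
        if err != nil {
            return 0, 0, err
        }
        var buf bytes.Buffer
        gz := gzip.NewWriter(&buf)
        if _, err := gz.Write(statsJSON); err != nil {
            return 0, 0, err
        }
        if err := gz.Close(); err != nil {
            return 0, 0, err
        }
        n, err := f.Write(buf.Bytes())
        return offset, int64(n), err
    }

    // readBlob seeks to a recorded (offset, length) pair and decompresses
    // that single table's stats JSON without reading the rest of the file.
    func readBlob(f *os.File, offset, length int64) ([]byte, error) {
        gz, err := gzip.NewReader(io.NewSectionReader(f, offset, length))
        if err != nil {
            return nil, err
        }
        defer gz.Close()
        return io.ReadAll(gz)
    }

    func main() {
        f, err := os.OpenFile("stats.bin", os.O_RDWR|os.O_CREATE, 0o644)
        if err != nil {
            panic(err)
        }
        defer f.Close()

        off, n, err := appendBlob(f, []byte(`{"database_name":"test","table_name":"t1"}`))
        if err != nil {
            panic(err)
        }
        blob, err := readBlob(f, off, n)
        if err != nil {
            panic(err)
        }
        fmt.Println(string(blob))
    }

The table → (offset, length) index itself could live at the end of the file or in a tiny side file; either way the writer's append and the reader's seek both stay O(1) per table.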

kennytm commented 3 years ago

@overvenus in that case, shall we save the JSONs into a ZIP archive (synchronized between cloud storage and a temp dir)?
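
If a plain ZIP archive were used, its central directory already gives per-entry random access. A rough sketch of reading one table's stats out of such an archive with the standard `archive/zip` package; the per-table entry naming scheme is invented here.

    package main

    import (
        "archive/zip"
        "fmt"
        "io"
    )

    // readTableStats pulls a single per-table JSON entry out of a ZIP
    // archive. The central directory lets us open one entry without
    // decompressing the whole file, which covers the append-for-writing /
    // seek-for-reading requirement above.
    func readTableStats(archivePath, db, table string) ([]byte, error) {
        r, err := zip.OpenReader(archivePath)
        if err != nil {
            return nil, err
        }
        defer r.Close()

        want := fmt.Sprintf("%s.%s.json", db, table) // hypothetical entry name
        for _, f := range r.File {
            if f.Name != want {
                continue
            }
            rc, err := f.Open()
            if err != nil {
                return nil, err
            }
            defer rc.Close()
            return io.ReadAll(rc)
        }
        return nil, fmt.Errorf("stats entry %s not found in %s", want, archivePath)
    }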

IANTHEREAL commented 3 years ago

@kennytm, can you help sort out the organizational design of the backup data, including what format is used to store the different kinds of backup data, how the files are named, and how they are divided?

Future adjustments to the backup data format can then be based on this design document, and everyone can consider the design more carefully.

kennytm commented 3 years ago

It looks like just starting the stats worker will use 1 GB+ of memory (#693), so there's a high chance we need to read directly from the mysql.stats_* tables 😞.
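
For a sense of what "read directly" would mean, here is a rough sketch that dumps mysql.stats_meta over a normal SQL connection. The DSN is a placeholder, and the exact set and schemas of the mysql.stats_* tables vary between versions, so treat this as an illustration only.

    package main

    import (
        "database/sql"
        "fmt"

        _ "github.com/go-sql-driver/mysql"
    )

    func main() {
        // Placeholder DSN pointing at a local TiDB instance.
        db, err := sql.Open("mysql", "root@tcp(127.0.0.1:4000)/mysql")
        if err != nil {
            panic(err)
        }
        defer db.Close()

        // Dump the per-table row counts kept in mysql.stats_meta; the other
        // mysql.stats_* tables (histograms, buckets, ...) would be read the
        // same way, keyed by table_id.
        rows, err := db.Query("SELECT table_id, count, modify_count FROM mysql.stats_meta")
        if err != nil {
            panic(err)
        }
        defer rows.Close()

        for rows.Next() {
            var tableID, count, modifyCount int64
            if err := rows.Scan(&tableID, &count, &modifyCount); err != nil {
                panic(err)
            }
            fmt.Println(tableID, count, modifyCount)
        }
    }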

IANTHEREAL commented 3 years ago

I have always wanted to ask: why design a dedicated backup mechanism for stats instead of backing them up the same way as other table data?

kennytm commented 3 years ago

@IANTHEREAL there are two concerns involving system tables:

  1. There is no "compatibility guarantee" for the schemas of the mysql.* tables, so we can't guarantee that, say, a 4.0.0 system table can be correctly restored into 4.0.11. (This could be fixed by running the upgradeToVerXXXX functions from bootstrap against the tables before restoring.)
  2. For stats specifically, we need to perform a "rewrite" of the table_ids, since restored tables are assigned new IDs (see the toy illustration below).
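
To make the second point concrete: restored tables get fresh table IDs, so any stats row keyed by the old table_id has to be remapped before it is written back. The types and helper below are made up for this illustration.

    package main

    import "fmt"

    // statsRow stands in for any row of the mysql.stats_* tables that is
    // keyed by table_id (stats_meta, stats_histograms, stats_buckets, ...).
    type statsRow struct {
        TableID int64
        // remaining columns omitted
    }

    // rewriteTableIDs remaps rows from the table IDs recorded at backup
    // time to the IDs assigned to the newly created tables at restore
    // time. Rows whose table is not in the mapping are dropped.
    func rewriteTableIDs(rows []statsRow, idMap map[int64]int64) []statsRow {
        out := rows[:0]
        for _, r := range rows {
            newID, ok := idMap[r.TableID]
            if !ok {
                continue
            }
            r.TableID = newID
            out = append(out, r)
        }
        return out
    }

    func main() {
        rows := []statsRow{{TableID: 41}, {TableID: 45}}
        idMap := map[int64]int64{41: 103, 45: 107} // old backup ID -> new restored ID
        fmt.Println(rewriteTableIDs(rows, idMap))
    }
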
IANTHEREAL commented 3 years ago

@kennytm Although I don't know the specific details, I understand the difficulties. I see that you are already designing a plan to back up system tables and statistics. I think it is a very good plan; we can try it.

kennytm commented 3 years ago

Closing in favor of https://github.com/pingcap/br/issues/679#issuecomment-762592254.