pingcap / tidb

TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
https://pingcap.com
Apache License 2.0

br: reduce memory footprint and HWM (high water mark) #51573

Open BornChanger opened 8 months ago

BornChanger commented 8 months ago

Enhancement

BR can consume a large amount of memory, especially in SaaS scenarios, due to the huge number of databases/tables, temporary caches, etc. In fact, most of this memory is used only once, so it can be allocated in smaller batches and released sooner. This issue tracks improvements in this area.

Common

The GC memory limit tuner would adjust the Go GC memory limit to a value derived from the TiDB server's memory environment rather than BR's. In addition, backup/restore allocates a lot of temporary memory, so GC needs to be triggered frequently. Therefore, PR#51082 disables the GC memory limit tuner in the BR binary.

Make stats export/import under DXF.

Catch possible goroutine leak

Automatically adjust GOMEMLIMIT for br clp
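As a rough illustration of the GOMEMLIMIT item above, here is a minimal sketch of setting the Go runtime's soft memory limit at startup. It is not BR's implementation: `memLimitFor`, `applyMemLimit`, the 0.8 ratio, and the 16 GiB total are hypothetical names and values chosen for the example. Only `debug.SetMemoryLimit` and the `GOMEMLIMIT` environment variable come from the Go runtime.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
)

// memLimitFor returns a soft memory limit as a fraction of totalBytes.
// Ratios outside (0, 1] fall back to 0.8, an assumed default for this sketch.
func memLimitFor(totalBytes uint64, ratio float64) int64 {
	if ratio <= 0 || ratio > 1 {
		ratio = 0.8
	}
	return int64(float64(totalBytes) * ratio)
}

// applyMemLimit sets the runtime soft memory limit unless the user has
// already set GOMEMLIMIT explicitly. It reports whether the limit was changed.
func applyMemLimit(totalBytes uint64, ratio float64) bool {
	if _, ok := os.LookupEnv("GOMEMLIMIT"); ok {
		return false // respect the explicit user setting
	}
	debug.SetMemoryLimit(memLimitFor(totalBytes, ratio))
	return true
}

func main() {
	const total = 16 << 30 // assumed: 16 GiB available to the process
	changed := applyMemLimit(total, 0.8)
	// A negative argument queries the current limit without changing it.
	fmt.Println(changed, debug.SetMemoryLimit(-1))
}
```

In a real tool the total would come from the cgroup or host memory size rather than a constant; the Go standard library does not expose that portably, which is why the sketch takes it as a parameter.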

Backup

Before v7.1.0, when the upstream cluster had a large number of wide tables, it was possible for BR to consume a lot of memory during the backup process. During a backup process, BR would keep three copies of the table information in memory:

  1. The InfoSchema maintained by the background domain.
  2. The information of the databases and tables being prepared for backup.
  3. The serialized schema information before it is uploaded to external storage.

PR#43003 removes the second copy listed above (the table/database information prepared for backup). Instead, it traverses the tables as it backs them up, so the memory holding a table's information is released promptly once that table has been backed up.

PR#47114 removes the third copy listed above (the serialized schema information). It saves the schema information into multiple files, each at most 128 MB in size.
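A size-capped writer of this kind might look like the sketch below. It is an assumption-laden illustration, not the PR's code: `chunkWriter` and its `flush` hook are hypothetical, and the demo uses an 8-byte limit in place of 128 MB so the behavior is easy to follow.

```go
package main

import (
	"bytes"
	"fmt"
)

// chunkWriter buffers serialized records and emits a chunk whenever adding
// the next record would push the buffer past limit bytes. A single record
// larger than limit is still emitted, as its own oversized chunk.
type chunkWriter struct {
	limit int
	buf   bytes.Buffer
	seq   int
	flush func(seq int, data []byte) error // hypothetical upload hook
}

// Write appends one serialized record, flushing first if it would not fit.
func (w *chunkWriter) Write(record []byte) error {
	if w.buf.Len() > 0 && w.buf.Len()+len(record) > w.limit {
		if err := w.Flush(); err != nil {
			return err
		}
	}
	_, err := w.buf.Write(record)
	return err
}

// Flush emits any buffered data as the next chunk and resets the buffer.
func (w *chunkWriter) Flush() error {
	if w.buf.Len() == 0 {
		return nil
	}
	// Copy the bytes because the buffer is reused after Reset.
	data := append([]byte(nil), w.buf.Bytes()...)
	err := w.flush(w.seq, data)
	w.seq++
	w.buf.Reset()
	return err
}

func main() {
	w := &chunkWriter{limit: 8, flush: func(seq int, data []byte) error {
		fmt.Printf("chunk %d: %d bytes\n", seq, len(data))
		return nil
	}}
	for _, rec := range []string{"aaaa", "bbbb", "cccc"} {
		if err := w.Write([]byte(rec)); err != nil {
			panic(err)
		}
	}
	if err := w.Flush(); err != nil { // emit the final partial chunk
		panic(err)
	}
}
```

With this shape, peak memory for serialized schemas is bounded by roughly one chunk plus one record, instead of growing with the number of tables.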

For the first copy listed above, we plan to run BRIE via SQL on TiDB in the future, so that the BR task shares the domain with TiDB.

Restore

A table may have very large statistics, for example when it has many partitions. BR uses a lot of memory when backing up or restoring such a table.

PR#49973 supports dumping/loading statistics at partition granularity. PR#49628 lets BR persist/restore the statistics data at partition granularity. PR#57192 prevents preallocating too many items, which used too much memory.
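The preallocation point can be illustrated with a capped capacity hint. This is a generic sketch rather than the change made in the PR; `maxPrealloc` and `newItems` are hypothetical names, and 1024 is an arbitrary cap.

```go
package main

import "fmt"

// maxPrealloc is an assumed upper bound on how much capacity to reserve
// up front, regardless of how large the caller's size hint is.
const maxPrealloc = 1024

// newItems returns an empty slice whose capacity is the hint clamped to
// [0, maxPrealloc]. The slice still grows on demand via append.
func newItems(hint int) []string {
	if hint < 0 {
		hint = 0
	}
	if hint > maxPrealloc {
		hint = maxPrealloc
	}
	return make([]string, 0, hint)
}

func main() {
	items := newItems(10_000_000)       // a huge hint does not reserve 10M slots
	fmt.Println(len(items), cap(items)) // prints "0 1024"
	for i := 0; i < 2000; i++ {
		items = append(items, "x") // grows past the cap only when needed
	}
	fmt.Println(len(items))
}
```

The trade-off is a few extra reallocations when the hint turns out to be accurate, in exchange for not reserving memory that may never be used.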

Log Task

There is no need to start a domain for br log operations other than log restore. PR#52127 stops starting the domain for them and has br create the etcd client itself.

BR in SQL

Put BR in SQL under the memory quota control framework

BornChanger commented 8 months ago

/component br