
br/stream: The File Tree of Log Backup #30591

Open YuJuncen opened 2 years ago

YuJuncen commented 2 years ago

File Tree of Log Backup

Design

Files and Naming

There are two types of files: MetaFiles and LogFiles. MetaFiles contain the metadata of a set of LogFiles; in other words, they act as an index of the LogFiles.

{Prefix} contains the version of the backup directory structure (e.g. /v1).

The MetaFiles should be saved at the path {Prefix}/backupmeta/{MinResolvedTSOfFiles:0??}{StoreID:06}{UUIDv4:32}.meta, where the Prefix is the user-defined external storage path.

To speed up restoring a subset of tables, the LogFiles would be stored at {Prefix}/t{TableID:06}/{MinTSOfFile:0??}{StoreID:06}.log.

In particular, the change log of schema info (m-prefixed keys) is stored at {Prefix}/m/{MinTSOfFile:0??}{StoreID:06}.log.
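
A minimal sketch of how these paths could be assembled. The helper names and the 20-digit padding for the TS fields are assumptions (the design only writes {0??}), not the actual implementation:

```go
package main

import "fmt"

// makeMetaPath builds a MetaFile path as described above. The padding width
// for the resolved TS is written as {0??} in the design, so the 20-digit
// width used here is an assumption.
func makeMetaPath(prefix string, minResolvedTS uint64, storeID int64, uuid string) string {
	return fmt.Sprintf("%s/backupmeta/%020d%06d%s.meta", prefix, minResolvedTS, storeID, uuid)
}

// makeLogPath builds a LogFile path: per-table directories for data keys,
// and the fixed "m" directory for schema-change (m-prefixed) keys.
func makeLogPath(prefix string, tableID int64, isMeta bool, minTS uint64, storeID int64) string {
	if isMeta {
		return fmt.Sprintf("%s/m/%020d%06d.log", prefix, minTS, storeID)
	}
	return fmt.Sprintf("%s/t%06d/%020d%06d.log", prefix, tableID, minTS, storeID)
}

func main() {
	fmt.Println(makeMetaPath("/v1", 428430021234567000, 4, "0123456789abcdef0123456789abcdef"))
	fmt.Println(makeLogPath("/v1", 56, false, 428430021234567000, 4))
}
```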

Another way:

Make a file named SEQUENCE in each {StoreID} directory of external storage, whose content is an 8-byte big-endian number, and increase the content of the file each time we want to save a file.

Then the MetaFile name can simply be {ResolvedTSOfFiles:0??}{SequenceNumber:010}.meta (the resolved TS is only for speeding up restore: it lets us find the needed MetaFiles more easily), and the name of LogFiles can simply be {SequenceNumber:010}.log.

The allocation could be batched, so each flush only involves 2 extra reads + 2 extra writes (we could also create a single SEQUENCE file in the metadata directory, which reduces the cost to 1 read + 1 write). No lock is needed, because the sequence number is sharded by the store ID.
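
A hedged sketch of the batched allocation against a simplified, hypothetical storage interface (the real external storage abstraction differs):

```go
package backup

import (
	"context"
	"encoding/binary"
)

// ExternalStorage is a hypothetical, minimal view of the external storage,
// used only for this sketch.
type ExternalStorage interface {
	ReadFile(ctx context.Context, name string) ([]byte, error)
	WriteFile(ctx context.Context, name string, data []byte) error
}

// AllocSequence reserves `batch` sequence numbers for one store by reading
// the 8-byte big-endian counter in SEQUENCE and writing back counter+batch.
// One read + one write per batch; no lock is needed because each store owns
// its own SEQUENCE file.
func AllocSequence(ctx context.Context, s ExternalStorage, batch uint64) (uint64, error) {
	data, err := s.ReadFile(ctx, "SEQUENCE")
	if err != nil {
		// Assume a missing file means the counter starts from zero.
		data = nil
	}
	var cur uint64
	if len(data) == 8 {
		cur = binary.BigEndian.Uint64(data)
	}
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, cur+batch)
	if err := s.WriteFile(ctx, "SEQUENCE", buf); err != nil {
		return 0, err
	}
	return cur, nil // first number of the reserved range
}
```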

"Routing"

When an event (i.e. a RaftRequest) is observed by TiKV, it is routed by a chain of routers to the file where the event should be stored: each router defines a part of the final file path.

  1. "Task": route the file by its key, and find which task it belongs to. (Assuming any overlapping range of tasks are denied.) -- This defined the {Prefix} part.
  2. "Table": decode the key and check which table it belongs to. This defined the {TableID} part.
  3. "Region": checking the region of the key, this defined the {RegionID} part.

The terminus of the routing is a local temporary file (no flush to external storage is required at this point). The file is then copied to external storage during the "Flush".

The MinTSOfFile part can be kept in memory by the last router of the chain, or simply taken as the TS of the first key observed.
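
A rough sketch of the router chain; the `Router` interface and `routeEvent` helper are illustrative names, not TiKV's actual types:

```go
package backup

import "path"

// Router contributes one component of the final file path for an observed
// event (task prefix, table directory, region, ...).
type Router interface {
	Route(key []byte) (component string, err error)
}

// routeEvent walks the chain of routers and joins the components into the
// path of the local temporary file the event will be appended to.
func routeEvent(chain []Router, key []byte) (string, error) {
	parts := make([]string, 0, len(chain))
	for _, r := range chain {
		c, err := r.Route(key)
		if err != nil {
			return "", err
		}
		parts = append(parts, c)
	}
	return path.Join(parts...), nil
}
```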

"Flush"

Each store maintains a set of local files in a temporary directory; once their total size exceeds 128 MB or 5 minutes have passed, the store performs a "flush" to the external storage.

  1. It generates the MetaFile of the temporary files.
  2. It copies every LogFile from the temporary directory to the corresponding path of external storage.
  3. It copies the metadata file to the external storage.
  4. It updates the per-store NextBackupTS in the MetaStore (etcd in general), as a cache for querying the progress.

Once the metadata file is uploaded, the "flush" is finished; updating NextBackupTS in the MetaStore is optional.
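
A hedged outline of the flush following the steps above (step 1 is assumed to have produced `metaBytes` already; the `Uploader` and `MetaStore` interfaces are placeholders for the real storage and etcd clients):

```go
package backup

import "context"

// Uploader and MetaStore are hypothetical interfaces standing in for the
// real external storage and etcd clients.
type Uploader interface {
	Upload(ctx context.Context, remotePath, localPath string) error
	WriteFile(ctx context.Context, name string, data []byte) error
}

type MetaStore interface {
	UpdateNextBackupTS(ctx context.Context, storeID int64, ts uint64) error
}

type tempFile struct{ localPath, remotePath string }

// Flush copies the LogFiles first and uploads the MetaFile last: the flush
// only becomes visible once the MetaFile exists, so a crash before that point
// leaves nothing but unreferenced LogFiles behind.
func Flush(ctx context.Context, storeID int64, files []tempFile, metaPath string,
	metaBytes []byte, resolvedTS uint64, ext Uploader, ms MetaStore) error {
	for _, f := range files { // step 2: copy every LogFile to external storage
		if err := ext.Upload(ctx, f.remotePath, f.localPath); err != nil {
			return err
		}
	}
	// step 3: uploading the MetaFile completes the flush.
	if err := ext.WriteFile(ctx, metaPath, metaBytes); err != nil {
		return err
	}
	// step 4 is optional: it only caches the progress for querying.
	return ms.UpdateNextBackupTS(ctx, storeID, resolvedTS)
}
```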

File Format

Each MetaFile is generated by a 'flush' and contains the metadata of all files involved in that 'flush'.

The content of MetaFiles is encoded as a protocol buffer message, with the definition:

message Metadata {
    repeated DataFileInfo files = 1;
    int64 store_id = 2;
    uint64 resolved_ts = 3;   
}

enum FileType {
    Delete = 0;
    Put = 1;
}

message DataFileInfo {
    // SHA256 of the file.
    bytes sha_256 = 1;
    // Path of the file.
    string path = 2;
    int64 number_of_entries = 3;

    /// Below are extra information of the file, for better filtering files.
    // The min ts of the keys in the file.
    uint64 min_ts = 4;
    // The max ts of the keys in the file.
    uint64 max_ts = 5;
    // The resolved ts of the region when saving the file.
    uint64 resolved_ts = 6;
    // The region of the file.
    int64 region_id = 7;
    // The key range of the file.
    // Encoded and starts with 'z'(internal key).
    bytes start_key = 8;
    bytes end_key = 9;
    // The column family of the file.
    string cf = 10;
    // The operation type of the file.
    FileType type = 11;

    // Whether the data file contains meta keys (m-prefixed keys) only.
    bool is_meta = 12;
    // The table ID the file belongs to; ignored when `is_meta` is true.
    int64 table_id = 13;

    // Encryption may be supported in the future.
    reserved "iv";
}
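
To illustrate how the extra fields in DataFileInfo help a restore skip irrelevant files, here is a hedged sketch; the Go structs below are minimal mirrors of the protobuf messages above, not the real generated types:

```go
package restore

// Minimal mirrors of the protobuf messages above, used only for this sketch.
type DataFileInfo struct {
	Path    string
	MinTs   uint64
	MaxTs   uint64
	TableId int64
	IsMeta  bool
}

type Metadata struct {
	Files      []*DataFileInfo
	StoreId    int64
	ResolvedTs uint64
}

// filterFiles picks the data files that restoring one table up to restoreTS
// actually needs, using metadata only (no LogFile is downloaded yet).
func filterFiles(metas []*Metadata, tableID int64, restoreTS uint64) []*DataFileInfo {
	var picked []*DataFileInfo
	for _, m := range metas {
		for _, f := range m.Files {
			if f.IsMeta || f.TableId != tableID {
				continue // schema-change files and other tables are handled elsewhere
			}
			if f.MinTs > restoreTS {
				continue // every key in this file is newer than the restore point
			}
			picked = append(picked, f)
		}
	}
	return picked
}
```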

The LogFiles should be encoded as a plain stream of key-value pairs. The format would look like this:

(image: layout of the plain key-value stream in a LogFile)
YuJuncen commented 2 years ago

For now, I would prefer the sequence method for naming... (Even if we involve NextBackupTS and a unix timestamp for naming the metafile, there would still be a chance of name conflicts, e.g. when the clock drifts; maybe a monotonic clock can help?) 🤔 cc @kennytm @3pointer @joccau