pingcap / community

TiDB community content
Apache License 2.0

Incubating Program: make PITR production-ready #126

Open WangXiangUSTC opened 4 years ago

WangXiangUSTC commented 4 years ago

Introduction

github: https://github.com/lvleiice/Better-PITR

PITR is an ecosystem tool for TiDB Binlog. It preprocesses TiDB's incremental backup files, merging multiple changes to the same row into a single change, and produces a new, smaller incremental backup file. This greatly reduces incremental backup recovery time and enables fast PITR (Fast Point-in-Time Recovery).

Example

Consider a table t1 with the schema create table t1 (id int primary key, name varchar(24)). Now we execute four SQL statements in TiDB:

insert into t1 values(1, "a");
insert into t1 values(2, "b");
update t1 set name = "c" where id = 1;
delete from t1 where id = 2;

These statements generate four binlog events, and restoring them to the downstream database with the Reparo tool executes all four statements there. However, the four events can be merged into a single statement: insert into t1 values(1, "c"); The merged file is roughly a quarter of the original size and restores about four times as fast. Put simply, PITR compresses (preprocesses) the binlog files produced by Drainer.
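The merging idea can be sketched as a per-primary-key state machine that keeps only the net effect of each row's changes. This is a minimal illustration, not PITR's actual implementation (which works on the binlog files written by Drainer); the names `Op`, `Event`, `merge_events` and `to_sql` are hypothetical. It assumes a single table keyed by `id`, that every UPDATE carries the full new row values, and that the input is a valid history under the primary-key constraint.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Set


class Op(Enum):
    INSERT = "insert"
    UPDATE = "update"
    DELETE = "delete"


@dataclass(frozen=True)
class Event:
    # Hypothetical row event: not the real TiDB Binlog format.
    op: Op
    id: int                     # primary key of t1
    name: Optional[str] = None  # new value of `name`; None for DELETE


def merge_events(events: List[Event]) -> List[Event]:
    """Collapse row events into at most one net event per primary key."""
    net: Dict[int, Event] = {}
    order: List[int] = []  # keys in first-seen order, for deterministic output
    seen: Set[int] = set()
    for ev in events:
        if ev.id not in seen:
            seen.add(ev.id)
            order.append(ev.id)
        prev = net.get(ev.id)
        if prev is None:
            # No pending event for this key: keep this one as-is.
            net[ev.id] = ev
        elif prev.op is Op.INSERT and ev.op is Op.UPDATE:
            # Row is new in this batch: insert it directly with the latest values.
            net[ev.id] = Event(Op.INSERT, ev.id, ev.name)
        elif prev.op is Op.INSERT and ev.op is Op.DELETE:
            # Row was created and removed within the batch: nothing to replay.
            del net[ev.id]
        elif prev.op is Op.UPDATE and ev.op in (Op.UPDATE, Op.DELETE):
            # Only the last change to an existing row matters.
            net[ev.id] = ev
        elif prev.op is Op.DELETE and ev.op is Op.INSERT:
            # Existing row was deleted and re-created: net effect is an update.
            net[ev.id] = Event(Op.UPDATE, ev.id, ev.name)
        else:
            raise ValueError(f"invalid sequence for id={ev.id}: {prev.op.value} then {ev.op.value}")
    return [net[k] for k in order if k in net]


def to_sql(ev: Event) -> str:
    # Illustrative only: `name` is not SQL-escaped.
    if ev.op is Op.INSERT:
        return f'insert into t1 values({ev.id}, "{ev.name}");'
    if ev.op is Op.UPDATE:
        return f'update t1 set name = "{ev.name}" where id = {ev.id};'
    return f"delete from t1 where id = {ev.id};"


if __name__ == "__main__":
    # The four statements from the example above.
    events = [
        Event(Op.INSERT, 1, "a"),
        Event(Op.INSERT, 2, "b"),
        Event(Op.UPDATE, 1, "c"),
        Event(Op.DELETE, 2),
    ]
    print([to_sql(e) for e in merge_events(events)])  # ['insert into t1 values(1, "c");']
```

Keeping the first-seen key order makes the output deterministic; a real tool would also need to respect ordering across tables and DDL boundaries, which this sketch ignores.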

Current Situation

PITR began as a Hackathon project, so it implements only the basic functionality, has some known problems, and lacks tests, which means there may be further unknown problems. We need to solve the problems below to make PITR production-ready.

Bug

Performance

Test

Usability

Estimated Time

3 developers × 7 days

siddontang commented 4 years ago

LGTM

winkyao commented 4 years ago

LGTM