pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at: https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0

RFC + Support of BACKUP and RESTORE statements #15274

Closed kennytm closed 4 years ago

kennytm commented 4 years ago

What problem does this PR solve?

Support running BR inside TiDB directly.

What is changed and how it works?

Recognizes the new *ast.BRIEStmt from pingcap/parser#746 and forwards it to the library functions in BR. When we execute

BACKUP DATABASE `tpcc` TO 'local:///tmp/storage/';

TiDB spawns a new BR manager, which backs up the database tpcc to the provided storage. The query blocks until the backup completes and returns an empty set on success:

MySQL [tpcc]> backup database tpcc to 'local:///tmp/br_tpcc_32';
Empty set (58.453 sec)

and returns an error on failure:

MySQL [tpcc]> backup table tpcc.stock to 'local:///tmp/br_tpcc_30';
ERROR 8124 (HY000): Backup failed: backup meta exists, may be some backup files in the path already

BRIE tasks must be executed sequentially. Currently, for simplicity, tasks are queued in the local server only. In the future, we plan to make the entire cluster share the same queue.

Use SHOW BACKUP / SHOW RESTORE in another session to list the tasks:

MySQL [(none)]> show backup;
+-------------------------+---------+-------------------+---------------------+---------------------+------+
| Storage                 | State   | Progress          | Init_time           | Step_start_time     | ID   |
+-------------------------+---------+-------------------+---------------------+---------------------+------+
| local:///tmp/br_tpcc_30 | Backup  | 98.38709677419355 | 2020-04-12 23:09:03 | 2020-04-12 23:09:25 |    3 |
| local:///tmp/br_tpcc_30 | Wait    |                 0 | 2020-04-12 23:09:48 | 2020-04-12 23:09:48 |    4 |
+-------------------------+---------+-------------------+---------------------+---------------------+------+

Use KILL TIDB QUERY n to cancel a task.

Note: Currently, a running RESTORE may leave the tables in a "non-ACID" state in which the backup archives are only partially ingested. Maybe we need to pessimistically lock the entire database?

Note: There are no test cases yet. What should we do?

Check List

Tests

Code changes

Side effects

Related changes

Release note

kennytm commented 4 years ago

We have a dependency problem that prevents the plugins from running, which blocks the required "idc-jenkins-ci-tidb/build" CI check.

  1. BR imports zap 1.14.0, which means TiDB's zap version is bumped to 1.14.0 as well.
  2. But the plugins still use zap 1.9.1.
  3. And thus we get the "plugin was built with a different version of package go.uber.org/multierr" error (multierr is a dependency of zap).

On the other hand, we can't upgrade the plugins' dependency to 1.14.0 before this PR is merged, because that would cause the same version-mismatch error in the other direction and block other PRs.

zyxbest commented 4 years ago

/build

kennytm commented 4 years ago

For the record, here is how to run a test using TiUP @3pointer:

  1. Build TiDB

    make
  2. Install TiUP

    curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh 
  3. Install playground and bench from TiUP

    tiup install playground bench
  4. Start a playground using our own TiDB

    tiup playground nightly --db.binpath ./bin/tidb-server
  5. In another shell, populate TiDB with some TPC-C data

    tiup bench tpcc prepare -D tpcc --warehouses 30
  6. Now we can perform backup and restore...

    $ mysql -u root -h 127.0.0.1 -P 4000
    mysql> backup database tpcc to 'local:///tmp/br_test_30';
    mysql> drop database tpcc;
    mysql> restore database tpcc from 'local:///tmp/br_test_30';
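
The statements in step 6 can also be issued from a Go program, for example through `database/sql` with a MySQL driver such as go-sql-driver/mysql (not shown). A small, hypothetical helper for building them with MySQL quoting is sketched below; the backslash escaping assumes the `NO_BACKSLASH_ESCAPES` SQL mode is off.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent wraps a MySQL identifier in backticks, doubling embedded backticks.
func quoteIdent(name string) string {
	return "`" + strings.ReplaceAll(name, "`", "``") + "`"
}

// quoteString renders s as a single-quoted MySQL string literal, escaping
// backslashes and single quotes (assumes NO_BACKSLASH_ESCAPES is off).
func quoteString(s string) string {
	s = strings.ReplaceAll(s, `\`, `\\`)
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

// backupDatabaseSQL and restoreDatabaseSQL are hypothetical helpers that
// build the statements used in step 6.
func backupDatabaseSQL(db, storage string) string {
	return fmt.Sprintf("BACKUP DATABASE %s TO %s", quoteIdent(db), quoteString(storage))
}

func restoreDatabaseSQL(db, storage string) string {
	return fmt.Sprintf("RESTORE DATABASE %s FROM %s", quoteIdent(db), quoteString(storage))
}

func main() {
	fmt.Println(backupDatabaseSQL("tpcc", "local:///tmp/br_test_30"))  // BACKUP DATABASE `tpcc` TO 'local:///tmp/br_test_30'
	fmt.Println(restoreDatabaseSQL("tpcc", "local:///tmp/br_test_30")) // RESTORE DATABASE `tpcc` FROM 'local:///tmp/br_test_30'
}
```

Because the statement blocks until the task finishes, a client sending it should not apply a short query timeout.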

bb7133 commented 4 years ago

Good job, Kenny, but can you propose an RFC in docs/design first?

sre-bot commented 4 years ago

@SunRunAway, @wshwsh12, @francis0407, @lzmhhh123, PTAL.

3pointer commented 4 years ago

@bb7133 @lonng PTAL

sre-bot commented 4 years ago

@SunRunAway, @wshwsh12, @francis0407, @lzmhhh123, PTAL.

zz-jason commented 4 years ago

> The query blocks until backup completes. Returns an empty set on success.

We have two choices:

  1. Return a "result" to the user immediately, informing them that the task is being executed asynchronously or has been queued. For example, the two possible results for this choice are:
    The backup task was submitted successfully and is executing asynchronously.
    Execute `SHOW BACKUP` to see the progress.

    The backup task was submitted successfully and is waiting for 3 tasks.
    Execute `SHOW BACKUP` to see the progress.
  2. Block until the task completes. The returned result can carry more information than an empty set, for example:

zz-jason commented 4 years ago

As for the result of SHOW BACKUP, maybe we can rename some fields to make them more understandable:

It would be nice if we could show the backup speed and file size in the result of SHOW BACKUP.

kennytm commented 4 years ago

@zz-jason This was once implemented using asynchronous execution, and we found it very confusing, so the second option sounds better.

kennytm commented 4 years ago

(Pre-RFC for discussion: https://docs.google.com/document/d/1hvwsGtZ0NG16Y1LnaU5i5ccGFXFgP58_w81g1oomDQs/edit)

sre-bot commented 4 years ago

@SunRunAway, @wshwsh12, @francis0407, @lzmhhh123, PTAL.

kennytm commented 4 years ago

Rendered RFC: https://github.com/pingcap/tidb/blob/2e8d7806f986f9aca3d237a8f18df2bf0eed3ea6/docs/design/2020-04-20-brie.md

Updated parser: pingcap/parser#824

sre-bot commented 4 years ago

@SunRunAway, @wshwsh12, @francis0407, @lzmhhh123, PTAL.

kennytm commented 4 years ago

(Blocked by pingcap/br#249 and pingcap/parser#824.)

kennytm commented 4 years ago

PTAL @zz-jason @bb7133 thanks.

(The build failure is still https://github.com/pingcap/tidb/pull/15274#issuecomment-600323624)

codecov[bot] commented 4 years ago

Codecov Report

Merging #15274 into master will increase coverage by 0.0233%. The diff coverage is n/a.

@@               Coverage Diff                @@
##             master     #15274        +/-   ##
================================================
+ Coverage   80.3013%   80.3247%   +0.0233%     
================================================
  Files           508        509         +1     
  Lines        138786     140069      +1283     
================================================
+ Hits         111447     112510      +1063     
- Misses        18425      18614       +189     
- Partials       8914       8945        +31     
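
The percentages in this report can be checked by hand. Codecov defines coverage as hits / (hits + misses + partials), and in each column the three counts add up to the Lines row (111447 + 18425 + 8914 = 138786 and 112510 + 18614 + 8945 = 140069). The short Go program below reproduces both headline figures from the table.

```go
package main

import "fmt"

// coverage follows Codecov's definition: hits / (hits + misses + partials),
// expressed as a percentage.
func coverage(hits, misses, partials int) float64 {
	lines := hits + misses + partials
	if lines == 0 {
		return 0
	}
	return float64(hits) / float64(lines) * 100
}

func main() {
	before := coverage(111447, 18425, 8914) // master
	after := coverage(112510, 18614, 8945)  // #15274
	fmt.Printf("master %.4f%%, PR %.4f%%, delta %.5f\n", before, after, after-before)
}
```

The unrounded difference is about 0.02337 percentage points, which the report shows as +0.0233%.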

XuHuaiyu commented 4 years ago

/rebuild

XuHuaiyu commented 4 years ago

/merge

sre-bot commented 4 years ago

/run-all-tests

XuHuaiyu commented 4 years ago

/rebuild

sre-bot commented 4 years ago

@kennytm merge failed.

XuHuaiyu commented 4 years ago

/run-all-tests

kennytm commented 4 years ago

/rebuild plugin=pr/33

kennytm commented 4 years ago

https://internal.pingcap.net/idc-jenkins/job/tidb_ghpr_integration_ddl_test/12589/display/redirect:

Network issues when downloading Go modules:

```
[2020-04-30T08:12:17.692Z] github.com/pingcap/tidb/executor imports
[2020-04-30T08:12:17.692Z] github.com/pingcap/br/pkg/storage imports
[2020-04-30T08:12:17.692Z] github.com/aws/aws-sdk-go/aws: github.com/aws/aws-sdk-go@v1.26.1: unexpected EOF
[2020-04-30T08:12:17.692Z] github.com/pingcap/tidb/executor imports
[2020-04-30T08:12:17.692Z] github.com/pingcap/br/pkg/storage imports
[2020-04-30T08:12:17.692Z] github.com/aws/aws-sdk-go/aws/awserr: github.com/aws/aws-sdk-go@v1.26.1: unexpected EOF
[2020-04-30T08:12:17.692Z] github.com/pingcap/tidb/executor imports
[2020-04-30T08:12:17.692Z] github.com/pingcap/br/pkg/storage imports
[2020-04-30T08:12:17.692Z] github.com/aws/aws-sdk-go/aws/credentials: github.com/aws/aws-sdk-go@v1.26.1: unexpected EOF
[2020-04-30T08:12:17.692Z] github.com/pingcap/tidb/executor imports
[2020-04-30T08:12:17.692Z] github.com/pingcap/br/pkg/storage imports
[2020-04-30T08:12:17.692Z] github.com/aws/aws-sdk-go/aws/request: github.com/aws/aws-sdk-go@v1.26.1: unexpected EOF
[2020-04-30T08:12:17.692Z] github.com/pingcap/tidb/executor imports
[2020-04-30T08:12:17.692Z] github.com/pingcap/br/pkg/storage imports
[2020-04-30T08:12:17.692Z] github.com/aws/aws-sdk-go/aws/session: github.com/aws/aws-sdk-go@v1.26.1: unexpected EOF
[2020-04-30T08:12:17.693Z] github.com/pingcap/tidb/executor imports
[2020-04-30T08:12:17.693Z] github.com/pingcap/br/pkg/storage imports
[2020-04-30T08:12:17.693Z] github.com/aws/aws-sdk-go/service/s3: github.com/aws/aws-sdk-go@v1.26.1: unexpected EOF
```

kennytm commented 4 years ago

/rebuild plugin=pr/33

lysu commented 4 years ago

/rebuild plugin=pr/34

lysu commented 4 years ago

/merge

sre-bot commented 4 years ago

/run-all-tests

sre-bot commented 4 years ago

@kennytm merge failed.

kennytm commented 4 years ago

/merge

sre-bot commented 4 years ago

/run-all-tests

sre-bot commented 4 years ago

@kennytm merge failed.

kennytm commented 4 years ago

PTAL @5kbpers: how do we fix the wasm build failure?

17:06:02  # github.com/cheggaaa/pb/v3/termutil
17:06:02  ../../../../pkg/mod/github.com/cheggaaa/pb/v3@v3.0.1/termutil/term.go:23:11: undefined: lockEcho
17:06:02  ../../../../pkg/mod/github.com/cheggaaa/pb/v3@v3.0.1/termutil/term.go:39:11: undefined: unlockEcho
17:06:02  # github.com/syndtr/goleveldb/leveldb/storage
17:06:02  ../../../../pkg/mod/github.com/syndtr/goleveldb@v1.0.1-0.20190625010220-02440ea7a285/leveldb/storage/file_storage.go:107:16: undefined: newFileLock
17:06:02  ../../../../pkg/mod/github.com/syndtr/goleveldb@v1.0.1-0.20190625010220-02440ea7a285/leveldb/storage/file_storage.go:192:3: undefined: rename
17:06:02  ../../../../pkg/mod/github.com/syndtr/goleveldb@v1.0.1-0.20190625010220-02440ea7a285/leveldb/storage/file_storage.go:267:12: undefined: rename
17:06:02  ../../../../pkg/mod/github.com/syndtr/goleveldb@v1.0.1-0.20190625010220-02440ea7a285/leveldb/storage/file_storage.go:272:12: undefined: syncDir
17:06:02  ../../../../pkg/mod/github.com/syndtr/goleveldb@v1.0.1-0.20190625010220-02440ea7a285/leveldb/storage/file_storage.go:555:9: undefined: rename
17:06:02  ../../../../pkg/mod/github.com/syndtr/goleveldb@v1.0.1-0.20190625010220-02440ea7a285/leveldb/storage/file_storage.go:591:13: undefined: syncDir
17:06:02  # go.etcd.io/bbolt
17:06:02  ../../../../pkg/mod/go.etcd.io/bbolt@v1.3.3/db.go:127:13: undefined: maxMapSize
17:06:02  # go.etcd.io/etcd/pkg/fileutil
17:06:02  ../../../../pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20191023171146-3cf2f69b5738/pkg/fileutil/lock_flock.go:29:11: undefined: syscall.Flock
17:06:02  ../../../../pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20191023171146-3cf2f69b5738/pkg/fileutil/lock_flock.go:29:38: undefined: syscall.LOCK_EX
17:06:02  ../../../../pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20191023171146-3cf2f69b5738/pkg/fileutil/lock_flock.go:29:54: undefined: syscall.LOCK_NB
17:06:02  ../../../../pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20191023171146-3cf2f69b5738/pkg/fileutil/lock_flock.go:44:11: undefined: syscall.Flock
17:06:02  ../../../../pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20191023171146-3cf2f69b5738/pkg/fileutil/lock_flock.go:44:38: undefined: syscall.LOCK_EX
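
Every package failing in this log references platform-specific helpers (terminal echo control, file locking, `syscall.Flock`, `maxMapSize`) that are not defined for GOOS=js GOARCH=wasm. One common way to keep such dependencies out of a wasm build, sketched below as an assumption rather than the fix this PR adopted, is to confine the BR-dependent code to files guarded by a build constraint. The file layout and `brieSupported` are hypothetical.

```go
//go:build !(js && wasm)

package main

import "fmt"

// brieSupported reports whether BACKUP/RESTORE support is compiled in.
// In this hypothetical layout, BR-dependent code lives only in files that
// carry the constraint above, so js/wasm builds never import it or the
// OS-specific packages listed in the log. A sibling file constrained with
// `//go:build js && wasm` would define brieSupported to return false.
func brieSupported() bool { return true }

func main() {
	fmt.Println("BRIE compiled in:", brieSupported())
}
```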

sre-bot commented 4 years ago

/run-all-tests

kennytm commented 4 years ago

/merge

5kbpers commented 4 years ago

/merge

sre-bot commented 4 years ago

@kennytm merge failed.

sre-bot commented 4 years ago

/run-all-tests

sre-bot commented 4 years ago

@kennytm merge failed.

sre-bot commented 4 years ago

/run-all-tests

sre-bot commented 4 years ago

@kennytm merge failed.

kennytm commented 4 years ago

"The page you are looking for is temporarily unavailable. Please try again later."

Take a rest.

kennytm commented 4 years ago

/merge

sre-bot commented 4 years ago

/run-all-tests

sre-bot commented 4 years ago

@kennytm merge failed.

kennytm commented 4 years ago

/merge

sre-bot commented 4 years ago

/run-all-tests

sre-bot commented 4 years ago

@kennytm merge failed.

kennytm commented 4 years ago

https://internal.pingcap.net/idc-jenkins/blue/rest/organizations/jenkins/pipelines/tidb_ghpr_unit_test/runs/35187/nodes/69/steps/321/log/?start=0

[2020-04-30T10:44:18.199Z] ----------------------------------------------------------------------
[2020-04-30T10:44:18.199Z] FAIL: point_get_test.go:470: testPointGetSuite.TestSelectCheckVisibility
[2020-04-30T10:44:18.199Z] 
[2020-04-30T10:44:18.199Z] point_get_test.go:491:
[2020-04-30T10:44:18.199Z]     // Test point get.
[2020-04-30T10:44:18.199Z]     checkSelectResultError("select * from t where a='1'", tikv.ErrGCTooEarly)
[2020-04-30T10:44:18.199Z] point_get_test.go:487:
[2020-04-30T10:44:18.199Z]     c.Assert(err, NotNil)
[2020-04-30T10:44:18.199Z] ... value = nil
[2020-04-30T10:44:18.199Z]