pingcap / dumpling

Dumpling is a fast, easy-to-use tool written by Go for dumping data from the database(MySQL, TiDB...) to local/cloud(S3, GCP...) in multifarious formats(SQL, CSV...).
Apache License 2.0
281 stars 85 forks source link

dump: add split big tidb query to several small queries #257

Closed lichunzhu closed 3 years ago

lichunzhu commented 3 years ago

What problem does this PR solve?

Although we can make sure TiDB won't OOM through the help of TiDB tablesample. But if we didn't specify --rows to open tablesample, TiDB will still OOM.

What is changed and how it works?

implement solution 2 in https://github.com/pingcap/dumpling/issues/232 Try to split the whole dump table SQL to several small queries to concatenate their results.

Check List

Tests

Side effects

Related changes

Release note

lichunzhu commented 3 years ago

Dump huge table test

Dump 1150.50GB clustered index data without -r argument dumpling:v5.0.0-rc,mydumper will OOM. dumpling:70a709a cost 7h12m34s, avg speed: 45.39 MB/s

Comparison test result

1844.61MB data from a TiDB:v5.0.0-nightly, cluster indexed table. dump the data without -r argument binary cost time speed
dumpling:70a709a 39.00s 47.29 MB/s
dumpling:v5.0.0-rc 34.49s 53.49 MB/s
mydumper 33.28s 55.42 MB/s

dumpling:70a709a has some speed regression. Mainly because it opens and closes more mysql connections than other binaries.

lance6716 commented 3 years ago

/lgtm

ti-chi-bot commented 3 years ago

[REVIEW NOTIFICATION]

This pull request has been approved by:

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment. After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by writing `/lgtm` in a comment. Reviewer can cancel approval by writing `/lgtm cancel` in a comment.
lichunzhu commented 3 years ago

/merge

ti-chi-bot commented 3 years ago

This pull request has been accepted and is ready to merge.

Commit hash: 4f22704fdcb20a6ee0bca87ea04d1b3914d24600

lichunzhu commented 3 years ago

/cherrypick release-5.0

ti-chi-bot commented 3 years ago

@lichunzhu: once the present PR merges, I will cherry-pick it on top of release-5.0 in a new PR and assign it to you.

In response to [this](https://github.com/pingcap/dumpling/pull/257#issuecomment-811873014): >/cherrypick release-5.0 > > Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
ti-chi-bot commented 3 years ago

@lichunzhu: #257 failed to apply on top of branch "release-5.0":

Applying: add split big tidb query to several small queries
Using index info to reconstruct a base tree...
M   v4/export/dump.go
M   v4/export/sql.go
Falling back to patching base and 3-way merge...
Auto-merging v4/export/sql.go
Auto-merging v4/export/dump.go
CONFLICT (content): Merge conflict in v4/export/dump.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 add split big tidb query to several small queries
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
In response to [this](https://github.com/pingcap/dumpling/pull/257#issuecomment-811873014): >/cherrypick release-5.0 > > Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
ti-chi-bot commented 3 years ago

@lichunzhu: new issue created for failed cherrypick: #263

In response to [this](https://github.com/pingcap/dumpling/pull/257#issuecomment-811873014): >/cherrypick release-5.0 > > Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.