pingcap / tidb-operator

TiDB operator creates and manages TiDB clusters running in Kubernetes.
https://docs.pingcap.com/tidb-in-kubernetes/
Apache License 2.0
1.2k stars, 490 forks

Fail restore if warmup fails only during `check-wal-only` strategy #5621

Closed michaelmdeng closed 2 months ago

michaelmdeng commented 2 months ago

What problem does this PR solve?

#5569 updated the volume-snapshot restore process to fail the entire restore if any warmup job fails. We use this to check the viability of a restore quickly and to terminate restore processing early if corruption is detected.

#5572 updated the volume-snapshot restore process to allow recovery from corruption in a single TiKV through manual cluster operations. We use this during a full restore in case we encounter corruption along the way.

These two features conflict with each other. If we want to perform a full restore and fall back on single-TiKV recovery in the event of corruption, we cannot fail the restore during warmup; the warmup stage must complete so that the restore can progress to restarting the TiKVs. If we only want to check the viability of a restore, we are fine with failing the restore and not progressing to any further steps. Thus, we gate this failure behavior behind the `check-wal-only` strategy only.

What is changed and how does it work?

Fail the restore on warmup failure only when the warmup strategy is `check-wal-only`.
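The gating described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the type `WarmupStrategy`, the constants, and the function `ShouldFailRestore` are hypothetical names chosen for the example, and `StrategyOther` is a placeholder for any strategy used in a full restore.

```go
package main

import "fmt"

// WarmupStrategy is a hypothetical stand-in for the restore's warmup
// strategy setting; it is not the operator's actual type.
type WarmupStrategy string

const (
	// StrategyCheckWALOnly mirrors the `check-wal-only` strategy named in this PR.
	StrategyCheckWALOnly WarmupStrategy = "check-wal-only"
	// StrategyOther is a placeholder for any other strategy (assumed here
	// to be used for a full restore).
	StrategyOther WarmupStrategy = "other"
)

// ShouldFailRestore reports whether a warmup failure should fail the whole
// restore. Only a viability check (`check-wal-only`) fails early; for every
// other strategy the restore continues to the TiKV restart step so that
// single-TiKV recovery remains possible.
func ShouldFailRestore(strategy WarmupStrategy, warmupFailed bool) bool {
	return warmupFailed && strategy == StrategyCheckWALOnly
}

func main() {
	fmt.Println(ShouldFailRestore(StrategyCheckWALOnly, true)) // prints "true"
	fmt.Println(ShouldFailRestore(StrategyOther, true))        // prints "false"
}
```

Keeping the decision in one small predicate means the warmup step itself does not need to know which recovery path follows it.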


sre-bot commented 2 months ago

CLA assistant check
All committers have signed the CLA.

ti-chi-bot[bot] commented 2 months ago

@YuJuncen: adding LGTM is restricted to approvers and reviewers in OWNERS files.

In response to [this](https://github.com/pingcap/tidb-operator/pull/5621#pullrequestreview-2019011664):

> Rest LGTM

BornChanger commented 2 months ago

/test-pull-e2e-kind-br

ti-chi-bot[bot] commented 2 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BornChanger, YuJuncen

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/pingcap/tidb-operator/blob/master/OWNERS)~~ [BornChanger]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

ti-chi-bot[bot] commented 2 months ago

New changes are detected. LGTM label has been removed.

csuzhangxc commented 2 months ago

/cherry-pick release-1.5

ti-chi-bot commented 2 months ago

@csuzhangxc: new pull request created to branch release-1.5: #5636.

In response to [this](https://github.com/pingcap/tidb-operator/pull/5621#issuecomment-2076983563):

> /cherry-pick release-1.5