Open frouioui opened 1 day ago
Hello reviewers! :wave: Please follow this checklist when reviewing this Pull Request.
release notes (needs details)
label if users need to know about this change.-
), and have a clear help text.Jobs
should be named in order to mark it as required
.required
, the maintainer team must be notified._vt
tables and RPCs need to be backward compatible.vtctl
command output order should be stable and awk
-able.Attention: Patch coverage is 60.00000%
with 12 lines
in your changes missing coverage. Please review.
Project coverage is 69.45%. Comparing base (
2e2b223
) to head (9479e8b
). Report is 3 commits behind head on main.
Files with missing lines | Patch % | Lines |
---|---|---|
go/vt/mysqlctl/builtinbackupengine.go | 60.00% | 12 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Description
This PR fixes https://github.com/vitessio/vitess/issues/16855. Since each file is restored concurrently, we need to cancel all the restore at once as soon as a goroutine fails. This way we prevent the restore process to take forever and stall.
I do not have a programatic E2E test for this PR as it was way to flaky even locally. However, I have described a step-by-step reproduction process on #16855.
This is a bug all the way from 18 to main, backporting to all branches.
Logs
Before
On the first line, we detect the error. But we continue restoring all the ongoing file after that, without canceling them.
After
Here we can see that current/ongoing restore are getting canceled. Besides file 72, which I think is due to concurrency, we log the error (first line) from the stderr of
ztsd
but there is enough time until we cancel all the context for72
to begin and finish. But once we detect the error at the top level loop inrestoreFiles
, all executions are canceled.