openshift / origin

Conformance test suite for OpenShift
http://www.openshift.org
Apache License 2.0

Bug 1872726: [3.11] Upstream: 89160: Remove potentially unhealthy symlink only for dead containers #25447

Closed haircommander closed 3 years ago

haircommander commented 3 years ago

Cherry-pick of https://github.com/openshift/origin/pull/24926 addressing: https://bugzilla.redhat.com/show_bug.cgi?id=1823406#c15

As the discussion in #52172 showed, there is a race condition between container log rotation and the kubelet GC that may result in the loss of the symlink.

Here is how container log rotation works (see containerLogManager#rotateLatestLog):

1. rename current log to rotated log file whose filename contains current timestamp (fmt.Sprintf("%s.%s", log, timestamp))
2. reopen the container log
3. if #2 fails, rename rotated log file back to container log

There is a small but nondeterministic window during which the log file doesn't exist (between steps #1 and #2, or between #1 and #3). Hence the symlink may be deemed unhealthy during that window.

This PR uses runtimeService.ContainerStatus() to check whether the container corresponding to a potentially unhealthy symlink is still alive. The symlink is removed only for dead containers.

openshift-ci-robot commented 3 years ago

@haircommander: No Bugzilla bug is referenced in the title of this pull request. To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to [this](https://github.com/openshift/origin/pull/25447):

>[3.11] Upstream: 89160: Remove potentially unhealthy symlink only for dead containers

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

openshift-ci-robot commented 3 years ago

@haircommander: This pull request references Bugzilla bug 1872726, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug:

* bug is open, matching expected state (open)
* bug target release (3.11.z) matches configured target release for branch (3.11.z)
* bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to [this](https://github.com/openshift/origin/pull/25447):

>Bug 1872726: [3.11] Upstream: 89160: Remove potentially unhealthy symlink only for dead containers

haircommander commented 3 years ago

/retest

sjenning commented 3 years ago

/approve
/lgtm
/bugzilla refresh

openshift-ci-robot commented 3 years ago

@sjenning: This pull request references Bugzilla bug 1872726, which is valid.

3 validation(s) were run on this bug:

* bug is open, matching expected state (open)
* bug target release (3.11.z) matches configured target release for branch (3.11.z)
* bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to [this](https://github.com/openshift/origin/pull/25447#issuecomment-701479451):

>/approve
>/lgtm
>/bugzilla refresh

openshift-ci-robot commented 3 years ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, sjenning

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/openshift/origin/blob/release-3.11/OWNERS)~~ [sjenning]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

openshift-bot commented 3 years ago

/retest

Please review the full test history for this PR and help us cut down flakes.
