tjungblu closed this pull request 8 months ago
@tjungblu: This pull request references Jira Issue OCPBUGS-30873, which is valid. The bug has been moved to the POST state.
Requesting review from QA contact: /cc @geliu2016
The bug has been updated to refer to the pull request using the external bug tracker.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dusk125, tjungblu
/cherry-pick release-4.15
@tjungblu: once the present PR merges, I will cherry-pick it on top of release-4.15 in a new PR and assign it to you.
/retest-required
Remaining retests: 0 against base HEAD 463979a2bdc3e2d31ed4d94f2624ea1a2c39fb44 and 2 for PR HEAD 5e0db1bd00202e900b17a9e0f0859ad815e20f17 in total
unrelated failures
/override ci/prow/e2e-aws-ovn-serial
/override ci/prow/e2e-operator-fips
@tjungblu: Overrode contexts on behalf of tjungblu: ci/prow/e2e-aws-ovn-serial, ci/prow/e2e-operator-fips
/override ci/prow/e2e-aws-ovn-single-node
@tjungblu: Overrode contexts on behalf of tjungblu: ci/prow/e2e-aws-ovn-single-node
@tjungblu: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-gcp-qe-no-capabilities | 5e0db1bd00202e900b17a9e0f0859ad815e20f17 | link | false | /test e2e-gcp-qe-no-capabilities |
@tjungblu: Jira Issue OCPBUGS-30873: All pull requests linked via external trackers have merged:
Jira Issue OCPBUGS-30873 has been moved to the MODIFIED state.
@tjungblu: new pull request created: #1225
Currently, the aliveness check only detects whether a controller has been continuously running into errors, whereas we wanted to detect real deadlock situations. This change relaxes the aliveness check so that only real locking situations are declared problematic.
Additionally, to avoid excessive log traffic, this change throttles stack dumping to once every 15 minutes. Previously, a dump would be triggered on almost every health probe invocation, creating multi-megabyte log spam.