Closed simonpasquier closed 5 months ago
@simonpasquier: This pull request references Jira Issue OCPBUGS-32510, which is valid.
Requesting review from QA contact: /cc @juzhao
The bug has been updated to refer to the pull request using the external bug tracker.
/assign @machine424 /assign @jan--f /assign @slashpai
/retest
/payload-job-with-prs periodic-ci-openshift-release-master-nightly-4.16-e2e-aws-ovn-single-node https://github.com/openshift/api/pull/1878
@simonpasquier: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.
/payload-job-with-prs periodic-ci-openshift-release-master-nightly-4.16-e2e-aws-ovn-single-node openshift/api#1865
@machine424: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/a33270c0-0bb6-11ef-9260-f04f1bc5bec1-0
/payload-job-with-prs periodic-ci-openshift-release-master-nightly-4.16-e2e-aws-ovn-single-node openshift/api#1865
@slashpai: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/14e95a20-0bb8-11ef-8bea-fa1b23220f91-0
@simonpasquier: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Test name | Commit | Details | Required | Rerun command |
---|---|---|---|---|
ci/prow/versions | f9670c7cbddc3595362c6e8e175a412f3aad706d | link | false | /test versions |
Full PR test history. Your PR dashboard.
/hold
/payload-job-with-prs periodic-ci-openshift-release-master-nightly-4.16-e2e-aws-ovn-single-node https://github.com/openshift/api/pull/1865
@slashpai: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.
/payload-job-with-prs periodic-ci-openshift-release-master-nightly-4.16-e2e-aws-ovn-single-node openshift/api#1865
(Even though we already got a green run here https://github.com/openshift/cluster-monitoring-operator/pull/2337#issuecomment-2096201375 and no changes have been pushed since. The test is failing in https://github.com/openshift/cluster-monitoring-operator/pull/2337#issuecomment-2096222403 because of unrelated etcd events.)
/skip
/skip
tested with PR
launch 4.16.0-0.nightly-2024-05-07-025557,openshift/cluster-monitoring-operator#2337 aws,single-node
The readinessProbe path changed from /readyz to /livez, and a startupProbe was added:
$ oc -n openshift-monitoring get pod metrics-server-5cc4cd5f75-5nshz -oyaml
...
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /livez
        port: https
        scheme: HTTPS
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: metrics-server
    ports:
    - containerPort: 10250
      name: https
      protocol: TCP
    readinessProbe:
      failureThreshold: 6
      httpGet:
        path: /livez
        port: https
        scheme: HTTPS
      initialDelaySeconds: 20
      periodSeconds: 20
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      requests:
        cpu: 1m
        memory: 40Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      runAsUser: 1000450000
    startupProbe:
      failureThreshold: 6
      httpGet:
        path: /readyz
        port: https
        scheme: HTTPS
      initialDelaySeconds: 20
      periodSeconds: 20
      successThreshold: 1
      timeoutSeconds: 1
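The same check can be scripted instead of read by eye. The following is a minimal sketch, not part of this PR: it assumes the pod has been fetched as JSON (for example with `oc get pod ... -o json`) and loaded into a dictionary. The helper name `probe_paths` and the trimmed `SAMPLE_POD` are hypothetical and only mirror the fields shown in the output above.

```python
from typing import Any, Dict, Optional

PROBES = ("livenessProbe", "readinessProbe", "startupProbe")

# Hypothetical, trimmed-down pod object mirroring the probe fields above.
SAMPLE_POD: Dict[str, Any] = {
    "spec": {
        "containers": [
            {
                "name": "metrics-server",
                "livenessProbe": {"httpGet": {"path": "/livez", "port": "https"}},
                "readinessProbe": {"httpGet": {"path": "/livez", "port": "https"}},
                "startupProbe": {"httpGet": {"path": "/readyz", "port": "https"}},
            }
        ]
    }
}


def probe_paths(pod: Dict[str, Any], container_name: str) -> Dict[str, Optional[str]]:
    """Return the httpGet path of each probe for the named container.

    A probe that is absent (or is not an httpGet probe) maps to None.
    """
    for container in pod.get("spec", {}).get("containers", []):
        if container.get("name") == container_name:
            return {
                probe: container.get(probe, {}).get("httpGet", {}).get("path")
                for probe in PROBES
            }
    raise KeyError(f"container {container_name!r} not found in pod spec")


paths = probe_paths(SAMPLE_POD, "metrics-server")
print(paths)
```

On a single-node cluster with this change, the readiness path is expected to be /livez and the startup path /readyz, matching the output above.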
/label qe-approved
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: machine424, simonpasquier
The full list of commands accepted by this bot can be found here.
The pull request process is described here
/payload-job-with-prs periodic-ci-openshift-release-master-nightly-4.16-e2e-aws-ovn-single-node openshift/api#1865
@slashpai: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5a5eb190-1117-11ef-93e5-df91624bf14d-0
/hold cancel
@simonpasquier: Jira Issue OCPBUGS-32510: Some pull requests linked via external trackers have merged:
The following pull requests linked via external trackers have not merged:
These pull requests must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.
Jira Issue OCPBUGS-32510 has not been moved to the MODIFIED state.
[ART PR BUILD NOTIFIER]
This PR has been included in build cluster-monitoring-operator-container-v4.17.0-202405132002.p0.g86b6d4b.assembly.stream.el9 for distgit cluster-monitoring-operator. All builds following this will include this PR.
This change switches the metrics-server's readiness probe to use the /livez endpoint instead of /readyz for single-node deployments.

By default, the /readyz endpoint is used to assert the component's readiness. This endpoint returns success once the metrics-server has metric samples over 2 intervals (e.g. it has scraped at least one kubelet twice).

In single-node deployments, the kubelet sometimes fails to respond in a timely fashion (especially in end-to-end tests) due to contention in cAdvisor, which delays readiness and causes test failures. To work around the issue, we use the /livez endpoint in this mode.

The long-term plan is to switch resource metrics from cAdvisor to the CRI stats API (currently an alpha feature). Once that happens, we can remove this change.
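The reasoning in this description can be illustrated with a short sketch. It is a simplified model written for this thread, not the operator's or metrics-server's actual code: the function names `readiness_probe_path` and `readyz_would_succeed` are hypothetical, and how the operator detects a single-node deployment is deliberately left out.

```python
from typing import Mapping


def readiness_probe_path(single_node: bool) -> str:
    """Pick the readiness endpoint as described in this PR.

    Assumption: the caller already knows whether the cluster is single-node.
    """
    return "/livez" if single_node else "/readyz"


def readyz_would_succeed(scrapes_per_kubelet: Mapping[str, int]) -> bool:
    """Simplified model of the /readyz condition described above.

    Readiness needs samples over 2 intervals, i.e. at least one kubelet
    must have been scraped successfully twice.
    """
    return any(count >= 2 for count in scrapes_per_kubelet.values())


# On a single node there is only one kubelet to scrape. If it answers
# slowly, /readyz stays unready, while /livez does not depend on scrapes.
slow_single_node = {"node-0": 1}  # hypothetical: only one successful scrape so far
print(readiness_probe_path(single_node=True))
print(readyz_would_succeed(slow_single_node))
```

The sketch shows why the workaround is scoped to single-node mode: with a single kubelet, one slow scrape target is enough to hold back /readyz, whereas a multi-node cluster only needs any one kubelet to have answered twice.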