openshift / origin

Conformance test suite for OpenShift
http://www.openshift.org
Apache License 2.0

NO-ISSUE: Add Capability and FeatureGate checks to OLMv1 tests #29290

Closed tmshort closed 6 days ago

openshift-ci-robot commented 1 week ago

@tmshort: This pull request explicitly references no jira issue.

In response to [this](https://github.com/openshift/origin/pull/29290):

Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Forigin). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.

tmshort commented 1 week ago

Alternative to #29283. This one doesn't update openshift/api.

tmshort commented 1 week ago

/payload-job periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-techpreview

openshift-ci[bot] commented 1 week ago

@tmshort: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/f10e52a0-a1f2-11ef-856c-c732c763feb6-0

wking commented 1 week ago

/test what-is-available

openshift-ci[bot] commented 1 week ago

@wking: The specified target(s) for /test were not found. The following commands are available to trigger required jobs:

The following commands are available to trigger optional jobs:

Use /test all to run the following jobs that were automatically triggered:

In response to [this](https://github.com/openshift/origin/pull/29290#issuecomment-2474510822):

> /test what-is-available

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

tmshort commented 1 week ago

/retest

tmshort commented 1 week ago

/retest

wking commented 1 week ago

The tech-preview payload run failed on an unrelated KubeAPIErrorBudgetBurn issue. It skipped the OLM tests:

$ curl -s https://storage.googleapis.com/test-platform-results/logs/openshift-origin-29290-ci-4.18-e2e-aws-ovn-techpreview/1856776673144344576/build-log.txt | grep OCPFeatureGate:NewOLM
started: 0/281/508 "[sig-olmv1][OCPFeatureGate:NewOLM] OLMv1 CRDs should be installed [Suite:openshift/conformance/parallel]"
skipped: (2.4s) 2024-11-13T21:04:00 "[sig-olmv1][OCPFeatureGate:NewOLM] OLMv1 CRDs should be installed [Suite:openshift/conformance/parallel]"
started: 0/291/508 "[sig-olmv1][OCPFeatureGate:NewOLM] OLMv1 operator installation should install a cluster extension [Suite:openshift/conformance/parallel]"
skipped: (3.1s) 2024-11-13T21:04:08 "[sig-olmv1][OCPFeatureGate:NewOLM] OLMv1 operator installation should install a cluster extension [Suite:openshift/conformance/parallel]"
started: 0/354/508 "[sig-olmv1][OCPFeatureGate:NewOLM] OLMv1 Catalogs should be installed [Suite:openshift/conformance/parallel]"
skipped: (2.7s) 2024-11-13T21:04:46 "[sig-olmv1][OCPFeatureGate:NewOLM] OLMv1 Catalogs should be installed [Suite:openshift/conformance/parallel]"

because it was run early enough that openshift/cluster-version-operator#1108 wasn't in 4.18 CI builds yet. Launching a fresh tech-preview job:

/test e2e-gcp-ovn-techpreview
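
The skip pattern in the grep output above can be reduced to a count per outcome. A minimal sketch, assuming the build log keeps the `status: (duration) timestamp "test name"` shape shown above and that failures use a `failed:` prefix; `summarize_olm_results` is a hypothetical name, not part of any OpenShift tooling, and the sample lines are copied from the payload run.

```shell
# Hypothetical helper: count terminal results (passed/skipped/failed) for
# tests whose name carries a given tag, reading build-log lines on stdin.
summarize_olm_results() {
  local tag="${1:-OCPFeatureGate:NewOLM}"
  # -F: treat the tag as a fixed string, not a regular expression.
  grep -F "$tag" |
    awk -F: '$1 == "passed" || $1 == "skipped" || $1 == "failed" { n[$1]++ }
             END { for (s in n) print s, n[s] }' |
    sort
}

# Two sample lines copied from the run above; "started:" lines are ignored.
summarize_olm_results <<'EOF'
started: 0/281/508 "[sig-olmv1][OCPFeatureGate:NewOLM] OLMv1 CRDs should be installed [Suite:openshift/conformance/parallel]"
skipped: (2.4s) 2024-11-13T21:04:00 "[sig-olmv1][OCPFeatureGate:NewOLM] OLMv1 CRDs should be installed [Suite:openshift/conformance/parallel]"
EOF
# prints: skipped 1
```

Piping the `curl -s` output for a build log into the function, instead of the here-document, would summarize a whole run.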

neisw commented 1 week ago

/approve
/hold

Looked like e2e-gcp-ovn-techpreview had the same skip. Feel free to remove the hold when you are ready.

wking commented 1 week ago

e2e-gcp-ovn-techpreview:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/29290/pull-ci-openshift-origin-master-e2e-gcp-ovn-techpreview/1856824118029062144/artifacts/e2e-gcp-ovn-techpreview/openshift-e2e-test/build-log.txt | grep OCPFeatureGate:NewOLM
started: 0/27/508 "[sig-olmv1][OCPFeatureGate:NewOLM] OLMv1 CRDs should be installed [Suite:openshift/conformance/parallel]"
skipped: (6.3s) 2024-11-14T00:06:50 "[sig-olmv1][OCPFeatureGate:NewOLM] OLMv1 CRDs should be installed [Suite:openshift/conformance/parallel]"
started: 0/118/508 "[sig-olmv1][OCPFeatureGate:NewOLM] OLMv1 operator installation should install a cluster extension [Suite:openshift/conformance/parallel]"
skipped: (6.3s) 2024-11-14T00:08:06 "[sig-olmv1][OCPFeatureGate:NewOLM] OLMv1 operator installation should install a cluster extension [Suite:openshift/conformance/parallel]"
started: 5/489/508 "[sig-olmv1][OCPFeatureGate:NewOLM] OLMv1 Catalogs should be installed [Suite:openshift/conformance/parallel]"
skipped: (4.2s) 2024-11-14T00:16:50 "[sig-olmv1][OCPFeatureGate:NewOLM] OLMv1 Catalogs should be installed [Suite:openshift/conformance/parallel]"
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/29290/pull-ci-openshift-origin-master-e2e-gcp-ovn-techpreview/1856824118029062144/artifacts/e2e-gcp-ovn-techpreview/openshift-e2e-test/build-log.txt | grep -B3 OCPFeatureGate:NewOLM | tail -n4
skip [github.com/openshift/origin/test/extended/olm/olmv1.go:168]: Test only runs with OLMv1 capability
Ginkgo exit error 3: exit with code 3

skipped: (4.2s) 2024-11-14T00:16:50 "[sig-olmv1][OCPFeatureGate:NewOLM] OLMv1 Catalogs should be installed [Suite:openshift/conformance/parallel]"

Hmm, that's surprising.

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/29290/pull-ci-openshift-origin-master-e2e-gcp-ovn-techpreview/1856824118029062144/artifacts/e2e-gcp-ovn-techpreview/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.capabilities.knownCapabilities[]' | grep O
OperatorLifecycleManager

So still no sign of OperatorLifecycleManagerV1.
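
A stricter form of the check above, as a minimal sketch: `has_capability` is a hypothetical helper that reads the newline-separated list produced by the `jq -r` expression and matches whole lines only, so `OperatorLifecycleManager` cannot be mistaken for `OperatorLifecycleManagerV1` the way a substring `grep O` might suggest at a glance.

```shell
# Hypothetical helper: succeed if the named capability appears in the
# newline-separated list on stdin (the shape that `jq -r '...[]'` emits).
has_capability() {
  # -x: whole-line match; -F: fixed string; -q: no output, exit status only.
  grep -qxF "$1"
}

# Sample list reduced to the single matching entry shown above.
if printf '%s\n' OperatorLifecycleManager | has_capability OperatorLifecycleManagerV1; then
  echo "OperatorLifecycleManagerV1 is known"
else
  echo "OperatorLifecycleManagerV1 is missing"
fi
# prints: OperatorLifecycleManagerV1 is missing
```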

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/29290/pull-ci-openshift-origin-master-e2e-gcp-ovn-techpreview/1856824118029062144/artifacts/e2e-gcp-ovn-techpreview/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-769cf984fc-p7xr7_cluster-version-operator.log | head -n1
I1113 23:02:05.498754       1 start.go:23] ClusterVersionOperator v1.0.0-1268-g2e594cdd-dirty

In a CVO checkout:

cluster-version-operator$ git --no-pager log --first-parent --format='%ad %h %s' -2
Wed Nov 13 18:45:27 2024 +0000 b0eddfee Merge pull request #1108 from LalatenduMohanty/add_olmv1_capability_api
Mon Nov 11 15:57:13 2024 +0000 2e594cdd Merge pull request #1091 from petr-muller/scaffold-update-status-operator

so this cluster is still not using the new CVO commit? Back to the job run:

INFO[2024-11-13T22:21:20Z] Building release initial from a snapshot of ocp/4.18 
INFO[2024-11-13T22:21:20Z] Building release latest from a snapshot of ocp/4.18
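
The version in the CVO log line appears to have the shape that `git describe` produces (tag, commit count, `g` plus abbreviated hash, optional `-dirty`), which is what ties the running binary to the 2e594cdd merge in the `git log` output. A minimal sketch of that extraction; `describe_commit` is a hypothetical helper name.

```shell
# Hypothetical helper: pull the abbreviated commit hash out of a
# `git describe`-style version (TAG-COUNT-gHASH, optionally with -dirty).
# Prints nothing when the input does not have that shape.
describe_commit() {
  printf '%s\n' "$1" |
    sed -n 's/^.*-g\([0-9a-f]\{7,40\}\)\(-dirty\)\{0,1\}$/\1/p'
}

# Version string copied from the CVO log line above.
describe_commit 'v1.0.0-1268-g2e594cdd-dirty'
# prints: 2e594cdd
```

The result matches the older of the two merges listed in the CVO checkout, not b0eddfee.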

Checking app.ci:

$ oc whoami -c
default/api-ci-l2s4-p1-openshiftapps-com:6443/wking
$ oc -n ocp get -o json imagestream 4.18 | jq -c '.status.tags[] | select(.tag == "cluster-version-operator").items[] | {created, image}'
{"created":"2024-11-13T19:20:49Z","image":"sha256:91dda68f5256eeae7c934e677e7a8b36add124faad0c4124e44063af2c008efa"}
{"created":"2024-11-11T16:23:01Z","image":"sha256:762a726e9118e1879691044d61c9cd6e388c2add8e6da920b74ba455c8975a55"}
{"created":"2024-11-11T12:47:21Z","image":"sha256:bef79b6f45e551a34e3d22877721cf8f4937b80f0b233d34fdeb259181313c48"}
$ oc image info registry.ci.openshift.org/ocp/4.18@sha256:91dda68f5256eeae7c934e677e7a8b36add124faad0c4124e44063af2c008efa | grep 'Created\|vcs-ref'
Created:       9h ago
               vcs-ref=b0eddfee2890ed868cbecc36594490e9afa6c9f1

Hmm, certainly looks like pulling ocp/4.18 at 22:21 UTC should have picked up that most-recent commit pushed 19:20 UTC? Not sure why the running CVO in this cluster is claiming to be from the older commit.
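
The expectation above is that a snapshot taken at 22:21 UTC resolves to the newest tag entry created before that time. A minimal sketch of that selection, assuming the snapshot reads the same tag history that `oc get imagestream` returned (which may not hold for this job); `image_at_snapshot` is a hypothetical helper, and the input is the `created`/`image` pairs from the output above, one pair per line.

```shell
# Hypothetical helper: given "CREATED IMAGE" pairs on stdin, print the image
# of the newest entry created at or before the snapshot time in $1.
# Same-format UTC ISO 8601 timestamps order correctly as plain strings.
image_at_snapshot() {
  awk -v snap="$1" '$1 <= snap && $1 > best { best = $1; image = $2 }
                    END { if (image != "") print image }'
}

# Tag history copied from the app.ci output above; snapshot time from the job log.
image_at_snapshot '2024-11-13T22:21:20Z' <<'EOF'
2024-11-13T19:20:49Z sha256:91dda68f5256eeae7c934e677e7a8b36add124faad0c4124e44063af2c008efa
2024-11-11T16:23:01Z sha256:762a726e9118e1879691044d61c9cd6e388c2add8e6da920b74ba455c8975a55
2024-11-11T12:47:21Z sha256:bef79b6f45e551a34e3d22877721cf8f4937b80f0b233d34fdeb259181313c48
EOF
# prints: sha256:91dda68f5256eeae7c934e677e7a8b36add124faad0c4124e44063af2c008efa
```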

wking commented 1 week ago

More poking at that run:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/29290/pull-ci-openshift-origin-master-e2e-gcp-ovn-techpreview/1856824118029062144/artifacts/build-resources/imagestreams.json | jq -c '.items[] | .metadata.name as $n | .status.tags[] | select(.tag == "cluster-version-operator").items[] | {created, image, imageStream: $n}'
{"created":"2024-11-13T19:14:55Z","image":"sha256:762a726e9118e1879691044d61c9cd6e388c2add8e6da920b74ba455c8975a55","imageStream":"stable"}
{"created":"2024-11-13T19:14:59Z","image":"sha256:762a726e9118e1879691044d61c9cd6e388c2add8e6da920b74ba455c8975a55","imageStream":"stable-initial"}

So that's picking up the older hash. But I'm not clear on why, since it seems like it had been rotated out of the source ImageStream by that point. Maybe syncing between app.ci and the build clusters is slow? Testing again:

/test e2e-gcp-ovn-techpreview
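
One way to quantify how stale the job's image is: look up its digest in the newest-first tag history from app.ci. A minimal sketch; `digest_age` is a hypothetical helper, and the digest list is copied from the `oc get imagestream` output in the previous comment.

```shell
# Hypothetical helper: print how many newer entries precede a digest in a
# newest-first list of digests on stdin (0 means it is the current one).
# Exits non-zero if the digest is not in the list at all.
digest_age() {
  awk -v want="$1" '$1 == want { print NR - 1; found = 1; exit }
                    END { if (!found) exit 1 }'
}

# Digest used by the job's stable and stable-initial ImageStreams.
digest_age 'sha256:762a726e9118e1879691044d61c9cd6e388c2add8e6da920b74ba455c8975a55' <<'EOF'
sha256:91dda68f5256eeae7c934e677e7a8b36add124faad0c4124e44063af2c008efa
sha256:762a726e9118e1879691044d61c9cd6e388c2add8e6da920b74ba455c8975a55
sha256:bef79b6f45e551a34e3d22877721cf8f4937b80f0b233d34fdeb259181313c48
EOF
# prints: 1
```

A result of 1 means the job ran with the entry one step behind the current tag in the source ImageStream.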

openshift-trt-bot commented 1 week ago

Job Failure Risk Analysis for sha: 642b542400e468e4f21714e3c68a7291915715f8

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-serial Low
[sig-node] static pods should start after being created
This test has passed 77.97% of 59 runs on jobs ['periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-single-node-serial'] in the last 14 days.

Open Bugs
Static pod controller pods sometimes fail to start [etcd]
joelanford commented 1 week ago

/lgtm

openshift-trt-bot commented 1 week ago

Job Failure Risk Analysis for sha: f551e10ab745660bff839d6dc7171eb3c5d837ee

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-metal-ipi-ovn-kube-apiserver-rollout IncompleteTests
Tests for this run (13) are below the historical average (1653): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-metal-ipi-ovn-ipv6 IncompleteTests
Tests for this run (13) are below the historical average (2747): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
grokspawn commented 1 week ago

/lgtm

wking commented 1 week ago

No new TechPreview jobs on the recent commits, but checking the one I'd launched last night:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/29290/pull-ci-openshift-origin-master-e2e-gcp-ovn-techpreview/1856932509166604288/artifacts/e2e-gcp-ovn-techpreview/openshift-e2e-test/build-log.txt | grep NewOLM
started: 2/169/508 "[sig-olmv1][OCPFeatureGate:NewOLM] OLMv1 CRDs should be installed [Suite:openshift/conformance/parallel]"
passed: (5s) 2024-11-14T07:23:01 "[sig-olmv1][OCPFeatureGate:NewOLM] OLMv1 CRDs should be installed [Suite:openshift/conformance/parallel]"
started: 2/274/508 "[sig-olmv1][OCPFeatureGate:NewOLM] OLMv1 operator installation should install a cluster extension [Suite:openshift/conformance/parallel]"
passed: (13.3s) 2024-11-14T07:25:02 "[sig-olmv1][OCPFeatureGate:NewOLM] OLMv1 operator installation should install a cluster extension [Suite:openshift/conformance/parallel]"
started: 3/403/508 "[sig-olmv1][OCPFeatureGate:NewOLM] OLMv1 Catalogs should be installed [Suite:openshift/conformance/parallel]"
passed: (5.7s) 2024-11-14T07:28:12 "[sig-olmv1][OCPFeatureGate:NewOLM] OLMv1 Catalogs should be installed [Suite:openshift/conformance/parallel]"
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/29290/pull-ci-openshift-origin-master-e2e-gcp-ovn-techpreview/1856932509166604288/artifacts/e2e-gcp-ovn-techpreview/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.capabilities.enabledCapabilities[]' | grep O
OperatorLifecycleManager
OperatorLifecycleManagerV1

So the CVO / ClusterVersion bump finally came through, the OperatorLifecycleManagerV1 capability is enabled, the tech-preview suite is running the tests on the capability, and the tests are passing :+1:

tmshort commented 1 week ago

This PR has since been updated to the point where it will no longer pass, because it now expects the v1 APIs, but it's good to know that the code that deals with capabilities, etc., passes.

joelanford commented 1 week ago

/lgtm

joelanford commented 1 week ago

/hold cancel

openshift-ci[bot] commented 1 week ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: grokspawn, joelanford, LalatenduMohanty, neisw, tmshort

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/openshift/origin/blob/master/OWNERS)~~ [neisw]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment

openshift-ci-robot commented 1 week ago

/retest-required

Remaining retests: 0 against base HEAD 90f69547de0ebd6847a4a5ca0c8c44e02ea2d7ef and 2 for PR HEAD 6474902f0e8c6715c9522c875784cd031fa19a7f in total

perdasilva commented 1 week ago

/retest

everettraven commented 6 days ago

/retest

joelanford commented 6 days ago

The 4.18 CI builds seem to be failing to even create payload images since our sync changes merged. As I understand it, nothing here will pass until we have a CI image that picks up the changes we merged last night.

openshift-ci[bot] commented 6 days ago

@tmshort: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/okd-scos-e2e-aws-ovn | 6474902f0e8c6715c9522c875784cd031fa19a7f | link | false | /test okd-scos-e2e-aws-ovn |

openshift-trt-bot commented 6 days ago

Job Failure Risk Analysis for sha: 6474902f0e8c6715c9522c875784cd031fa19a7f

Job Name Failure Risk
pull-ci-openshift-origin-master-okd-scos-e2e-aws-ovn High
[sig-arch] Only known images used by tests
This test has passed 100.00% of 35 runs on jobs ['periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn'] in the last 14 days.
openshift-bot commented 6 days ago

[ART PR BUILD NOTIFIER]

Distgit: openshift-enterprise-tests
This PR has been included in build openshift-enterprise-tests-container-v4.19.0-202411151636.p0.g92addf5.assembly.stream.el9. All builds following this will include this PR.