openshift / origin

Conformance test suite for OpenShift
http://www.openshift.org
Apache License 2.0

status.total counter is not correct for openshift/conformance suite #27350

Closed mtulio closed 1 year ago

mtulio commented 2 years ago

The total field of the status counter is not correct in the openshift/conformance suite (default, parallel).

The problem was found when running OPCT on the latest release. OPCT is built on top of the openshift-tests binary and consumes that counter to report execution progress to the user. More details are available here: https://issues.redhat.com/browse/SPLAT-696

Version
$ oc version
Client Version: 4.10.10
Server Version: 4.11.0
Kubernetes Version: v1.24.0+9546431
Steps To Reproduce
  1. openshift-tests run openshift/conformance
  2. Wait for the 1127th test
  3. Check whether total keeps increasing along with index in the status tuple (failed/index/total)
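The status tuple can be pulled out of a started: line for checking; a minimal shell sketch, where the sample line is copied from the run output below and the sed pattern is an assumption inferred from that output, not the actual openshift-tests format definition:

```shell
# Parse the (failed/index/total) tuple from a started: line.
# The sample line is copied from the run output in this report;
# the regex is inferred from that output (an assumption).
line='started: (0/1126/1127) "[sig-storage] PersistentVolumes-expansion ..."'
parsed=$(echo "$line" \
  | sed -E 's|.*\(([0-9]+)/([0-9]+)/([0-9]+)\).*|failed=\1 index=\2 total=\3|')
echo "$parsed"
```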
Current Result

After the 1127th test, the total counter keeps increasing with the index:

openshift-tests version: 4.11.0-202208020706.p0.gb860532.assembly.stream-b860532
Starting SimultaneousPodIPControllerI0809 16:31:15.790490    3733 shared_informer.go:255] Waiting for caches to sync for SimultaneousPodIPController
started: (0/1/1127) "[sig-scheduling][Early] The openshift-monitoring pods should be scheduled on different nodes [Suite:openshift/conformance/parallel]"

(...)

started: (0/1126/1127) "[sig-storage] PersistentVolumes-expansion  loopback local block volume should support online expansion on node [Suite:openshift/conformance/parallel] [Suite:k8s]"

passed: (38s) 2022-08-09T17:12:21 "[sig-storage] In-tree Volumes [Driver: nfs] [Testpattern: Dynamic PV (default fs)] provisioning should provision storage with mount options [Suite:openshift/conformance/parallel] [Suite:k8s]"

started: (0/1127/1127) "[sig-storage] In-tree Volumes [Driver: local][LocalVolumeType: tmpfs] [Testpattern: Generic Ephemeral-volume (block volmode) (late-binding)] ephemeral should support two pods which have the same volume definition [Suite:openshift/conformance/parallel] [Suite:k8s]"

passed: (6.6s) 2022-08-09T17:12:21 "[sig-storage] Downward API volume should provide container's memory request [NodeConformance] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]"

started: (0/1128/1128) "[sig-storage] In-tree Volumes [Driver: cinder] [Testpattern: Dynamic PV (immediate binding)] topology should fail to schedule a pod which has topologies that conflict with AllowedTopologies [Suite:openshift/conformance/parallel] [Suite:k8s]"

skip [k8s.io/kubernetes@v1.24.0/test/e2e/storage/framework/testsuite.go:116]: Driver local doesn't support GenericEphemeralVolume -- skipping
Ginkgo exit error 3: exit with code 3

skipped: (400ms) 2022-08-09T17:12:21 "[sig-storage] In-tree Volumes [Driver: local][LocalVolumeType: tmpfs] [Testpattern: Generic Ephemeral-volume (block volmode) (late-binding)] ephemeral should support two pods which have the same volume definition [Suite:openshift/conformance/parallel] [Suite:k8s]"

started: (0/1129/1129) "[sig-storage] In-tree Volumes [Driver: emptydir] [Testpattern: Dynamic PV (default fs)] capacity provides storage capacity information [Suite:openshift/conformance/parallel] [Suite:k8s]" 

After that, it keeps increasing until the last test (3475th):

started: (30/3474/3474) "[sig-arch][bz-etcd][Late] Alerts alert/etcdGRPCRequestsSlow should not be at or above pending [Suite:openshift/conformance/parallel]"

passed: (4.5s) 2022-08-09T18:26:40 "[sig-arch][bz-Unknown][Late] Alerts alert/KubePodNotReady should not be at or above info in all the other namespaces [Suite:openshift/conformance/parallel]"

started: (30/3475/3475) "[sig-arch][bz-Unknown][Late] Alerts alert/KubePodNotReady should not be at or above pending in ns/default [Suite:openshift/conformance/parallel]"
Expected Result
started: (0/1/3475)   (....)
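Since total should stay constant for the whole run, a consumer can flag the drift by collecting the distinct totals seen on started: lines; a minimal shell sketch over a sample log (the log lines here are shortened stand-ins mirroring the output above):

```shell
# Collect every distinct "total" value seen on started: lines.
# A correct run yields exactly one distinct total; this sample log,
# mirroring the output above, shows the drift (1127 -> 1128).
cat > /tmp/run.log <<'EOF'
started: (0/1126/1127) "test A"
started: (0/1127/1127) "test B"
started: (0/1128/1128) "test C"
EOF
totals=$(grep -oE '\([0-9]+/[0-9]+/[0-9]+\)' /tmp/run.log \
  | awk -F'/' '{gsub(/[()]/, ""); print $3}' | sort -u)
echo "$totals"
```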
Additional Information

Extracting the openshift-tests binary from the same release the cluster is running, I got a different count:

$ ./.local/bin/openshift-install-linux-4.11.0 version
./.local/bin/openshift-install-linux-4.11.0 4.11.0
built from commit 37684309bcb598757c99d3ea9fbc0758343d64a5
release image quay.io/openshift-release-dev/ocp-release@sha256:300bce8246cf880e792e106607925de0a404484637627edf5f517375517d54a4
release architecture amd64

$ RELEASE_IMAGE=$(./.local/bin/openshift-install-linux-4.11.0 version | awk '/release image/ {print $3}')
$ TESTS_IMAGE=$(oc adm release info --image-for='tests' $RELEASE_IMAGE)

$ oc image extract $TESTS_IMAGE --file="/usr/bin/openshift-tests" -a ~/.openshift/pull-secret-latest.json

$ chmod u+x openshift-tests
$ ./openshift-tests run --dry-run openshift/conformance |wc -l
3487
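Until the counter is fixed, the dry-run listing can serve as the authoritative total to compare the run log against; a shell sketch under that assumption (the sample files below are shortened stand-ins for real openshift-tests output, where the run ended at 3475 against a dry-run count of 3487):

```shell
# Treat the dry-run listing as the authoritative test count and compare
# it with the final total reported on the last started: line of a run log.
# These sample files stand in for real openshift-tests output.
cat > /tmp/dryrun.txt <<'EOF'
"[sig-a] test one"
"[sig-b] test two"
"[sig-c] test three"
"[sig-d] test four"
EOF
cat > /tmp/run.log <<'EOF'
started: (0/1/2) "[sig-a] test one"
started: (0/2/2) "[sig-b] test two"
started: (0/3/3) "[sig-c] test three"
EOF
expected=$(wc -l < /tmp/dryrun.txt)
reported=$(grep -oE '\([0-9]+/[0-9]+/[0-9]+\)' /tmp/run.log | tail -1 \
  | awk -F'/' '{gsub(/[()]/, ""); print $3}')
echo "expected=$expected reported=$reported"
```

A mismatch between the two numbers, as in this sample, is the symptom reported in this issue.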
elmiko commented 2 years ago

we talked about this issue during the install flex sync meeting today. we don't think this is overly concerning, but it will be an issue for people who want to monitor the count as it's happening, since it will be difficult to determine when the tests will end.

openshift-bot commented 1 year ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 1 year ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

openshift-bot commented 1 year ago

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci[bot] commented 1 year ago

@openshift-bot: Closing this issue.

In response to [this](https://github.com/openshift/origin/issues/27350#issuecomment-1382968260):

>Rotten issues close after 30d of inactivity.
>
>Reopen the issue by commenting `/reopen`. Mark the issue as fresh by commenting `/remove-lifecycle rotten`. Exclude this issue from closing again by commenting `/lifecycle frozen`.
>
>/close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.