red-hat-storage / ocs-ci

https://ocs-ci.readthedocs.io/en/latest/
MIT License

test_pod_reattachtime fails on IBM cloud 4.14 and on IBM cloud 4.15 #9336

Closed ypersky1980 closed 7 months ago

ypersky1980 commented 8 months ago

IBM cloud 4.14 job:
https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/33735/testReport/

Failing test cases:

tests.cross_functional.performance.csi_tests.test_pod_reattachtime.TestPodReattachTimePerformance.test_pod_reattach_time_performance[CephFileSystem-3-120-70] (12 min)

tests.cross_functional.performance.csi_tests.test_pod_reattachtime.TestPodReattachTimePerformance.test_pod_reattach_time_performance[CephFileSystem-13-600-420]

The allowed pod creation time in the test should be increased. After that, please consider opening a performance BZ.

ypersky1980 commented 8 months ago

From https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/34125/testReport/tests.cross_functional.performance.csi_tests.test_pod_reattachtime/TestPodReattachTimePerformance/test_pod_reattach_time_performance_CephFileSystem_3_120_70_/:

ocs_ci/helpers/helpers.py:123: ResourceWrongStatusException

ocs_ci.ocs.exceptions.ResourceWrongStatusException: Resource pvc-test-461975600e394995aabe3963ab8ead1 describe output:

Name: pvc-test-461975600e394995aabe3963ab8ead1
Namespace: namespace-pas-test-dd6d6842c94d4010a0926
StorageClass: storageclass-test-cephfs-d6781b17d9e142b
Status: Pending
Volume:
Labels:
Annotations: volume.beta.kubernetes.io/storage-provisioner: openshift-storage.cephfs.csi.ceph.com
    volume.kubernetes.io/storage-provisioner: openshift-storage.cephfs.csi.ceph.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By:
Events:
    Type Reason Age From Message
    Normal Provisioning 63s openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-5d7f5644ff-2v5bx_ba13c42f-d69e-4560-8751-fbf62e079909 External provisioner is provisioning volume for claim "namespace-pas-test-dd6d6842c94d4010a0926/pvc-test-461975600e394995aabe3963ab8ead1"
    Normal ExternalProvisioning 3s (x6 over 63s) persistentvolume-controller Waiting for a volume to be created either by the external provisioner 'openshift-storage.cephfs.csi.ceph.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
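
The exception above is raised when a resource (here a PVC stuck in Pending) does not reach the expected status in time. A minimal sketch of that kind of polling check is shown below. `wait_for_status`, `get_status`, and the local `ResourceWrongStatusException` class are hypothetical stand-ins for illustration, not the actual ocs-ci helpers.

```python
import time
from typing import Callable


class ResourceWrongStatusException(Exception):
    """Local stand-in; the real class lives in ocs_ci.ocs.exceptions."""


def wait_for_status(
    get_status: Callable[[], str],
    expected: str,
    timeout: float,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Poll get_status() until it returns `expected`; return elapsed seconds.

    Raises ResourceWrongStatusException if the status is still wrong
    once `timeout` seconds have passed.
    """
    start = clock()
    while True:
        status = get_status()
        elapsed = clock() - start
        if status == expected:
            return elapsed
        if elapsed >= timeout:
            raise ResourceWrongStatusException(
                f"status is {status!r}, expected {expected!r} after {elapsed:.0f}s"
            )
        sleep(interval)
```

In a real run, `get_status` would query the cluster for the PVC phase. Here it is left as a plain callable so the loop can be exercised without a cluster.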

ypersky1980 commented 7 months ago

http://10.0.78.167:8080/index.php?version1=40&build1=129&platform1=16&az_topology1=1&test_name%5B%5D=1&test_name%5B%5D=2&test_name%5B%5D=3&test_name%5B%5D=4&test_name%5B%5D=6&test_name%5B%5D=8&test_name%5B%5D=9&test_name%5B%5D=10&test_name%5B%5D=11&test_name%5B%5D=15&test_name%5B%5D=16&test_name%5B%5D=17&test_name%5B%5D=18&test_name%5B%5D=20&test_name%5B%5D=21&test_name%5B%5D=23&version2=&build2=&version3=&build3=&version4=&build4=&submit=Choose+options

The link above shows the results of the pod reattach time test in 4.15.

Rerun on VMware LSO and, if the test passes, close the issue.

ypersky1980 commented 7 months ago

This is the run that was initiated on IBM cloud 4.14:

https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/35137/testReport/

The following test cases failed:

tests.cross_functional.performance.csi_tests.test_pod_reattachtime.TestPodReattachTimePerformance.test_pod_reattach_time_performance[CephFileSystem-3-120-70]

Error:

ocs_ci.ocs.exceptions.PerformanceException: Pod creation time is 87.04964661598206 and greater than 70 seconds

tests.cross_functional.performance.csi_tests.test_pod_reattachtime.TestPodReattachTimePerformance.test_pod_reattach_time_performance[CephFileSystem-13-600-420]

Error:

ocs_ci.ocs.exceptions.PerformanceException: Pod creation time is 491.2622141838074 and greater than 420 seconds
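
To gauge how far the two failures above are from their limits, the check can be sketched as follows. This is an illustration only: `check_creation_time`, `limit_from_test_id`, `overshoot_percent`, and the local `PerformanceException` class are hypothetical stand-ins, and reading the last number of the test ID as the time limit in seconds is inferred from the error messages.

```python
class PerformanceException(Exception):
    """Local stand-in; the real class lives in ocs_ci.ocs.exceptions."""


def limit_from_test_id(test_id: str) -> int:
    """Return the trailing number of an ID such as 'CephFileSystem-3-120-70'."""
    return int(test_id.rsplit("-", 1)[1])


def overshoot_percent(measured: float, limit: float) -> float:
    """How far `measured` exceeds `limit`, as a percentage of the limit."""
    return (measured - limit) / limit * 100


def check_creation_time(measured: float, limit: float) -> None:
    """Raise if the measured pod creation time exceeds the limit."""
    if measured > limit:
        raise PerformanceException(
            f"Pod creation time is {measured} and greater than {limit} seconds"
        )


# Values reported in the run above.
for test_id, measured in [
    ("CephFileSystem-3-120-70", 87.04964661598206),
    ("CephFileSystem-13-600-420", 491.2622141838074),
]:
    limit = limit_from_test_id(test_id)
    print(f"{test_id}: {overshoot_percent(measured, limit):.1f}% over the {limit}s limit")
```

With the reported values, the two runs exceed their limits by roughly 24% and 17% respectively.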

Conclusion: submit a PR that increases the allowed pod creation time.

After the PR is merged, compare the results to 4.13 and consider opening a BZ.

ypersky1980 commented 7 months ago

From latest PR validation (https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-trigger-test-pr/6665/testReport/tests.cross_functional.performance.csi_tests.test_pod_reattachtime/TestPodReattachTimePerformance/test_pod_reattach_time_performance_CephFileSystem_13_600_720_/)

Checking whether pod deletion indeed works correctly when the pod has many files.

Full error:

ocs_ci.ocs.exceptions.ResourceWrongStatusException: Resource pod-test-cephfs-2c0e9b8c65f240a098f6181c describe output:

Name: pod-test-cephfs-2c0e9b8c65f240a098f6181c
Namespace: namespace-pas-test-dce771fe9db54be9a0ff0
Priority: 0
Service Account: default
Node: compute-1/10.1.160.247
Start Time: Mon, 18 Mar 2024 12:02:48 +0000
Labels:
Annotations: k8s.ovn.org/pod-networks: {"default":{"ip_addresses":["10.128.2.89/23"],"mac_address":"0a:58:0a:80:02:59","gateway_ips":["10.128.2.1"],"routes":[{"dest":"10.128.0.0...
    k8s.v1.cni.cncf.io/network-status: [{ "name": "ovn-kubernetes", "interface": "eth0", "ips": [ "10.128.2.89" ], "mac": "0a:58:0a:80:02:59", "default": true, "dns": {} }]
    openshift.io/scc: anyuid
Status: Pending
IP: 10.128.2.89
IPs:
    IP: 10.128.2.89
Containers:
    performance:
        Container ID:
        Image: quay.io/ocsci/perf:latest
        Image ID:
        Port:
        Host Port:
        Command: /bin/sh
        State: Waiting
            Reason: CreateContainerError
        Ready: False
        Restart Count: 0
        Environment:
        Mounts:
            /mnt from mypvc (rw)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nftq8 (ro)
Conditions:
    Type Status
    Initialized True
    Ready False
    ContainersReady False
    PodScheduled True
Volumes:
    mypvc:
        Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
        ClaimName: pvc-test-21750d550b2147908b1f07f8086d5dd
        ReadOnly: false
    kube-api-access-nftq8:
        Type: Projected (a volume that contains injected data from multiple sources)
        TokenExpirationSeconds: 3607
        ConfigMapName: kube-root-ca.crt
        ConfigMapOptional:
        DownwardAPI: true
        ConfigMapName: openshift-service-ca.crt
        ConfigMapOptional:
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
    node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
    Type Reason Age From Message
    Warning FailedAttachVolume 10m attachdetach-controller Multi-Attach error for volume "pvc-ecf47a0d-cd7a-4ba9-a2da-26cd5cea2cec" Volume is already exclusively attached to one node and can't be attached to another
    Normal SuccessfulAttachVolume 9m50s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-ecf47a0d-cd7a-4ba9-a2da-26cd5cea2cec"
    Normal AddedInterface 9m48s multus Add eth0 [10.128.2.89/23] from ovn-kubernetes
    Normal Pulled 106s (x5 over 9m48s) kubelet Container image "quay.io/ocsci/perf:latest" already present on machine
    Warning Failed 106s (x4 over 7m48s) kubelet Error: context deadline exceeded

ypersky1980 commented 7 months ago

Opened https://bugzilla.redhat.com/show_bug.cgi?id=2270545

ypersky1980 commented 7 months ago

https://github.com/red-hat-storage/ocs-ci/pull/9538 - PR with a fix.

ypersky1980 commented 7 months ago

The PR was merged, therefore closing the issue.