Track OnTag failures

sowmyav27 commented 1 year ago

The following jobs passed successfully:

Ontag go certification
Ontag custom certification
Ontag aks certification
Ontag eks certification
Ontag az certification

The following jobs fail:

Ontag import certification
- The `rke up` command fails.
Ontag k3s certification
- This job fails because of https://github.com/rancher/rancher/issues/39926.
Ontag ec2 certification
- The `test_wl_with_lb` test cases fail because the security group has reached its maximum number of rules. If we clear the rules in the security group, this job should pass.
- The LB service shows this error: Error syncing load balancer: failed to ensure load balancer: error authorizing security group ingress: "RulesPerSecurityGroupLimitExceeded: The maximum number of rules per security group has been reached.\n\tstatus code: 400, request id: b4013396-a8ca-4df3-9118-0ae1c2b43422"

igomez06 commented 1 year ago

The AKS certification job occasionally fails with this error: `AssertionError: Timed out waiting for pods in workload default-71758. Expected 1. Got 3`. It is flaky and should be looked at more closely.

igomez06 commented 1 year ago

The AZ certification tests occasionally fail with this error:

```
p_client = <rancher.Client object at 0x7fef6976d0d0>
workload = {'actions': {'redeploy': 'https://<rancher-server>/v3/project/c-pp5zw:p-km2vv/workloads/daemonset:test-1...ls': {'cattle.io/creator': 'norman', 'workload.user.cattle.io/workloadselector': 'daemonSet-test-13740-default-50938'}}
timeout = 600

    def get_endpoint_url_for_workload(p_client, workload, timeout=600):
        fqdn_available = False
        url = ""
        start = time.time()
        while not fqdn_available:
            if time.time() - start > timeout:
                raise AssertionError(
>                   "Timed out waiting for endpoint to be available")
E               AssertionError: Timed out waiting for endpoint to be available

tests/v3_api/common.py:800: AssertionError
```

We may need to increase the timeout, but I am not sure. I just want to record this so we can keep working on our flaky-test action items.
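The loop in `get_endpoint_url_for_workload` follows a common poll-until-timeout pattern. Below is a minimal sketch of a generic version, assuming a hypothetical `wait_for` helper that is not part of the Rancher test suite. It makes the timeout and poll interval explicit parameters, so a flaky step could be given more time without editing the loop itself. It also reports the last observed value on timeout, similar to the `State is: provisioning` line in the EKS failure.

```python
import time


def wait_for(condition, timeout=600, interval=5, description="condition"):
    """Poll condition() until it returns a truthy value or timeout seconds elapse.

    Hypothetical helper for illustration only; it is not the function in
    tests/v3_api/common.py. Returns the first truthy value produced by
    condition() and raises AssertionError on timeout, the same exception
    type the existing tests raise.
    """
    start = time.monotonic()
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() - start > timeout:
            # Include the last observed value to make timeouts easier to triage.
            raise AssertionError(
                f"Timed out after {timeout}s waiting for {description}; "
                f"last value: {result!r}")
        time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical endpoint probe that becomes ready on the third poll.
    polls = {"count": 0}

    def endpoint_url():
        polls["count"] += 1
        return "https://example.invalid/" if polls["count"] >= 3 else ""

    print(wait_for(endpoint_url, timeout=10, interval=0,
                   description="endpoint to be available"))
```

Using `time.monotonic()` instead of `time.time()` keeps the deadline unaffected by wall-clock adjustments on the test runner. Whether 600 seconds is the right value for the AZ endpoint check remains an open question.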

igomez06 commented 1 year ago

The Ontag import k3s certification job is flaky in the ingress tests. It usually needs to be run twice to pass.

igomez06 commented 11 months ago

I don't see this written down, but the plan was for this job to run the v2 EKS tests rather than v1, so that is another update that is needed.

igomez06 commented 8 months ago

The EC2 tests are failing because they attempt to provision a 1.27 cluster with the in-tree cloud provider.

sowmyav27 commented 8 months ago

Latest failures on Rancher 2.7.x:

AZ: fails once, but passes when re-run. (There should be no failures; review why it fails.)

Custom RKE1 cluster: `tests.v3_api.test_rke_cluster_provisioning.test_rke_custom_host_2` fails with:

rancher.ApiError: (ApiError(...), 'ServerError : Get "https://<>:6443/api/v1/namespaces/kube-public/replicationcontrollers?timeout=45s": tunnel disconnect\n\t{\'baseType\': \'error\', \'code\': \'ServerError\', \'message\': \'Get "https://<>:6443/api/v1/namespaces/kube-public/replicationcontrollers?timeout=45s": tunnel disconnect\', \'status\': 500, \'type\': \'error\'}')

rancher_ontag_go_certification:

Test Result (6 failures / +6)
github.com/rancher/rancher/tests/v2/validation/provisioning/k3s.TestK3SProvisioningTestSuite/TestProvisioningK3SCluster/1_Node_all_roles_Admin_User_Node_Provider:_azure_Kubernetes_version:_v1.26.11+k3s2_cni:_calico
github.com/rancher/rancher/tests/v2/validation/provisioning/k3s.TestK3SProvisioningTestSuite/TestProvisioningK3SCluster
github.com/rancher/rancher/tests/v2/validation/provisioning/k3s.TestK3SProvisioningTestSuite
github.com/rancher/rancher/tests/v2/validation/provisioning/rke2.TestRKE2ProvisioningTestSuite/TestProvisioningRKE2Cluster/1_Node_all_roles_Admin_User_Node_Provider:_azure_Kubernetes_version:_v1.26.11+rke2r1_cni:_calico
github.com/rancher/rancher/tests/v2/validation/provisioning/rke2.TestRKE2ProvisioningTestSuite/TestProvisioningRKE2Cluster
github.com/rancher/rancher/tests/v2/validation/provisioning/rke2.TestRKE2ProvisioningTestSuite

EKS: cleaned up the VPCs and re-ran the job.

```
Exception: Timeout waiting for cluster to satisfy condition:         lambda x: x.state == "active",
E               State is: provisioning
```

Note: the EKS Ontag job uses the v1 version, so this job needs to be skipped.

Ingress tests should be skipped on EKS clusters.

Import RKE1 cluster:
- `RKE_VERSION` was outdated.
- If the cluster is active and the test still errors out, it is because PSA is enabled and the deployed workloads fail to become Active. @anupama2501 will be working on this.
```
Exception: Timeout waiting for cluster to satisfy condition:         lambda x: x.state == "active",
E               State is: pending
```

Import k3s: @anupama2501 will work on fixing these failures:

tests.v3_api.test_workload.test_wl_with_nodePort
tests.v3_api.test_workload.test_wl_with_nodePort_scale_and_upgrade

rancher / qa-tasks

Track OnTag failures #821