rancher / qa-tasks

List of QA Backlog
Track OnTag failures #821

Open sowmyav27 opened 1 year ago

sowmyav27 commented 1 year ago

The following jobs passed successfully.

The following job fails.

igomez06 commented 1 year ago

The AKS certification fails occasionally with this error:

AssertionError: Timed out waiting for pods in workload default-71758. Expected 1. Got 3

It's flaky and should be looked at more closely.

igomez06 commented 1 year ago

The az cert tests occasionally fail with this error:

p_client = <rancher.Client object at 0x7fef6976d0d0>
workload = {'actions': {'redeploy': 'https://<rancher-server>/v3/project/c-pp5zw:p-km2vv/workloads/daemonset:test-1...ls': {'cattle.io/creator': 'norman', 'workload.user.cattle.io/workloadselector': 'daemonSet-test-13740-default-50938'}}
timeout = 600

    def get_endpoint_url_for_workload(p_client, workload, timeout=600):
        fqdn_available = False
        url = ""
        start = time.time()
        while not fqdn_available:
            if time.time() - start > timeout:
                raise AssertionError(
>                   "Timed out waiting for endpoint to be available")
E               AssertionError: Timed out waiting for endpoint to be available

tests/v3_api/common.py:800: AssertionError

We may need to increase the timeout, but I am not sure. Just posting an update so we can keep going with our flaky test actions.
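Before raising the timeout, it may help to know what state the endpoint was in when the wait gave up. Below is a minimal sketch of a polling helper that reports the last observed state on timeout. It is not the code in tests/v3_api/common.py; the name wait_for and the parameters interval, describe, clock and sleep are hypothetical and exist only for this illustration.

```python
import time


def wait_for(condition, timeout=600, interval=5, describe=None,
             clock=time.monotonic, sleep=time.sleep):
    """Poll condition() until it returns a truthy value or timeout expires.

    Hypothetical helper for illustration only.
    describe, if given, is called on timeout and its return value is
    included in the error so the failure says what was last observed.
    """
    start = clock()
    while True:
        result = condition()
        if result:
            return result
        elapsed = clock() - start
        if elapsed > timeout:
            detail = describe() if describe else "n/a"
            raise AssertionError(
                f"Timed out after {elapsed:.0f}s waiting for condition; "
                f"last state: {detail}")
        sleep(interval)
```

With something like this, a caller could pass a describe callback that returns the workload's current public endpoints, so a timeout shows whether the FQDN never appeared or only appeared late.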

igomez06 commented 1 year ago

The OnTag import k3s certification is flaky on the ingress tests; it usually needs to be run twice to pass.

igomez06 commented 11 months ago

I don't see this written down anywhere, but the idea was for this job to run v2 EKS rather than v1, so that is another update that is needed.

igomez06 commented 8 months ago

The ec2 tests are failing because they attempt to provision with an in-tree cloud provider on a 1.27 cluster.
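One way to catch this before provisioning starts would be a guard on the Kubernetes version. The sketch below is only an illustration under assumptions: the function name, the "in-tree" / "external" labels and the (1, 27) cutoff are hypothetical and not taken from the test framework; the cutoff is based on the 1.27 failure described above.

```python
import re


def uses_unsupported_in_tree_provider(k8s_version, cloud_provider,
                                      cutoff=(1, 27)):
    """Return True if an in-tree provider is requested at or above cutoff.

    Hypothetical helper: k8s_version is a string such as "v1.27.8+rke2r1",
    cloud_provider is assumed to be "in-tree" or "external".
    """
    match = re.match(r"v?(\d+)\.(\d+)", k8s_version)
    if match is None:
        raise ValueError(f"Unrecognized Kubernetes version: {k8s_version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    return cloud_provider == "in-tree" and major_minor >= cutoff
```

A job could call this up front and either switch to an out-of-tree provider or fail fast with a clear message instead of timing out during provisioning.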

sowmyav27 commented 8 months ago

Latest failures on Rancher 2.7.x

  1. AZ - fails the first time, but passes when re-run. (There should be no failures. Review why the first run fails.)

  2. Custom RKE1 cluster - tests.v3_api.test_rke_cluster_provisioning.test_rke_custom_host_2 -

    rancher.ApiError: (ApiError(...), 'ServerError : Get "https://<>:6443/api/v1/namespaces/kube-public/replicationcontrollers?timeout=45s": tunnel disconnect\n\t{\'baseType\': \'error\', \'code\': \'ServerError\', \'message\': \'Get "https://<>:6443/api/v1/namespaces/kube-public/replicationcontrollers?timeout=45s": tunnel disconnect\', \'status\': 500, \'type\': \'error\'}')
  3. rancher_ontag_go_certification -->

    Test Result (6 failures / +6)
    github.com/rancher/rancher/tests/v2/validation/provisioning/k3s.TestK3SProvisioningTestSuite/TestProvisioningK3SCluster/1_Node_all_roles_Admin_User_Node_Provider:_azure_Kubernetes_version:_v1.26.11+k3s2_cni:_calico
    github.com/rancher/rancher/tests/v2/validation/provisioning/k3s.TestK3SProvisioningTestSuite/TestProvisioningK3SCluster
    github.com/rancher/rancher/tests/v2/validation/provisioning/k3s.TestK3SProvisioningTestSuite
    github.com/rancher/rancher/tests/v2/validation/provisioning/rke2.TestRKE2ProvisioningTestSuite/TestProvisioningRKE2Cluster/1_Node_all_roles_Admin_User_Node_Provider:_azure_Kubernetes_version:_v1.26.11+rke2r1_cni:_calico
    github.com/rancher/rancher/tests/v2/validation/provisioning/rke2.TestRKE2ProvisioningTestSuite/TestProvisioningRKE2Cluster
    github.com/rancher/rancher/tests/v2/validation/provisioning/rke2.TestRKE2ProvisioningTestSuite
  4. EKS --> Cleaned up VPCs and re-ran the job

    Exception: Timeout waiting for cluster to satisfy condition:         lambda x: x.state == "active",
    E               State is: provisioning

    Note: The EKS OnTag job is using the v1 version. This job needs to be skipped.

  5. EKS clusters should have ingress tests skipped.

  6. Import RKE1 cluster -

    • RKE_VERSION was old.
    • If the cluster is active and the test errors out, it is because PSA is enabled and the deployed workloads fail to become Active. @anupama2501 will be working on this.
      Exception: Timeout waiting for cluster to satisfy condition:         lambda x: x.state == "active",
      E               State is: pending
  7. Import k3s - @anupama2501 to work on fixing this

    tests.v3_api.test_workload.test_wl_with_nodePort
    tests.v3_api.test_workload.test_wl_with_nodePort_scale_and_upgrade