harvester / tests

Harvester test cases
Apache License 2.0

[TEST] Upgrade the openSUSE image to 15.5, RKE1 & RKE2 versions to 1.27 in test CI #1087

khushboo-rancher closed this issue 4 months ago

khushboo-rancher commented 8 months ago

What's the test to develop? Please describe

Our tests are currently set to run with RKE1 and RKE2 version 1.24, and the default guest cluster image is 15.4. Both need to be updated.

• openSUSE image: 15.5
• K8s version: 1.27

khushboo-rancher commented 8 months ago

The tests on ECM currently run with RKE1 v1.26.9-rancher1-1 and RKE2 v1.26.10+rke2r2. We can keep https://github.com/harvester/tests/blob/main/config.yml in sync with those versions.

albinsun commented 8 months ago

Ref. https://documentation.suse.com/sles/15-SP5/single-html/SLES-minimal-vm/index.html

albinsun commented 8 months ago

2 modules are skipped: ca-certs and ssh-import-id

    myrke2c-pool1-d27f96e5-5r27g:/var/log # grep -v DEBUG cloud-init.log
    2024-01-31 02:00:59,171 - stages.py[INFO]: Loaded datasource DataSourceNoCloud - DataSourceNoCloud [seed=/dev/vda][dsmode=net]
    2024-01-31 02:00:59,323 - stages.py[INFO]: Applying network configuration from fallback bringup=False: {'ethernets': {'eth0': {'dhcp4': True, 'set-name': 'eth0', 'match': {'macaddress': 'a6:76:67:84:e7:d7'}}}, 'version': 2}
    {'type': 'physical', 'name': 'eth0', 'mac_address': 'a6:76:67:84:e7:d7', 'match': {'macaddress': 'a6:76:67:84:e7:d7'}, 'subnets': [{'type': 'dhcp4'}]}
    {'eth0': {'dhcp4': True, 'set-name': 'eth0', 'match': {'macaddress': 'a6:76:67:84:e7:d7'}}}
    2024-01-31 02:01:05,158 - stages.py[INFO]: Skipping modules 'ca-certs' because they are not verified on distro 'sles'.  To run anyway, add them to 'unverified_modules' in config.
    2024-01-31 02:01:05,619 - cc_growpart.py[INFO]: '/' resized: changed (/dev/vdb, 3) from 838843904 to 42911907328
    2024-01-31 02:01:06,489 - stages.py[INFO]: Skipping modules 'ssh-import-id' because they are not verified on distro 'sles'.  To run anyway, add them to 'unverified_modules' in config.
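
For anyone who wants to check this on other nodes without reading the whole log, here is a minimal Python sketch that pulls the skipped modules out of cloud-init.log text. The message format is copied from the excerpt above; the function name `skipped_modules` and the comma-splitting of multi-module messages are assumptions, not something verified against cloud-init itself.

```python
import re

# Matches the INFO line cloud-init writes when it skips a module that is not
# verified for the running distro (format taken from the log excerpt above).
SKIP_RE = re.compile(
    r"Skipping modules '(?P<modules>[^']+)' because they are not verified "
    r"on distro '(?P<distro>[^']+)'"
)


def skipped_modules(log_text):
    """Return a sorted list of (distro, module) pairs skipped by cloud-init."""
    found = set()
    for line in log_text.splitlines():
        match = SKIP_RE.search(line)
        if match is None:
            continue
        # Assumption: several modules in one message are comma-separated.
        for name in match.group("modules").split(","):
            found.add((match.group("distro"), name.strip()))
    return sorted(found)


SAMPLE = """\
2024-01-31 02:01:05,158 - stages.py[INFO]: Skipping modules 'ca-certs' because they are not verified on distro 'sles'.  To run anyway, add them to 'unverified_modules' in config.
2024-01-31 02:01:05,619 - cc_growpart.py[INFO]: '/' resized: changed (/dev/vdb, 3) from 838843904 to 42911907328
2024-01-31 02:01:06,489 - stages.py[INFO]: Skipping modules 'ssh-import-id' because they are not verified on distro 'sles'.  To run anyway, add them to 'unverified_modules' in config.
"""

if __name__ == "__main__":
    for distro, module in skipped_modules(SAMPLE):
        print(f"{distro}: {module}")
```

The module names it returns are the ones that would need to be listed under `unverified_modules` in the cloud-config.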

albinsun commented 8 months ago

[!NOTE] (Update) This turned out to be an environment issue; the cluster came up after we switched to a more robust environment.

Config

  1. Custom cloud-config

    password: password
    chpasswd: {expire: False}
    ssh_pwauth: True
    runcmd:
     - SUSEConnect -r REGISTRATION_CODE
    unverified_modules:
     - ca-certs
     - ssh-import-id
  2. Generate a kubeconfig for the cloud provider and append it to cloud-init. Ref. https://docs.harvesterhci.io/v1.3/rancher/cloud-provider#deploying-to-the-k3s-cluster-with-harvester-node-driver-experimental

  3. Additional Manifest image
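
If the custom cloud-config from step 1 needs to be generated per run (for example to inject the registration code), a small Python sketch along these lines could work. The function name `render_cloud_config` and its parameters are hypothetical; the keys mirror the snippet above, and the `#cloud-config` first line is the header cloud-init expects on user data, which the snippet above does not show.

```python
def render_cloud_config(registration_code, unverified_modules):
    """Render the custom cloud-config from step 1 as a YAML string."""
    lines = [
        "#cloud-config",  # header cloud-init expects on user data
        "password: password",
        "chpasswd: {expire: False}",
        "ssh_pwauth: True",
        "runcmd:",
        f" - SUSEConnect -r {registration_code}",
        "unverified_modules:",
    ]
    # One list entry per module that cloud-init would otherwise skip on SLES.
    lines += [f" - {name}" for name in unverified_modules]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # REGISTRATION_CODE is a placeholder, as in the original snippet.
    print(render_cloud_config("REGISTRATION_CODE", ["ca-certs", "ssh-import-id"]))
```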

Symptoms

  1. Stuck at updating image

  2. Fail syncing etcd-endpoints://0xc0008cb340/127.0.0.1:2379, context deadline exceeded

    journalctl -u rke2-server.service --follow
    Feb 01 17:46:50 myrke2-sles15-pool1-2443a795-wb57d rke2[19513]: {"level":"warn","ts":"2024-02-01T17:46:50.588559Z","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0008cb340/10.84.99.161:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
    Feb 01 17:46:50 myrke2-sles15-pool1-2443a795-wb57d rke2[19513]: time="2024-02-01T17:46:50Z" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
    Feb 01 17:46:54 myrke2-sles15-pool1-2443a795-wb57d rke2[19513]: {"level":"warn","ts":"2024-02-01T17:46:54.465424Z","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0008cb340/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
    Feb 01 17:46:54 myrke2-sles15-pool1-2443a795-wb57d rke2[19513]: {"level":"info","ts":"2024-02-01T17:46:54.466753Z","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/client.go:210","msg":"Auto sync endpoints failed.","error":"context deadline exceeded"}
    Feb 01 17:47:05 myrke2-sles15-pool1-2443a795-wb57d rke2[19513]: {"level":"warn","ts":"2024-02-01T17:47:05.591217Z","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0008cb340/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
    Feb 01 17:47:05 myrke2-sles15-pool1-2443a795-wb57d rke2[19513]: time="2024-02-01T17:47:05Z" level=error msg="Failed to get etcd members for learner management: context deadline exceeded"
    Feb 01 17:47:19 myrke2-sles15-pool1-2443a795-wb57d rke2[19513]: {"level":"warn","ts":"2024-02-01T17:47:19.722532Z","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0008cb340/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
    Feb 01 17:47:19 myrke2-sles15-pool1-2443a795-wb57d rke2[19513]: {"level":"info","ts":"2024-02-01T17:47:19.723051Z","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/client.go:210","msg":"Auto sync endpoints failed.","error":"context deadline exceeded"}
    Feb 01 17:47:20 myrke2-sles15-pool1-2443a795-wb57d rke2[19513]: {"level":"warn","ts":"2024-02-01T17:47:20.595938Z","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0008cb340/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
    Feb 01 17:47:20 myrke2-sles15-pool1-2443a795-wb57d rke2[19513]: time="2024-02-01T17:47:20Z" level=error msg="Failed to get recorded learner progress from etcd: context deadline exceeded"
    ...
  3. error syncing harvester-cloud-provider, helmcharts not found

    
    myrke2-sles15-pool1-5fe3e24e-rfqtk:/var/log # journalctl -u rke2-server.service --follow
    Feb 01 19:28:46 myrke2-sles15-pool1-5fe3e24e-rfqtk rke2[4794]: time="2024-02-01T19:28:46Z" level=error msg="error syncing 'kube-system/harvester-cloud-provider': handler helm-controller-chart-registration: DesiredSet - Replace Wait batch/v1, Kind=Job kube-system/helm-install-harvester-cloud-provider for helm-controller-chart-registration kube-system/harvester-cloud-provider, requeuing"
    Feb 01 19:28:46 myrke2-sles15-pool1-5fe3e24e-rfqtk rke2[4794]: time="2024-02-01T19:28:46Z" level=error msg="error syncing 'kube-system/harvester-cloud-provider': handler helm-controller-chart-registration: helmcharts.helm.cattle.io \"harvester-cloud-provider\" not found, requeuing"
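
The rke2-server journal mixes JSON records (etcd-client) with `key="value"` style lines, which makes it hard to see at a glance which endpoints are timing out. Below is a rough Python sketch that tallies etcd-client records per target and collects the `level=error` messages. The line layout is inferred from the excerpts above; the helper name `summarize` is hypothetical.

```python
import json
import re
from collections import Counter

# journald prefix as seen above: "<Mon DD HH:MM:SS> <host> rke2[<pid>]: <payload>"
PREFIX_RE = re.compile(
    r"^\w{3}\s+\d{1,2} [\d:]{8} \S+ rke2\[\d+\]: (?P<payload>.*)$"
)
MSG_RE = re.compile(r'msg="(?P<msg>(?:[^"\\]|\\.)*)"')


def summarize(journal_text):
    """Count etcd-client records per target and collect rke2 error messages."""
    targets = Counter()
    errors = []
    for line in journal_text.splitlines():
        prefix = PREFIX_RE.match(line.strip())
        if prefix is None:
            continue
        payload = prefix.group("payload")
        if payload.startswith("{"):
            # JSON payloads come from the embedded etcd client.
            try:
                record = json.loads(payload)
            except json.JSONDecodeError:
                continue
            if record.get("logger") == "etcd-client" and "target" in record:
                targets[record["target"]] += 1
        elif "level=error" in payload:
            found = MSG_RE.search(payload)
            if found:
                errors.append(found.group("msg"))
    return targets, errors


SAMPLE = r"""
Feb 01 17:46:50 myrke2-sles15-pool1-2443a795-wb57d rke2[19513]: {"level":"warn","ts":"2024-02-01T17:46:50.588559Z","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0008cb340/10.84.99.161:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Feb 01 17:46:50 myrke2-sles15-pool1-2443a795-wb57d rke2[19513]: time="2024-02-01T17:46:50Z" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
Feb 01 17:46:54 myrke2-sles15-pool1-2443a795-wb57d rke2[19513]: {"level":"warn","ts":"2024-02-01T17:46:54.465424Z","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0008cb340/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
"""

if __name__ == "__main__":
    per_target, error_messages = summarize(SAMPLE)
    for target, count in per_target.items():
        print(count, target)
    for message in error_messages:
        print("error:", message)
```

In practice the input would be the saved output of `journalctl -u rke2-server.service` rather than the inline sample.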


## Similar Issues
1. [RKE2 fails if docker is installed and running on the host on which the RKE2 nodes are running.](https://github.com/rancher/rke2/issues/4472#top) (rancher/rke2#4472)
2. [Issue running cluster-reset on v1.26 releases](https://github.com/rancher/rke2/issues/4052#top) (rancher/rke2#4052)
3. [rke2-server failing to start; no static pods](https://github.com/rancher/rke2/issues/2080#top) (rancher/rke2#2080)

albinsun commented 8 months ago

Providing test results on harvester-v1.3.1-rc1 + rancher-v2.7.9.

Hit 2 issues:

  1. Load balancer status
  2. RKE2 cluster scale down (known issue)

Please refer to the following for details.

Environment

Steps

  1. Import Harvester to Rancher and create cloud credential

    • Harvester is Active :heavy_check_mark: ![image](https://github.com/harvester/tests/assets/2773781/8033e356-9d75-429f-9f00-194f9084473c)
  2. Create RKE2 cluster

    • Using image sles15-sp5-minimal-vm.x86_64-cloud-gm.qcow2 ![image](https://github.com/harvester/tests/assets/2773781/5c61a5a2-278f-4808-833a-22cfcb3e492d) ![image](https://github.com/harvester/tests/assets/2773781/d1c6eecb-b4ce-48bb-8f9c-cee4d39ec3da)
    • Customize cloud-config for SLES registration code ![image](https://github.com/harvester/tests/assets/2773781/eaf31d35-b029-4164-80e6-95ad19a6ac28)
    • RKE2 v1.26.11+rke2r1 with Harvester cloud provider ![image](https://github.com/harvester/tests/assets/2773781/d593cae5-365e-42dc-b82c-afac7d115d75)
  3. Check RKE2 cluster status

    • Cluster is Active :heavy_check_mark: ![image](https://github.com/harvester/tests/assets/2773781/dd1e2735-ba65-4e87-bb60-1fcb9be6074b)
    • harvester-cloud-provider and harvester-csi-driver are deployed :heavy_check_mark: ![image](https://github.com/harvester/tests/assets/2773781/7a6be06b-9b5b-43cb-b0aa-4476b87028c0)
  4. Deploy Nginx workload with pvc mount

    • Use image nginx:latest ![image](https://github.com/harvester/tests/assets/2773781/94ecfbcb-875e-4d24-bd57-664909140c28)
    • Create PVC on-demand ![image](https://github.com/harvester/tests/assets/2773781/52882e56-c923-4bcd-87fd-fb57e0cb6c76)
    • Mount to /data ![image](https://github.com/harvester/tests/assets/2773781/cd2ab034-8cf4-4b26-b7e7-861aba2ac3aa)
    • Deployment is Active and PVC does mount ![image](https://github.com/harvester/tests/assets/2773781/901746c1-e609-4ae9-b2af-dceaeed2dd56)
  5. Deploy Load-Balancer Service for Nginx deployment

    • Status is Active :x: Shows Pending ![image](https://github.com/harvester/tests/assets/2773781/daa3ec6a-705a-476a-9453-79304da0dbda) ![image](https://github.com/harvester/tests/assets/2773781/0ade4a1c-d5b0-4c5e-89ca-c709a7ba28df)
    • Can access target nginx :heavy_check_mark: ![image](https://github.com/harvester/tests/assets/2773781/e395a2f7-020b-486c-8f61-dfb17f7659a4)
  6. Scaling

    • Scale Up :heavy_check_mark: ![image](https://github.com/harvester/tests/assets/2773781/e3717597-1240-49cd-8ccd-d9a5c3771102) Deployment and LB still works ![image](https://github.com/harvester/tests/assets/2773781/d6c66c78-a77d-4fdb-8d84-729172a365e0) ![image](https://github.com/harvester/tests/assets/2773781/35047a74-e94c-4cd0-8764-8884f431f522)
    • Scale Down :x: harvester/harvester/issues/4358 ![image](https://github.com/harvester/tests/assets/2773781/05c26b02-d389-44d3-a19f-6b0ee7f61720) ~Deployment and LB still works~
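
Since step 5 shows the Load-Balancer as Pending in the UI while nginx is still reachable, one way to cross-check from the guest cluster is to look at `status.loadBalancer.ingress` on the Service object. This is only a sketch: it assumes the input is the JSON from `kubectl get service <name> -o json`, the helper name is hypothetical, and the address in the example is a documentation placeholder, not a value from this test run.

```python
def load_balancer_addresses(service):
    """Return the external addresses Kubernetes has recorded for a Service."""
    ingress = (
        service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    )
    addresses = []
    for entry in ingress:
        # An ingress entry carries either an IP or a hostname.
        address = entry.get("ip") or entry.get("hostname")
        if address:
            addresses.append(address)
    return addresses


# Hypothetical Service objects, trimmed to the fields the helper reads.
PENDING = {"spec": {"type": "LoadBalancer"}, "status": {"loadBalancer": {}}}
ASSIGNED = {
    "spec": {"type": "LoadBalancer"},
    "status": {"loadBalancer": {"ingress": [{"ip": "192.0.2.10"}]}},
}

if __name__ == "__main__":
    for name, svc in (("pending", PENDING), ("assigned", ASSIGNED)):
        print(name, load_balancer_addresses(svc) or "no address yet")
```

An empty list means Kubernetes has not recorded an external address for the Service; a non-empty list alongside a Pending UI status would suggest the mismatch is in how the status is reported.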

Supportbundle

supportbundle_sles15_2024-02-02T11-40-15Z.zip

albinsun commented 5 months ago

Providing a comparison between openSUSE Leap 15.3 and 15.5: they give identical test results, so I will send a PR to bump opensuse-image-url to 15.5.

openSUSE-15.5 Verification

Currently the fixture image_opensuse is used by the following test suites:

  1. test_1_images.py
  2. test_3_vm_functions.py
  3. test_4_vm_backup_restore.py
  4. test_4_vm_snapshot.py
  5. test_5_vm_networks.py
  6. test_5_vm_networks_interact.py
  7. test_z_terraform.py

Ref. https://github.com/search?q=repo%3Aharvester%2Ftests%20image_opensuse&type=code

Compare test runs based on openSUSE-15.3 and openSUSE-15.5

  1. :green_circle: test_1_images.py

    • openSUSE-15.3: 6 Passed, 0 failed ![image](https://github.com/harvester/tests/assets/2773781/4cb47de2-0b94-4e14-848d-b97299c3b930)
    • openSUSE-15.5: 6 Passed, 0 failed ![image](https://github.com/harvester/tests/assets/2773781/75eff364-dfe0-498b-87f0-5db97b668058)
  2. :green_circle: test_3_vm_functions.py

    • openSUSE-15.3: 30 Passed, 2 failed (test_update_vm_machine_type[q35_to_pc, pc_to_q35]) ![image](https://github.com/harvester/tests/assets/2773781/16c2aea2-24e0-4b80-8970-f1be0e8e0079)
    • openSUSE-15.5: 30 Passed, 2 failed (test_update_vm_machine_type[q35_to_pc, pc_to_q35]) ![image](https://github.com/harvester/tests/assets/2773781/62099f80-eec1-46dc-af2f-d952911c9bcc)
  3. :green_circle: test_4_vm_backup_restore.py

    • openSUSE-15.3: 20 Passed, 2 failed (test_restore_replace_with_delete_vols[S3, NFS]) ![image](https://github.com/harvester/tests/assets/2773781/7d926844-c919-479d-ac27-babaa5ea7926)
    • openSUSE-15.5: 20 Passed, 2 failed (test_restore_replace_with_delete_vols[S3, NFS]) ![image](https://github.com/harvester/tests/assets/2773781/0c2a5986-6f64-452c-bdb3-e1f8fcdaadcc)
  4. :green_circle: test_4_vm_snapshot.py

    • openSUSE-15.3: 8 Passed ![image](https://github.com/harvester/tests/assets/2773781/8429e0c4-d229-46a0-ae85-5a923dd9e2ce)
    • openSUSE-15.5: 8 Passed ![image](https://github.com/harvester/tests/assets/2773781/1af92918-0984-4b22-86c4-31745eb53cee)
  5. :green_circle: test_5_vm_networks.py

    These also fail on the daily test environment.

    • openSUSE-15.3: 1 skipped, 2 errors (test_add_vlan, test_vms_on_same_vlan: VM network created but route info not available)
    • openSUSE-15.5: 1 skipped, 2 errors (test_add_vlan, test_vms_on_same_vlan: VM network created but route info not available) ![image](https://github.com/harvester/tests/assets/2773781/d7ed8806-9297-42e5-91d1-6069a5580a8c)
  6. :green_circle: test_5_vm_networks_interact.py

    • openSUSE-15.3: 1 Passed, 4 errors (test_vlan_network_connection, test_mgmt_to_vlan_connection, test_vlan_to_mgmt_connection, test_delete_vlan_from_multiple: VM network created but route info not available) ![image](https://github.com/harvester/tests/assets/2773781/7fbb0004-a25d-4cd3-b898-525f3d22fae3)
    • openSUSE-15.5: 1 Passed, 4 errors (test_vlan_network_connection, test_mgmt_to_vlan_connection, test_vlan_to_mgmt_connection, test_delete_vlan_from_multiple: VM network created but route info not available) ![image](https://github.com/harvester/tests/assets/2773781/3e4df5cd-c6c0-4206-9efe-4ed77e71add3)
  7. :green_circle: test_z_terraform.py

    • openSUSE-15.3: 13 Passed ![image](https://github.com/harvester/tests/assets/2773781/0bbdd50e-e0f0-4ed2-ac9f-6b90317d005d)
    • openSUSE-15.5: 13 Passed ![image](https://github.com/harvester/tests/assets/2773781/188b7cdd-87e4-4e99-8834-72a445dccbba)
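
The "identical results" claim can be checked mechanically. The Python sketch below transcribes the outcome counts from the list above and reports any suite where the two images differ. The suite labels and the helper name `differing_suites` are made up for illustration; outcomes not mentioned for a suite are treated as zero.

```python
OUTCOMES = ("passed", "failed", "errors", "skipped")

# Counts transcribed from the comparison above, keyed by suite and image.
RESULTS = {
    "images": {"15.3": {"passed": 6, "failed": 0}, "15.5": {"passed": 6, "failed": 0}},
    "vm functions": {"15.3": {"passed": 30, "failed": 2}, "15.5": {"passed": 30, "failed": 2}},
    "vm backup/restore": {"15.3": {"passed": 20, "failed": 2}, "15.5": {"passed": 20, "failed": 2}},
    "vm snapshot": {"15.3": {"passed": 8}, "15.5": {"passed": 8}},
    "vm networks": {"15.3": {"skipped": 1, "errors": 2}, "15.5": {"skipped": 1, "errors": 2}},
    "vm networks interact": {"15.3": {"passed": 1, "errors": 4}, "15.5": {"passed": 1, "errors": 4}},
    "terraform": {"15.3": {"passed": 13}, "15.5": {"passed": 13}},
}


def normalize(counts):
    """Expand a sparse outcome dict into a fixed-order tuple, missing = 0."""
    return tuple(counts.get(outcome, 0) for outcome in OUTCOMES)


def differing_suites(results, old="15.3", new="15.5"):
    """Return the suites whose outcome counts differ between two images."""
    return [
        suite
        for suite, by_image in results.items()
        if normalize(by_image[old]) != normalize(by_image[new])
    ]


if __name__ == "__main__":
    diff = differing_suites(RESULTS)
    print("differing suites:", diff if diff else "none")
```

This only compares counts; the individual failing test names listed above match between the two images as well.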

albinsun commented 4 months ago

Closing, as the PRs are already merged.