Closed khushboo-rancher closed 4 months ago
The tests on ECM currently run with RKE1 v1.26.9-rancher1-1 and RKE2 v1.26.10+rke2r2. We can keep https://github.com/harvester/tests/blob/main/config.yml in sync with those versions.
Two modules are skipped: `ca-certs` and `ssh-import-id`.
myrke2c-pool1-d27f96e5-5r27g:/var/log # grep -v DEBUG cloud-init.log
2024-01-31 02:00:59,171 - stages.py[INFO]: Loaded datasource DataSourceNoCloud - DataSourceNoCloud [seed=/dev/vda][dsmode=net]
2024-01-31 02:00:59,323 - stages.py[INFO]: Applying network configuration from fallback bringup=False: {'ethernets': {'eth0': {'dhcp4': True, 'set-name': 'eth0', 'match': {'macaddress': 'a6:76:67:84:e7:d7'}}}, 'version': 2}
{'type': 'physical', 'name': 'eth0', 'mac_address': 'a6:76:67:84:e7:d7', 'match': {'macaddress': 'a6:76:67:84:e7:d7'}, 'subnets': [{'type': 'dhcp4'}]}
{'eth0': {'dhcp4': True, 'set-name': 'eth0', 'match': {'macaddress': 'a6:76:67:84:e7:d7'}}}
2024-01-31 02:01:05,158 - stages.py[INFO]: Skipping modules 'ca-certs' because they are not verified on distro 'sles'. To run anyway, add them to 'unverified_modules' in config.
2024-01-31 02:01:05,619 - cc_growpart.py[INFO]: '/' resized: changed (/dev/vdb, 3) from 838843904 to 42911907328
2024-01-31 02:01:06,489 - stages.py[INFO]: Skipping modules 'ssh-import-id' because they are not verified on distro 'sles'. To run anyway, add them to 'unverified_modules' in config.
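The skipped modules can also be pulled out of `cloud-init.log` programmatically rather than by eye. The following is a minimal sketch that relies only on the message format visible in the log lines above; the function name `skipped_modules` is hypothetical, and splitting on commas is an assumption for the case where one message names several modules.

```python
import re

# Matches the "Skipping modules" INFO message as it appears in the log above.
SKIP_RE = re.compile(
    r"Skipping modules '([^']+)' because they are not verified on distro '([^']+)'"
)


def skipped_modules(log_text):
    """Return (module, distro) pairs for every skipped-module line in the log."""
    found = []
    for line in log_text.splitlines():
        match = SKIP_RE.search(line)
        if match is None:
            continue
        # Assumption: several modules in one message would be comma-separated.
        for name in match.group(1).split(","):
            found.append((name.strip(), match.group(2)))
    return found
```

The returned module names are the ones that would need to be listed under `unverified_modules` in the cloud-config.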
[!NOTE] (Update) This turned out to be an environment issue; the cluster came up after we switched to a more robust environment.
Custom cloud-config
password: password
chpasswd: {expire: False}
ssh_pwauth: True
runcmd:
- SUSEConnect -r REGISTRATION_CODE
unverified_modules:
- ca-certs
- ssh-import-id
Generate the kubeconfig for the cloud provider and append it to cloud-init. Ref. https://docs.harvesterhci.io/v1.3/rancher/cloud-provider#deploying-to-the-k3s-cluster-with-harvester-node-driver-experimental
Additional Manifest
Stuck at updating
Fail syncing `etcd-endpoints://0xc0008cb340/127.0.0.1:237`, context deadline exceeded
journalctl -u rke2-server.service --follow
Feb 01 17:46:50 myrke2-sles15-pool1-2443a795-wb57d rke2[19513]: {"level":"warn","ts":"2024-02-01T17:46:50.588559Z","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0008cb340/10.84.99.161:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Feb 01 17:46:50 myrke2-sles15-pool1-2443a795-wb57d rke2[19513]: time="2024-02-01T17:46:50Z" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
Feb 01 17:46:54 myrke2-sles15-pool1-2443a795-wb57d rke2[19513]: {"level":"warn","ts":"2024-02-01T17:46:54.465424Z","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0008cb340/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Feb 01 17:46:54 myrke2-sles15-pool1-2443a795-wb57d rke2[19513]: {"level":"info","ts":"2024-02-01T17:46:54.466753Z","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/client.go:210","msg":"Auto sync endpoints failed.","error":"context deadline exceeded"}
Feb 01 17:47:05 myrke2-sles15-pool1-2443a795-wb57d rke2[19513]: {"level":"warn","ts":"2024-02-01T17:47:05.591217Z","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0008cb340/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Feb 01 17:47:05 myrke2-sles15-pool1-2443a795-wb57d rke2[19513]: time="2024-02-01T17:47:05Z" level=error msg="Failed to get etcd members for learner management: context deadline exceeded"
Feb 01 17:47:19 myrke2-sles15-pool1-2443a795-wb57d rke2[19513]: {"level":"warn","ts":"2024-02-01T17:47:19.722532Z","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0008cb340/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Feb 01 17:47:19 myrke2-sles15-pool1-2443a795-wb57d rke2[19513]: {"level":"info","ts":"2024-02-01T17:47:19.723051Z","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/client.go:210","msg":"Auto sync endpoints failed.","error":"context deadline exceeded"}
Feb 01 17:47:20 myrke2-sles15-pool1-2443a795-wb57d rke2[19513]: {"level":"warn","ts":"2024-02-01T17:47:20.595938Z","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0008cb340/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Feb 01 17:47:20 myrke2-sles15-pool1-2443a795-wb57d rke2[19513]: time="2024-02-01T17:47:20Z" level=error msg="Failed to get recorded learner progress from etcd: context deadline exceeded"
...
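To see which etcd endpoint is timing out most often, the JSON-formatted entries in the journal output above can be tallied per `target`. This is a rough sketch under the assumption that each journal line looks like the ones quoted here (a `rke2[PID]: ` prefix followed by either a JSON object or a `time=... level=...` message); the function name `deadline_failures_by_target` is hypothetical.

```python
import json
from collections import Counter


def deadline_failures_by_target(journal_text):
    """Count 'context deadline exceeded' errors per etcd target."""
    counts = Counter()
    for line in journal_text.splitlines():
        # Everything after the first "]: " (end of the "rke2[PID]" prefix) is the payload.
        _, sep, payload = line.partition("]: ")
        if not sep or not payload.startswith("{"):
            continue  # plain "time=... level=..." lines are skipped
        try:
            entry = json.loads(payload)
        except json.JSONDecodeError:
            continue
        if "target" in entry and "context deadline exceeded" in entry.get("error", ""):
            counts[entry["target"]] += 1
    return counts
```

Feeding it the saved output of `journalctl -u rke2-server.service` gives a per-endpoint count, which makes it easier to tell whether only the local endpoint (`127.0.0.1:2379`) or also the node address is failing.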
error syncing harvester-cloud-provider, helmcharts not found
myrke2-sles15-pool1-5fe3e24e-rfqtk:/var/log # journalctl -u rke2-server.service --follow
Feb 01 19:28:46 myrke2-sles15-pool1-5fe3e24e-rfqtk rke2[4794]: time="2024-02-01T19:28:46Z" level=error msg="error syncing 'kube-system/harvester-cloud-provider': handler helm-controller-chart-registration: DesiredSet - Replace Wait batch/v1, Kind=Job kube-system/helm-install-harvester-cloud-provider for helm-controller-chart-registration kube-system/harvester-cloud-provider, requeuing"
Feb 01 19:28:46 myrke2-sles15-pool1-5fe3e24e-rfqtk rke2[4794]: time="2024-02-01T19:28:46Z" level=error msg="error syncing 'kube-system/harvester-cloud-provider': handler helm-controller-chart-registration: helmcharts.helm.cattle.io \"harvester-cloud-provider\" not found, requeuing"
## Similar Issues
1. [RKE2 fails if docker is installed and running on the host on which the RKE2 nodes are running.](https://github.com/rancher/rke2/issues/4472#top)
3. [Issue running cluster-reset on v1.26 releases](https://github.com/rancher/rke2/issues/4052#top)
4. [rke2-server failing to start; no static pods](https://github.com/rancher/rke2/issues/2080#top)
Test results on `harvester-v1.3.1-rc1` + `rancher-v2.7.9`.
Hit 2 issues:
Please refer to the following for details.
v1.3.1-rc1
Auto
v2.7.9
v1.26.11+rke2r1
1. Import Harvester to Rancher and create cloud credential
2. Create RKE2 cluster `v1.26.11+rke2r1` with Harvester cloud provider
3. Check RKE2 cluster status: `Active` :heavy_check_mark:
4. Deploy Nginx workload with PVC mount: `nginx:latest`
5. Deploy Load-Balancer Service for Nginx deployment: `Active` :x: `Pending`
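The `Pending` state of the Load-Balancer Service can be double-checked outside the UI. The sketch below assumes the Service manifest has been dumped with `kubectl get service <name> -o json` and inspects the standard `status.loadBalancer.ingress` field; an empty result means no address has been assigned yet. The helper name `load_balancer_addresses` is hypothetical.

```python
import json


def load_balancer_addresses(service_json):
    """Return the IPs or hostnames assigned to a LoadBalancer Service, if any."""
    service = json.loads(service_json)
    # status.loadBalancer.ingress is absent or empty until an address is assigned.
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    return [item.get("ip") or item.get("hostname") for item in ingress]
```

An empty list here corresponds to what `kubectl get service` shows as a pending external IP.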
![image](https://github.com/harvester/tests/assets/2773781/daa3ec6a-705a-476a-9453-79304da0dbda)
![image](https://github.com/harvester/tests/assets/2773781/0ade4a1c-d5b0-4c5e-89ca-c709a7ba28df)
Scaling
Comparison between openSUSE Leap `15.3` and `15.5`: both give identical test results, so I will send a PR to bump `opensuse-image-url` to `15.5`.
Currently, the fixture `image_opensuse` is used by the following test suites:
Ref. https://github.com/search?q=repo%3Aharvester%2Ftests%20image_opensuse&type=code
v1.2.2-dev-20240329
- :green_circle: `test_1_images.py`
- :green_circle: `test_3_vm_functions.py` (`test_update_vm_machine_type[q35_to_pc, pc_to_q35]`)
- :green_circle: `test_4_vm_backup_restore.py` (`test_restore_replace_with_delete_vols[S3, NFS]`)
- :green_circle: `test_4_vm_snapshot.py`
- :green_circle: `test_5_vm_networks.py` (`test_add_vlan`, `test_vms_on_same_vlan`: VM network created but route info not available). Also fails on the daily test env.
- :green_circle: `test_5_vm_networks_interact.py` (`test_vlan_network_connection`, `test_mgmt_to_vlan_connection`, `test_vlan_to_mgmt_connection`, `test_delete_vlan_from_multiple`: VM network created but route info not available)
- :green_circle: `test_z_terraform.py`
Closing, as the PRs are already merged.
What's the test to develop? Please describe
Our tests are set to run with RKE1 and RKE2 version 1.24, and the default guest cluster image is 15.4. We need to update both.
- openSUSE image: 15.5
- K8s version: 1.27
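A small check like the one below could flag when a configured version has fallen behind the target. It is only a sketch: it parses the `major.minor.patch` prefix of version strings in the formats quoted in this issue (for example `v1.26.11+rke2r1` or `v1.26.9-rancher1-1`) and compares major and minor only; the names `k8s_version` and `needs_bump` are hypothetical and not part of the test suite.

```python
import re


def k8s_version(tag):
    """Parse 'v1.26.11+rke2r1', 'v1.26.9-rancher1-1' or '1.27' into (major, minor, patch)."""
    match = re.match(r"v?(\d+)\.(\d+)(?:\.(\d+))?", tag)
    if match is None:
        raise ValueError(f"unrecognized version: {tag!r}")
    major, minor, patch = match.groups()
    # A missing patch component (as in "1.27") is treated as 0.
    return int(major), int(minor), int(patch or 0)


def needs_bump(configured, target):
    """True when the configured major.minor is older than the target major.minor."""
    return k8s_version(configured)[:2] < k8s_version(target)[:2]
```

With the values from this issue, `needs_bump("1.24", "1.27")` reports that the configured version is behind the target.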