stormshift / support

This repo should serve as a central source for reporting issues with stormshift
GNU General Public License v3.0

VM network boot fails on some nodes: inf7, inf8 or inf44 #208

Open rbo opened 1 month ago

rbo commented 1 month ago

on inf44: [image]

on ucs56: [image]

rbo commented 1 month ago

/cc @DanielFroehlich

Moving the VM (MAC: 0e:c0:ef:20:63:10) to ucs57 and watching the network traffic:

$ oc debug node/ucs57
sh-5.1# tcpdump -i coe-bridge -n port 67 and port 68
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on coe-bridge, link-type EN10MB (Ethernet), snapshot length 262144 bytes
15:01:53.170785 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from f8:f2:1e:db:6c:f0, length 281
15:01:54.699247 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 6c:fe:54:4b:12:59, length 281
15:01:54.993995 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 0e:c0:ef:20:63:10, length 347
15:01:54.995331 IP 10.32.96.1.bootps > 255.255.255.255.bootpc: BOOTP/DHCP, Reply, length 326
15:01:58.063646 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 0e:c0:ef:20:63:10, length 359
15:01:58.064036 IP 10.32.96.1.bootps > 255.255.255.255.bootpc: BOOTP/DHCP, Reply, length 326
15:01:58.064165 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 0e:c0:ef:20:63:10, length 291
15:02:03.104924 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 0e:c0:ef:20:63:10, length 340
15:02:03.105368 IP 10.32.96.1.bootps > 255.255.255.255.bootpc: BOOTP/DHCP, Reply, length 338
15:02:03.814602 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from e8:eb:d3:08:d1:49, length 281
15:02:06.408511 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:25:b5:00:00:06, length 296
15:02:07.084158 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 0e:c0:ef:20:63:10, length 352
15:02:07.084529 IP 10.32.96.1.bootps > 255.255.255.255.bootpc: BOOTP/DHCP, Reply, length 338
^C
13 packets captured
13 packets received by filter
0 packets dropped by kernel
sh-5.1# exit
exit

Running VM on inf44:

$ oc debug node/inf44
Starting pod/inf44-debug-q7pz8 ...
To use host binaries, run `chroot /host`
Pod IP: 10.32.96.44
If you don't see a command prompt, try pressing enter.
sh-5.1#
sh-5.1#
sh-5.1#
sh-5.1# tcpdump -i coe-bridge -n port 67 and port 68
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on coe-bridge, link-type EN10MB (Ethernet), snapshot length 262144 bytes
15:04:55.635980 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from a0:36:9f:06:8e:2e, length 284
15:04:56.811352 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from a0:36:9f:07:ff:ff, length 284
15:05:07.476090 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from f8:f2:1e:db:6c:f0, length 281
15:05:08.805167 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 6c:fe:54:4b:12:59, length 281
15:05:17.792533 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from e8:eb:d3:08:d1:49, length 281
15:05:20.355751 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:25:b5:00:00:06, length 296

15:05:35.620308 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from e8:eb:d3:08:d1:48, length 281

15:06:00.494174 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from a0:36:9f:06:8e:2e, length 284
15:06:01.613988 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from a0:36:9f:07:ff:ff, length 284
15:06:11.971938 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from f8:f2:1e:db:6c:f0, length 281
15:06:13.419260 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 6c:fe:54:4b:12:59, length 281
15:06:22.662867 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from e8:eb:d3:08:d1:49, length 281
15:06:25.328194 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:25:b5:00:00:06, length 296
15:06:39.813851 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from e8:eb:d3:08:d1:48, length 281
15:06:48.913960 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from a0:36:9f:07:ff:ff, length 284
15:06:51.244706 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from a0:36:9f:07:ff:ff, length 284
15:06:56.029003 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from a0:36:9f:07:ff:ff, length 284
15:07:04.687854 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from a0:36:9f:07:ff:ff, length 284
15:07:05.036995 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from a0:36:9f:06:8e:2e, length 284
15:07:05.839964 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from a0:36:9f:07:ff:ff, length 284
^C
20 packets captured
20 packets received by filter
0 packets dropped by kernel
sh-5.1#

There are no DHCP requests from the VM (MAC 0e:c0:ef:20:63:10) visible on inf44.
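To make the comparison between the two captures less dependent on eyeballing, the tcpdump text can be tallied per client MAC. The following is a minimal sketch, not part of the original debugging session: the helper name `dhcp_requests_by_mac` is hypothetical, and the regular expression assumes the one-line summary format that `tcpdump -n` printed above. The sample lines are copied from the two captures.

```python
import re
from collections import Counter

# Matches the request summaries printed by tcpdump above, e.g.
# "... BOOTP/DHCP, Request from 0e:c0:ef:20:63:10, length 347"
REQUEST_RE = re.compile(
    r"BOOTP/DHCP, Request from ([0-9a-f]{2}(?::[0-9a-f]{2}){5})",
    re.IGNORECASE,
)


def dhcp_requests_by_mac(capture: str) -> Counter:
    """Count DHCP request lines per client MAC in tcpdump text output.

    Reply lines carry no client MAC in this output format and are ignored.
    """
    counts = Counter()
    for line in capture.splitlines():
        match = REQUEST_RE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


VM_MAC = "0e:c0:ef:20:63:10"

# Four lines copied from the ucs57 capture.
ucs57_sample = """\
15:01:54.699247 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 6c:fe:54:4b:12:59, length 281
15:01:54.993995 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 0e:c0:ef:20:63:10, length 347
15:01:54.995331 IP 10.32.96.1.bootps > 255.255.255.255.bootpc: BOOTP/DHCP, Reply, length 326
15:01:58.063646 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 0e:c0:ef:20:63:10, length 359
"""

# Two lines copied from the inf44 capture.
inf44_sample = """\
15:04:55.635980 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from a0:36:9f:06:8e:2e, length 284
15:04:56.811352 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from a0:36:9f:07:ff:ff, length 284
"""

if __name__ == "__main__":
    print("ucs57:", dhcp_requests_by_mac(ucs57_sample)[VM_MAC])
    print("inf44:", dhcp_requests_by_mac(inf44_sample)[VM_MAC])
```

Feeding the full captures into the helper instead of the samples should give a quick yes/no on whether the VM's requests ever reach `coe-bridge` on a given node.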

rbo commented 1 month ago

Comparing the virt-launcher pods:

diff -Nuar /tmp/inf44.yaml /tmp/ucs57.yaml
--- /tmp/inf44.yaml 2024-10-16 17:08:10.395589448 +0200
+++ /tmp/ucs57.yaml 2024-10-16 17:09:50.658052559 +0200
@@ -2,15 +2,15 @@
 kind: Pod
 metadata:
   annotations:
-    k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.128.8.214/21"],"mac_address":"0a:58:0a:80:08:d6","gateway_ips":["10.128.8.1"],"routes":[{"dest":"10.128.0.0/14","nextHop":"10.128.8.1"},{"dest":"172.30.0.0/16","nextHop":"10.128.8.1"},{"dest":"100.64.0.0/16","nextHop":"10.128.8.1"}],"ip_address":"10.128.8.214/21","gateway_ip":"10.128.8.1"}}'
+    k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.130.11.54/21"],"mac_address":"0a:58:0a:82:0b:36","gateway_ips":["10.130.8.1"],"routes":[{"dest":"10.128.0.0/14","nextHop":"10.130.8.1"},{"dest":"172.30.0.0/16","nextHop":"10.130.8.1"},{"dest":"100.64.0.0/16","nextHop":"10.130.8.1"}],"ip_address":"10.130.11.54/21","gateway_ip":"10.130.8.1"}}'
     k8s.v1.cni.cncf.io/network-status: |-
       [{
           "name": "ovn-kubernetes",
           "interface": "eth0",
           "ips": [
-              "10.128.8.214"
+              "10.130.11.54"
           ],
-          "mac": "0a:58:0a:80:08:d6",
+          "mac": "0a:58:0a:82:0b:36",
           "default": true,
           "dns": {}
       },{
@@ -23,7 +23,7 @@
     kubectl.kubernetes.io/default-container: compute
     kubevirt.io/domain: ushift16-ostree
     kubevirt.io/migrationTransportUnix: "true"
-    kubevirt.io/vm-generation: "16"
+    kubevirt.io/vm-generation: "19"
     openshift.io/scc: kubevirt-controller
     post.hook.backup.velero.io/command: '["/usr/bin/virt-freezer", "--unfreeze", "--name",
       "ushift16-ostree", "--namespace", "rbohne-debug"]'
@@ -32,15 +32,15 @@
       "ushift16-ostree", "--namespace", "rbohne-debug"]'
     pre.hook.backup.velero.io/container: compute
     seccomp.security.alpha.kubernetes.io/pod: localhost/kubevirt/kubevirt.json
-  creationTimestamp: "2024-10-16T15:05:29Z"
+  creationTimestamp: "2024-10-16T15:09:03Z"
   generateName: virt-launcher-ushift16-ostree-
   labels:
     kubevirt.io: virt-launcher
-    kubevirt.io/created-by: 0fb45324-f59e-4a89-9335-60e9d45ef927
-    kubevirt.io/nodeName: inf44
+    kubevirt.io/created-by: 33951148-4769-4319-bef3-3ccb3e472032
+    kubevirt.io/nodeName: ucs57
     vm.kubevirt.io/name: ushift16-ostree
     vm_group: cluster_ushift16_ostree
-  name: virt-launcher-ushift16-ostree-wd9nv
+  name: virt-launcher-ushift16-ostree-wsgb7
   namespace: rbohne-debug
   ownerReferences:
   - apiVersion: kubevirt.io/v1
@@ -48,9 +48,9 @@
     controller: true
     kind: VirtualMachineInstance
     name: ushift16-ostree
-    uid: 0fb45324-f59e-4a89-9335-60e9d45ef927
-  resourceVersion: "1638035362"
-  uid: 54bc701b-f3aa-43c4-af8d-e917a8ed47f2
+    uid: 33951148-4769-4319-bef3-3ccb3e472032
+  resourceVersion: "1638049339"
+  uid: be3ea479-6a84-4e29-82c6-178810925c56
 spec:
   affinity:
     nodeAffinity:
@@ -64,11 +64,11 @@
   - command:
     - /usr/bin/virt-launcher-monitor
     - --qemu-timeout
-    - 332s
+    - 266s
     - --name
     - ushift16-ostree
     - --uid
-    - 0fb45324-f59e-4a89-9335-60e9d45ef927
+    - 33951148-4769-4319-bef3-3ccb3e472032
     - --namespace
     - rbohne-debug
     - --kubevirt-share-dir
@@ -166,10 +166,10 @@
   hostname: ushift16-ostree
   imagePullSecrets:
   - name: default-dockercfg-r96q8
-  nodeName: inf44
+  nodeName: ucs57
   nodeSelector:
     kubernetes.io/arch: amd64
-    kubernetes.io/hostname: inf44
+    kubernetes.io/hostname: ucs57
     kubevirt.io/schedulable: "true"
   preemptionPolicy: PreemptLowerPriority
   priority: 0
@@ -229,34 +229,34 @@
     name: hotplug-disks
 status:
   conditions:
-  - lastProbeTime: "2024-10-16T15:05:29Z"
-    lastTransitionTime: "2024-10-16T15:05:29Z"
+  - lastProbeTime: "2024-10-16T15:09:03Z"
+    lastTransitionTime: "2024-10-16T15:09:03Z"
     message: the virtual machine is not paused
     reason: NotPaused
     status: "True"
     type: kubevirt.io/virtual-machine-unpaused
   - lastProbeTime: null
-    lastTransitionTime: "2024-10-16T15:05:37Z"
+    lastTransitionTime: "2024-10-16T15:09:10Z"
     status: "True"
     type: PodReadyToStartContainers
   - lastProbeTime: null
-    lastTransitionTime: "2024-10-16T15:05:29Z"
+    lastTransitionTime: "2024-10-16T15:09:03Z"
     status: "True"
     type: Initialized
   - lastProbeTime: null
-    lastTransitionTime: "2024-10-16T15:05:37Z"
+    lastTransitionTime: "2024-10-16T15:09:10Z"
     status: "True"
     type: Ready
   - lastProbeTime: null
-    lastTransitionTime: "2024-10-16T15:05:37Z"
+    lastTransitionTime: "2024-10-16T15:09:10Z"
     status: "True"
     type: ContainersReady
   - lastProbeTime: null
-    lastTransitionTime: "2024-10-16T15:05:29Z"
+    lastTransitionTime: "2024-10-16T15:09:03Z"
     status: "True"
     type: PodScheduled
   containerStatuses:
-  - containerID: cri-o://7c0f2e9be660bd79a066b3a1285ebb8a701d3abb57cab5150c954a5579390606
+  - containerID: cri-o://303e88cb684c686c47ae110003b3058c2cf054c13cb063119a6b14a2b052d939
     image: registry.redhat.io/container-native-virtualization/virt-launcher-rhel9@sha256:444191284ff0adb7e38d4786a037a0c39a340cfea6b3a943951c8a3dc79dacb2
     imageID: registry.redhat.io/container-native-virtualization/virt-launcher-rhel9@sha256:2961c32db99ee3af67c299417207b4f714d3dd007f3b02e1443d36839b375bec
     lastState: {}
@@ -266,13 +266,13 @@
     started: true
     state:
       running:
-        startedAt: "2024-10-16T15:05:36Z"
-  hostIP: 10.32.96.44
+        startedAt: "2024-10-16T15:09:09Z"
+  hostIP: 10.32.96.57
   hostIPs:
-  - ip: 10.32.96.44
+  - ip: 10.32.96.57
   phase: Running
-  podIP: 10.128.8.214
+  podIP: 10.130.11.54
   podIPs:
-  - ip: 10.128.8.214
+  - ip: 10.130.11.54
   qosClass: Burstable
-  startTime: "2024-10-16T15:05:29Z"
+  startTime: "2024-10-16T15:09:03Z"
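
When reading the diff, the MAC and IP changes in the `k8s.ovn.org/pod-networks` annotation look like nothing more than the pod-network address changing with the node: in both versions the MAC reads as `0a:58` followed by the four IPv4 octets in hex. That pattern is an observation from the two values in the diff, not something confirmed elsewhere in this thread, and these are pod-network MACs, not the VM NIC MAC (0e:c0:ef:20:63:10) seen on `coe-bridge`. A small sketch to check it, with a hypothetical helper name:

```python
def mac_matches_ip(mac: str, ip_cidr: str) -> bool:
    """Return True if mac equals 0a:58 followed by the IPv4 octets in hex.

    The 0a:58 prefix is taken from the annotation values in the diff;
    treating it as a general rule is an assumption.
    """
    ip = ip_cidr.split("/")[0]
    octets_hex = ":".join(f"{int(octet):02x}" for octet in ip.split("."))
    return mac.lower() == "0a:58:" + octets_hex


if __name__ == "__main__":
    # Values copied from the inf44 and ucs57 sides of the diff.
    print(mac_matches_ip("0a:58:0a:80:08:d6", "10.128.8.214/21"))
    print(mac_matches_ip("0a:58:0a:82:0b:36", "10.130.11.54/21"))
```

If both checks hold, the annotation hunk is node-specific noise and can be skipped when looking for a real difference between the two pods.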

DanielFroehlich commented 1 month ago

@rbo right - that fits my observation that there are no log entries on the DHCP server either. The fact that there are no PXE/HTTP boot options visible in the BIOS makes me think that the VM behaves as if it has no NIC, or the NIC is not connected to a network.

rbo commented 1 month ago

The NIC is available; I checked in the BIOS settings (running on inf44):

Screenshot 2024-10-16 at 17 07 00

I was not able to get into the BIOS settings when the VM was running on ucs57.