Closed samip5 closed 1 month ago
We need kernel logs to understand what is going on, which node type it was, etc.
Omni requires SideroLink connection to controlplanes, not workers.
Could you please clarify? If I'm running a single-node cluster and that node is not able to connect to Omni because eBPF is screwing up networking somehow, should I be able to gather kernel logs directly from the Talos API, without going through Omni, even when SideroLink is utilized?
If you run a single node cluster, it's a controlplane and a worker, so SideroLink connection is required for Omni.
There's no way to bypass Omni for Talos API access atm, see #191.
Okay, so I was able to more or less figure out the problem: the Raspberry Pi 4 loses IPv6 after the following kernel error (and thus loses SideroLink as well):
192.168.2.223: kern: err: [2024-05-31T14:23:53.132888381Z]: ================================================================================
192.168.2.223: kern: err: [2024-05-31T14:23:53.142751381Z]: UBSAN: array-index-out-of-bounds in kernel/bpf/lpm_trie.c:194:14
192.168.2.223: kern: err: [2024-05-31T14:23:53.151197381Z]: index 8 is out of range for type '__u8 [*]'
192.168.2.223: kern: warning: [2024-05-31T14:23:53.157042381Z]: CPU: 3 PID: 5785 Comm: cilium-agent Not tainted 6.6.32-talos #1
192.168.2.223: kern: warning: [2024-05-31T14:23:53.165203381Z]: Hardware name: Unknown Unknown Product/Unknown Product, BIOS 2024.01 01/01/2024
192.168.2.223: kern: warning: [2024-05-31T14:23:53.174996381Z]: Call trace:
192.168.2.223: kern: warning: [2024-05-31T14:23:53.178181381Z]: dump_backtrace+0x9c/0x100
192.168.2.223: kern: warning: [2024-05-31T14:23:53.183065381Z]: show_stack+0x34/0x50
192.168.2.223: kern: warning: [2024-05-31T14:23:53.187271381Z]: dump_stack_lvl+0x78/0xd0
192.168.2.223: kern: warning: [2024-05-31T14:23:53.191805381Z]: dump_stack+0x1c/0x30
192.168.2.223: kern: warning: [2024-05-31T14:23:53.195914381Z]: __ubsan_handle_out_of_bounds+0xc0/0x100
192.168.2.223: kern: warning: [2024-05-31T14:23:53.201606381Z]: longest_prefix_match.isra.0+0x200/0x258
192.168.2.223: kern: warning: [2024-05-31T14:23:53.207390381Z]: trie_update_elem+0x160/0x3a0
192.168.2.223: kern: warning: [2024-05-31T14:23:53.212257381Z]: bpf_map_update_value+0xcc/0x2c8
192.168.2.223: kern: warning: [2024-05-31T14:23:53.217167381Z]: map_update_elem+0x19c/0x328
192.168.2.223: kern: warning: [2024-05-31T14:23:53.221572381Z]: __sys_bpf+0x834/0x1bf0
192.168.2.223: kern: warning: [2024-05-31T14:23:53.225522381Z]: __arm64_sys_bpf+0x34/0x58
192.168.2.223: kern: warning: [2024-05-31T14:23:53.229709381Z]: invoke_syscall+0x90/0x128
192.168.2.223: kern: warning: [2024-05-31T14:23:53.233875381Z]: el0_svc_common.constprop.0+0xec/0x118
192.168.2.223: kern: warning: [2024-05-31T14:23:53.239080381Z]: do_el0_svc+0x34/0x50
192.168.2.223: kern: warning: [2024-05-31T14:23:53.242792381Z]: el0_svc+0x4c/0x178
192.168.2.223: kern: warning: [2024-05-31T14:23:53.246316381Z]: el0t_64_sync_handler+0x128/0x138
192.168.2.223: kern: warning: [2024-05-31T14:23:53.251050381Z]: el0t_64_sync+0x1bc/0x1c0
192.168.2.223: kern: err: [2024-05-31T14:23:53.255074381Z]: ================================================================================
192.168.2.223: kern: err: [2024-05-31T14:23:53.264247381Z]: ================================================================================
192.168.2.223: kern: err: [2024-05-31T14:23:53.273410381Z]: UBSAN: array-index-out-of-bounds in kernel/bpf/lpm_trie.c:194:14
192.168.2.223: kern: err: [2024-05-31T14:23:53.281232381Z]: index 8 is out of range for type '__u8 [*]'
192.168.2.223: kern: warning: [2024-05-31T14:23:53.286866381Z]: CPU: 3 PID: 5785 Comm: cilium-agent Not tainted 6.6.32-talos #1
192.168.2.223: kern: warning: [2024-05-31T14:23:53.294678381Z]: Hardware name: Unknown Unknown Product/Unknown Product, BIOS 2024.01 01/01/2024
192.168.2.223: kern: warning: [2024-05-31T14:23:53.303935381Z]: Call trace:
192.168.2.223: kern: warning: [2024-05-31T14:23:53.306827381Z]: dump_backtrace+0x9c/0x100
192.168.2.223: kern: warning: [2024-05-31T14:23:53.311028381Z]: show_stack+0x34/0x50
192.168.2.223: kern: warning: [2024-05-31T14:23:53.314788381Z]: dump_stack_lvl+0x78/0xd0
192.168.2.223: kern: warning: [2024-05-31T14:23:53.318896381Z]: dump_stack+0x1c/0x30
192.168.2.223: kern: warning: [2024-05-31T14:23:53.322651381Z]: __ubsan_handle_out_of_bounds+0xc0/0x100
192.168.2.223: kern: warning: [2024-05-31T14:23:53.328070381Z]: longest_prefix_match.isra.0+0x218/0x258
192.168.2.223: kern: warning: [2024-05-31T14:23:53.333481381Z]: trie_update_elem+0x160/0x3a0
192.168.2.223: kern: warning: [2024-05-31T14:23:53.337939381Z]: bpf_map_update_value+0xcc/0x2c8
192.168.2.223: kern: warning: [2024-05-31T14:23:53.342667381Z]: map_update_elem+0x19c/0x328
192.168.2.223: kern: warning: [2024-05-31T14:23:53.347052381Z]: __sys_bpf+0x834/0x1bf0
192.168.2.223: kern: warning: [2024-05-31T14:23:53.350996381Z]: __arm64_sys_bpf+0x34/0x58
192.168.2.223: kern: warning: [2024-05-31T14:23:53.355186381Z]: invoke_syscall+0x90/0x128
192.168.2.223: kern: warning: [2024-05-31T14:23:53.359361381Z]: el0_svc_common.constprop.0+0xec/0x118
192.168.2.223: kern: warning: [2024-05-31T14:23:53.364574381Z]: do_el0_svc+0x34/0x50
192.168.2.223: kern: warning: [2024-05-31T14:23:53.368295381Z]: el0_svc+0x4c/0x178
192.168.2.223: kern: warning: [2024-05-31T14:23:53.371829381Z]: el0t_64_sync_handler+0x128/0x138
192.168.2.223: kern: warning: [2024-05-31T14:23:53.376571381Z]: el0t_64_sync+0x1bc/0x1c0
192.168.2.223: kern: err: [2024-05-31T14:23:53.380605381Z]: ================================================================================
192.168.2.223: user: warning: [2024-05-31T14:23:53.940437381Z]: [talos] machine is running and ready {"component": "controller-runtime", "controller": "runtime.MachineStatusController"}
192.168.2.223: user: warning: [2024-05-31T14:24:13.941928381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "v1alpha1.EventsSinkController", "error": "error publishing event: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [fdae:41e4:649b:9303::1]:8091: i/o timeout\""}
192.168.2.223: user: warning: [2024-05-31T14:24:36.184696381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "v1alpha1.EventsSinkController", "error": "error publishing event: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [fdae:41e4:649b:9303::1]:8091: i/o timeout\""}
192.168.2.223: user: warning: [2024-05-31T14:25:00.080260381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "v1alpha1.EventsSinkController", "error": "error publishing event: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [fdae:41e4:649b:9303::1]:8091: i/o timeout\""}
192.168.2.223: user: warning: [2024-05-31T14:25:10.231736381Z]: [talos] error watching discovery service state {"component": "controller-runtime", "controller": "cluster.DiscoveryServiceController", "error": "rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout"}
192.168.2.223: user: warning: [2024-05-31T14:25:23.955645381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "v1alpha1.EventsSinkController", "error": "error publishing event: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [fdae:41e4:649b:9303::1]:8091: i/o timeout\""}
192.168.2.223: user: warning: [2024-05-31T14:25:50.988570381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "v1alpha1.EventsSinkController", "error": "error publishing event: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [fdae:41e4:649b:9303::1]:8091: i/o timeout\""}
192.168.2.223: user: warning: [2024-05-31T14:26:20.607494381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "v1alpha1.EventsSinkController", "error": "error publishing event: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [fdae:41e4:649b:9303::1]:8091: i/o timeout\""}
192.168.2.223: user: warning: [2024-05-31T14:26:58.718111381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "siderolink.ManagerController", "error": "error provisioning: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [2a01:4f9:c012:559::1]:8090: connect: network is unreachable\""}
192.168.2.223: user: warning: [2024-05-31T14:27:00.588193381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "siderolink.ManagerController", "error": "error provisioning: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [2a01:4f9:c012:559::1]:8090: connect: network is unreachable\""}
192.168.2.223: user: warning: [2024-05-31T14:27:00.621848381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "v1alpha1.EventsSinkController", "error": "error publishing event: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [fdae:41e4:649b:9303::1]:8091: i/o timeout\""}
192.168.2.223: user: warning: [2024-05-31T14:27:03.723223381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "siderolink.ManagerController", "error": "error provisioning: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [2a01:4f9:c012:559::1]:8090: connect: network is unreachable\""}
192.168.2.223: user: warning: [2024-05-31T14:27:07.251348381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "siderolink.ManagerController", "error": "error provisioning: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [2a01:4f9:c012:559::1]:8090: connect: network is unreachable\""}
192.168.2.223: user: warning: [2024-05-31T14:27:10.962202381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "siderolink.ManagerController", "error": "error provisioning: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [2a01:4f9:c012:559::1]:8090: connect: network is unreachable\""}
192.168.2.223: user: warning: [2024-05-31T14:27:17.297776381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "siderolink.ManagerController", "error": "error provisioning: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [2a01:4f9:c012:559::1]:8090: connect: network is unreachable\""}
192.168.2.223: user: warning: [2024-05-31T14:27:34.179597381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "siderolink.ManagerController", "error": "error provisioning: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [2a01:4f9:c012:559::1]:8090: connect: network is unreachable\""}
192.168.2.223: user: warning: [2024-05-31T14:27:49.019389381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "siderolink.ManagerController", "error": "error provisioning: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [2a01:4f9:c012:559::1]:8090: connect: network is unreachable\""}
192.168.2.223: user: warning: [2024-05-31T14:27:51.737338381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "v1alpha1.EventsSinkController", "error": "error publishing event: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [fdae:41e4:649b:9303::1]:8091: i/o timeout\""}
192.168.2.223: user: warning: [2024-05-31T14:28:05.198873381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "siderolink.ManagerController", "error": "error provisioning: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [2a01:4f9:c012:559::1]:8090: connect: network is unreachable\""}
192.168.2.223: user: warning: [2024-05-31T14:28:31.234945381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "siderolink.ManagerController", "error": "error provisioning: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [2a01:4f9:c012:559::1]:8090: connect: network is unreachable\""}
192.168.2.223: user: warning: [2024-05-31T14:28:43.204110381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "v1alpha1.EventsSinkController", "error": "error publishing event: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [fdae:41e4:649b:9303::1]:8091: i/o timeout\""}
192.168.2.223: user: warning: [2024-05-31T14:29:19.870160381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "siderolink.ManagerController", "error": "error provisioning: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [2a01:4f9:c012:559::1]:8090: connect: network is unreachable\""}
192.168.2.223: user: warning: [2024-05-31T14:30:09.755798381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "siderolink.ManagerController", "error": "error provisioning: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [2a01:4f9:c012:559::1]:8090: connect: network is unreachable\""}
192.168.2.223: user: warning: [2024-05-31T14:30:28.093438381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "v1alpha1.EventsSinkController", "error": "error publishing event: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [fdae:41e4:649b:9303::1]:8091: i/o timeout\""}
192.168.2.223: user: warning: [2024-05-31T14:31:10.691467381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "siderolink.ManagerController", "error": "error provisioning: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [2a01:4f9:c012:559::1]:8090: connect: network is unreachable\""}
192.168.2.223: user: warning: [2024-05-31T14:31:25.357663381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "v1alpha1.EventsSinkController", "error": "error publishing event: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [fdae:41e4:649b:9303::1]:8091: i/o timeout\""}
192.168.2.223: user: warning: [2024-05-31T14:31:42.428429381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "siderolink.ManagerController", "error": "error provisioning: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [2a01:4f9:c012:559::1]:8090: connect: network is unreachable\""}
192.168.2.223: user: warning: [2024-05-31T14:32:37.466072381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "v1alpha1.EventsSinkController", "error": "error publishing event: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [fdae:41e4:649b:9303::1]:8091: i/o timeout\""}
192.168.2.223: user: warning: [2024-05-31T14:32:44.430319381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "siderolink.ManagerController", "error": "error provisioning: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [2a01:4f9:c012:559::1]:8090: connect: network is unreachable\""}
192.168.2.223: user: warning: [2024-05-31T14:33:41.451463381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "siderolink.ManagerController", "error": "error provisioning: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [2a01:4f9:c012:559::1]:8090: connect: network is unreachable\""}
192.168.2.223: user: warning: [2024-05-31T14:34:10.896618381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "v1alpha1.EventsSinkController", "error": "error publishing event: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [fdae:41e4:649b:9303::1]:8091: i/o timeout\""}
192.168.2.223: user: warning: [2024-05-31T14:34:48.738643381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "siderolink.ManagerController", "error": "error provisioning: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [2a01:4f9:c012:559::1]:8090: connect: network is unreachable\""}
192.168.2.223: user: warning: [2024-05-31T14:35:42.387931381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "v1alpha1.EventsSinkController", "error": "error publishing event: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [fdae:41e4:649b:9303::1]:8091: i/o timeout\""}
192.168.2.223: user: warning: [2024-05-31T14:35:43.519770381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "siderolink.ManagerController", "error": "error provisioning: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [2a01:4f9:c012:559::1]:8090: connect: network is unreachable\""}
192.168.2.223: user: warning: [2024-05-31T14:36:44.718638381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "siderolink.ManagerController", "error": "error provisioning: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [2a01:4f9:c012:559::1]:8090: connect: network is unreachable\""}
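To make the ordering in a log like the one above easier to see (UBSAN report first, then `EventsSinkController` timeouts, then `siderolink.ManagerController` failures), here is a minimal Python sketch that summarizes such lines. It is not part of the original report: the function name `summarize` and both regular expressions are hypothetical helpers, and the sketch assumes every line follows the `<node>: <facility>: <level>: [<timestamp>]: <message>` layout shown here.

```python
import re
from collections import Counter

# Assumed line layout, taken from the log above:
#   <node>: <facility>: <level>: [<timestamp>]: <message>
LINE_RE = re.compile(
    r"^(?P<node>\S+): (?P<facility>\w+): (?P<level>\w+): "
    r"\[(?P<ts>[^\]]+)\]: (?P<msg>.*)$"
)
# Extracts the controller name from the JSON-like suffix of a Talos message.
CONTROLLER_RE = re.compile(r'"controller": "([^"]+)"')


def summarize(lines):
    """Return (UBSAN timestamps, first failure per controller, failure counts)."""
    ubsan = []
    first_failure = {}
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that does not match the assumed layout
        msg, ts = m["msg"], m["ts"]
        if msg.startswith("UBSAN:"):
            ubsan.append(ts)
        elif "controller failed" in msg:
            c = CONTROLLER_RE.search(msg)
            if c:
                name = c.group(1)
                counts[name] += 1
                first_failure.setdefault(name, ts)  # keep the earliest only
    return ubsan, first_failure, counts


# Three lines copied verbatim from the log above.
sample = [
    r'192.168.2.223: kern: err: [2024-05-31T14:23:53.142751381Z]: UBSAN: array-index-out-of-bounds in kernel/bpf/lpm_trie.c:194:14',
    r'192.168.2.223: user: warning: [2024-05-31T14:24:13.941928381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "v1alpha1.EventsSinkController", "error": "error publishing event: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [fdae:41e4:649b:9303::1]:8091: i/o timeout\""}',
    r'192.168.2.223: user: warning: [2024-05-31T14:26:58.718111381Z]: [talos] controller failed {"component": "controller-runtime", "controller": "siderolink.ManagerController", "error": "error provisioning: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [2a01:4f9:c012:559::1]:8090: connect: network is unreachable\""}',
]

ubsan, first_failure, counts = summarize(sample)
print("UBSAN reports at:", ubsan)
for name, ts in first_failure.items():
    print(f"{name}: first failure at {ts}, {counts[name]} failure(s)")
```

Feeding the full log to `summarize` instead of `sample` would give the per-controller failure counts for the whole capture.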
See https://github.com/siderolabs/talos/issues/8780, this is a Linux kernel/eBPF issue.
But it's quite confusing, as the same IPv6 breakage doesn't seem to happen on amd64 even though it prints similar messages. :/
Closing this as it's not relevant to Omni per se.
Is there an existing issue for this?
Current Behavior
I'm experiencing a networking problem: when I install my CNI, which uses eBPF, I get a kernel error of some sort in the logs, but it is NOT pushed to Omni because the machine no longer seems able to connect to it, and I'm left with no way to connect and gather logs.
Expected Behavior
I expected there to be a way for me to gather logs even when machines are unable to connect to Omni.
Steps To Reproduce
What browsers are you seeing the problem on?
No response
Anything else?
Related to https://github.com/cilium/cilium/issues/32812