siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

UBSAN reporting an out-of-bounds array access. #8780

Closed dedene closed 1 month ago

dedene commented 3 months ago

Bug Report

Description

After upgrading from Talos v1.6.7 to v1.7.x, a kernel error is logged after booting: UBSAN: array-index-out-of-bounds in kernel/bpf/lpm_trie.c:194:14. The node seems to function perfectly well, but this message does not appear on v1.6.7 or lower. I'm running Cilium 1.15.5 as CNI.

I'm not sure whether this is a Cilium issue, a Talos Linux issue, or even an upstream issue caused by upgrading Linux from 6.1.x to 6.6.x, but I'm hoping for some help or advice from both communities. Feel free to close this issue if it is not applicable.

Many thanks in advance!

Logs

kern:     err: [2024-05-22T12:30:28.7931256Z]: ================================================================================
kern:     err: [2024-05-22T12:30:28.7934956Z]: UBSAN: array-index-out-of-bounds in kernel/bpf/lpm_trie.c:194:14
kern:     err: [2024-05-22T12:30:28.7937366Z]: index 8 is out of range for type '__u8 [*]'
kern: warning: [2024-05-22T12:30:28.7939176Z]: CPU: 0 PID: 2591 Comm: cilium-agent Not tainted 6.6.30-talos #1
kern: warning: [2024-05-22T12:30:28.7941556Z]: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 3.20230228-4 06/06/2023
kern: warning: [2024-05-22T12:30:28.7944506Z]: Call Trace:
kern: warning: [2024-05-22T12:30:28.7945456Z]:  <TASK>
kern: warning: [2024-05-22T12:30:28.7946546Z]:  dump_stack_lvl+0x47/0x60
kern: warning: [2024-05-22T12:30:28.7948526Z]:  __ubsan_handle_out_of_bounds+0xc7/0x100
kern: warning: [2024-05-22T12:30:28.7950876Z]:  longest_prefix_match.isra.0+0x18c/0x1d0
kern: warning: [2024-05-22T12:30:28.7952736Z]:  trie_update_elem+0x152/0x300
kern: warning: [2024-05-22T12:30:28.7954186Z]:  bpf_map_update_value+0xe5/0x220
kern: warning: [2024-05-22T12:30:28.7955786Z]:  __sys_bpf+0x19e2/0x26d0
kern: warning: [2024-05-22T12:30:28.7957116Z]:  __x64_sys_bpf+0x1e/0x30
kern: warning: [2024-05-22T12:30:28.7958426Z]:  do_syscall_64+0x5a/0x80
kern: warning: [2024-05-22T12:30:28.7959776Z]:  entry_SYSCALL_64_after_hwframe+0x78/0xe2
kern: warning: [2024-05-22T12:30:28.7959806Z]: RIP: 0033:0x40720e
kern: warning: [2024-05-22T12:30:28.7959826Z]: Code: 48 83 ec 38 e8 13 00 00 00 48 83 c4 38 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48
kern: warning: [2024-05-22T12:30:28.7959846Z]: RSP: 002b:000000c000dec1b0 EFLAGS: 00000216 ORIG_RAX: 0000000000000141
kern: warning: [2024-05-22T12:30:28.7959866Z]: RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 000000000040720e
kern: warning: [2024-05-22T12:30:28.7959876Z]: RDX: 0000000000000020 RSI: 000000c000dec360 RDI: 0000000000000002
kern: warning: [2024-05-22T12:30:28.7959886Z]: RBP: 000000c000dec1f0 R08: 0000000000000000 R09: 0000000000000000
kern: warning: [2024-05-22T12:30:28.7959886Z]: R10: 0000000000000000 R11: 0000000000000216 R12: 000000c000debef0
kern: warning: [2024-05-22T12:30:28.7959896Z]: R13: 000000c000be5c00 R14: 000000c000cae680 R15: 00000000000001e9
kern: warning: [2024-05-22T12:30:28.7959926Z]:  </TASK>
kern:     err: [2024-05-22T12:30:28.7959936Z]: ================================================================================
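
To make reports like the one above easier to compare across nodes, the key fields can be pulled out of the log text with a few regular expressions. This is only a minimal sketch, assuming the log lines look exactly like the ones pasted here; the function name `parse_ubsan_report` and the dictionary keys are made up for illustration and are not part of Talos, Cilium, or the kernel.

```python
import re

# Patterns modelled on the three informative lines of the report above.
HEADER_RE = re.compile(
    r"UBSAN: (?P<kind>[\w-]+) in (?P<file>\S+):(?P<line>\d+):(?P<col>\d+)"
)
INDEX_RE = re.compile(
    r"index (?P<index>-?\d+) is out of range for type '(?P<type>[^']+)'"
)
CPU_RE = re.compile(
    r"Comm: (?P<comm>\S+) .*? (?P<kernel>\d+\.\d+\.\d+\S*) #"
)


def parse_ubsan_report(log_text):
    """Collect the UBSAN fields found in log_text into a dict (hypothetical helper)."""
    report = {}
    for log_line in log_text.splitlines():
        for pattern in (HEADER_RE, INDEX_RE, CPU_RE):
            match = pattern.search(log_line)
            if match:
                report.update(match.groupdict())
    # Convert the numeric fields from strings to integers.
    for key in ("line", "col", "index"):
        if key in report:
            report[key] = int(report[key])
    return report


# Sample input: three lines copied from the report in this issue.
sample = """\
kern:     err: [2024-05-22T12:30:28.7934956Z]: UBSAN: array-index-out-of-bounds in kernel/bpf/lpm_trie.c:194:14
kern:     err: [2024-05-22T12:30:28.7937366Z]: index 8 is out of range for type '__u8 [*]'
kern: warning: [2024-05-22T12:30:28.7939176Z]: CPU: 0 PID: 2591 Comm: cilium-agent Not tainted 6.6.30-talos #1
"""

report = parse_ubsan_report(sample)
print(report)
```

Feeding the same function the later report from this thread should yield the same source location and index with a different kernel version, which is a quick way to confirm that two reports describe the same warning.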

Environment

smira commented 3 months ago

I'd guess it's a Cilium/BPF issue (it might go deeper into the Linux kernel), but not something we can fix on the Talos side.

dedene commented 3 months ago

Thanks! I'm also opening a GitHub issue on Cilium. For now I'll stick to Talos 1.6.7. I'll update here when there is more information.

nberlee commented 3 months ago

No, it's an upstream kernel issue.

See patchwork and the mailing list.

I have applied the patch to my custom Talos kernel, and it fixes the problem. So I guess the way to fix this in 1.7 and newer versions is to disable UBSAN as a kernel parameter, or to compile your own kernel.

smira commented 3 months ago

Nice find, I hope it will get backported to 6.6.x

mrclrchtr commented 3 weeks ago

With Talos 1.7.6 and Linux 6.6.43 this should be fixed, right? That doesn't seem to be the case for me at the moment:

 kern:     err: [2024-08-12T15:18:50.641243422Z]: ================================================================================
 kern:     err: [2024-08-12T15:18:50.641247422Z]: UBSAN: array-index-out-of-bounds in kernel/bpf/lpm_trie.c:194:14
 kern:     err: [2024-08-12T15:18:50.641252422Z]: index 8 is out of range for type '__u8 [*]'
 kern: warning: [2024-08-12T15:18:50.641255422Z]: CPU: 0 PID: 5240 Comm: cilium-agent Not tainted 6.6.43-talos #1
 kern: warning: [2024-08-12T15:18:50.641259422Z]: Hardware name: Hetzner vServer/KVM Virtual Machine, BIOS 20171111 11/11/2017
 kern: warning: [2024-08-12T15:18:50.641261422Z]: Call trace:
 kern: warning: [2024-08-12T15:18:50.641262422Z]:  dump_backtrace+0x9c/0x100
 kern: warning: [2024-08-12T15:18:50.641269422Z]:  show_stack+0x34/0x50
 kern: warning: [2024-08-12T15:18:50.641272422Z]:  dump_stack_lvl+0x78/0xd0
 kern: warning: [2024-08-12T15:18:50.641278422Z]:  dump_stack+0x1c/0x30
 kern: warning: [2024-08-12T15:18:50.641281422Z]:  __ubsan_handle_out_of_bounds+0xc0/0x100
 kern: warning: [2024-08-12T15:18:50.641286422Z]:  longest_prefix_match.isra.0+0x200/0x258
 kern: warning: [2024-08-12T15:18:50.641290422Z]:  trie_update_elem+0x160/0x3a0
 kern: warning: [2024-08-12T15:18:50.641292422Z]:  bpf_map_update_value+0xcc/0x2c8
 kern: warning: [2024-08-12T15:18:50.641295422Z]:  map_update_elem+0x19c/0x328
 kern: warning: [2024-08-12T15:18:50.641299422Z]:  __sys_bpf+0x834/0x1c48
 kern: warning: [2024-08-12T15:18:50.641303422Z]:  __arm64_sys_bpf+0x34/0x58
 kern: warning: [2024-08-12T15:18:50.641307422Z]:  invoke_syscall+0x90/0x128
 kern: warning: [2024-08-12T15:18:50.641309422Z]:  el0_svc_common.constprop.0+0xec/0x118
 kern: warning: [2024-08-12T15:18:50.641311422Z]:  do_el0_svc+0x34/0x50
 kern: warning: [2024-08-12T15:18:50.641313422Z]:  el0_svc+0x4c/0x178
 kern: warning: [2024-08-12T15:18:50.641315422Z]:  el0t_64_sync_handler+0x128/0x138
 kern: warning: [2024-08-12T15:18:50.641317422Z]:  el0t_64_sync+0x1bc/0x1c0
 kern:     err: [2024-08-12T15:18:50.641320422Z]: ================================================================================

The trace does seem to be somewhat different, though.
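
One way to check how different the two traces really are is to strip the offsets from each frame and compare the function names. The following is a rough sketch, assuming frames always have the `name+0xOFFSET/0xSIZE` form seen in this thread; the helper `frames` is hypothetical, and the sample input is an excerpt of seven frames from each trace rather than the full logs.

```python
import re

# A frame looks like "trie_update_elem+0x152/0x300" at the end of a log line.
FRAME_RE = re.compile(r"\]:\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$")


def frames(log_text):
    """Return the function names of all stack frames in log_text, in order."""
    names = []
    for log_line in log_text.splitlines():
        match = FRAME_RE.search(log_line)
        if match:
            names.append(match.group(1))
    return names


# Excerpt from the first report (6.6.30-talos).
first_trace = """\
kern: warning: [2024-05-22T12:30:28.7946546Z]:  dump_stack_lvl+0x47/0x60
kern: warning: [2024-05-22T12:30:28.7948526Z]:  __ubsan_handle_out_of_bounds+0xc7/0x100
kern: warning: [2024-05-22T12:30:28.7950876Z]:  longest_prefix_match.isra.0+0x18c/0x1d0
kern: warning: [2024-05-22T12:30:28.7952736Z]:  trie_update_elem+0x152/0x300
kern: warning: [2024-05-22T12:30:28.7954186Z]:  bpf_map_update_value+0xe5/0x220
kern: warning: [2024-05-22T12:30:28.7955786Z]:  __sys_bpf+0x19e2/0x26d0
kern: warning: [2024-05-22T12:30:28.7957116Z]:  __x64_sys_bpf+0x1e/0x30
"""

# Excerpt from the second report (6.6.43-talos).
second_trace = """\
 kern: warning: [2024-08-12T15:18:50.641281422Z]:  __ubsan_handle_out_of_bounds+0xc0/0x100
 kern: warning: [2024-08-12T15:18:50.641286422Z]:  longest_prefix_match.isra.0+0x200/0x258
 kern: warning: [2024-08-12T15:18:50.641290422Z]:  trie_update_elem+0x160/0x3a0
 kern: warning: [2024-08-12T15:18:50.641292422Z]:  bpf_map_update_value+0xcc/0x2c8
 kern: warning: [2024-08-12T15:18:50.641295422Z]:  map_update_elem+0x19c/0x328
 kern: warning: [2024-08-12T15:18:50.641299422Z]:  __sys_bpf+0x834/0x1c48
 kern: warning: [2024-08-12T15:18:50.641303422Z]:  __arm64_sys_bpf+0x34/0x58
"""

first = frames(first_trace)
second = frames(second_trace)

# Frames present in both excerpts, and frames unique to each one.
shared = [name for name in first if name in second]
only_first = [name for name in first if name not in second]
only_second = [name for name in second if name not in first]

print("shared:     ", shared)
print("only first: ", only_first)
print("only second:", only_second)
```

On these excerpts the BPF path (`__sys_bpf`, `bpf_map_update_value`, `trie_update_elem`, `longest_prefix_match.isra.0`) is common to both, while the syscall entry differs (`__x64_sys_bpf` versus `__arm64_sys_bpf`). That suggests the same warning observed on two different CPU architectures rather than a different bug.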

nberlee commented 3 weeks ago

It was merged to pkg:main, but not backported (yet) to the pkg:release-1.7 branch. That means it will be in Talos 1.8.