Szpadel opened this issue 1 year ago
This time I caught another crash message on a node running io-engine (this might not be io-engine itself crashing, but the whole node going down and taking all the others with it, due to a bug in nvme_tcp):
[ 188.144797] usercopy: Kernel memory exposure attempt detected from page alloc (offset 0, size 24576)!
[ 188.145098] ------------[ cut here ]------------
[ 188.145102] kernel BUG at mm/usercopy.c:101!
[ 188.145205] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 188.145271] CPU: 2 PID: 10570 Comm: lcore-worker-2 Not tainted 6.1.14-200.fc37.x86_64 #1
[ 188.145346] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[ 188.145424] RIP: 0010:usercopy_abort+0x75/0x77
[ 188.145511] Code: 6f a8 51 48 0f 45 d6 48 89 c1 49 c7 c3 b8 80 71 a8 41 52 48 c7 c6 84 87 6f a8 48 c7 c7 58 80 71 a8 49 0f 45 f3 e8 2f 51 ff ff <0f> 0b 48 89 f1 49 89 e8 44 89 e2 31 f6 48 c7 c7 02 81 71 a8 e8 72
[ 188.145662] RSP: 0018:ffffb23d4ba8fa18 EFLAGS: 00010286
[ 188.145718] RAX: 0000000000000059 RBX: ffff8e67058b8000 RCX: 0000000000000000
[ 188.145778] RDX: 0000000000000001 RSI: ffffffffa8749b33 RDI: 00000000ffffffff
[ 188.145844] RBP: 0000000000006000 R08: 0000000000000000 R09: ffffb23d4ba8f8c8
[ 188.145902] R10: 0000000000000003 R11: ffffffffa9147448 R12: 0000000000000001
[ 188.145946] R13: ffff8e67058be000 R14: 0000000000000018 R15: ffff8e66448c4ed0
[ 188.145991] FS: 00007f9ab0f30640(0000) GS:ffff8e6e8f880000(0000) knlGS:0000000000000000
[ 188.146062] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 188.146122] CR2: 00005635efc7a008 CR3: 00000003647f0004 CR4: 0000000000170ee0
[ 188.146196] Call Trace:
[ 188.146247] <TASK>
[ 188.146277] __check_object_size.cold+0x17/0xcb
[ 188.146333] simple_copy_to_iter+0x25/0x40
[ 188.146373] __skb_datagram_iter+0x19e/0x2f0
[ 188.146402] ? skb_free_datagram+0x10/0x10
[ 188.146449] skb_copy_datagram_iter+0x30/0x90
[ 188.146506] ? avc_has_perm+0xa7/0x190
[ 188.146563] tcp_recvmsg_locked+0x254/0x8f0
[ 188.146610] tcp_recvmsg+0x75/0x1d0
[ 188.146633] inet_recvmsg+0x42/0x100
[ 188.146660] ? sock_recvmsg+0x1c/0x70
[ 188.146682] sock_read_iter+0x84/0xd0
[ 188.146699] do_iter_readv_writev+0x112/0x130
[ 188.146719] do_iter_read+0xe8/0x1e0
[ 188.146735] vfs_readv+0x95/0xc0
[ 188.146750] do_readv+0xd2/0x130
[ 188.146765] do_syscall_64+0x58/0x80
[ 188.146784] ? do_readv+0xef/0x130
[ 188.146797] ? syscall_exit_to_user_mode+0x17/0x40
[ 188.146814] ? do_syscall_64+0x67/0x80
[ 188.146831] ? do_syscall_64+0x67/0x80
[ 188.146844] ? do_syscall_64+0x67/0x80
[ 188.146861] ? do_syscall_64+0x67/0x80
[ 188.146877] ? do_syscall_64+0x67/0x80
[ 188.146891] ? do_syscall_64+0x67/0x80
[ 188.146906] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 188.146926] RIP: 0033:0x7f9ab1d38b77
[ 188.146964] Code: 1f 40 00 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 bb fc f8 ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 13 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 14 fd f8 ff 48
[ 188.148093] RSP: 002b:00007f9ab0f2dd50 EFLAGS: 00000293 ORIG_RAX: 0000000000000013
[ 188.148638] RAX: ffffffffffffffda RBX: 0000000000000224 RCX: 00007f9ab1d38b77
[ 188.149151] RDX: 0000000000000002 RSI: 00007f9ab0f2dd80 RDI: 0000000000000224
[ 188.149685] RBP: 00007f9ab0f2dd80 R08: 0000000000000000 R09: 000020003d4183c0
[ 188.150120] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000002
[ 188.150418] R13: 00007f9ab0f2dde0 R14: 00007f9ab0f2dd80 R15: 0000000000008240
[ 188.150943] </TASK>
[ 188.151409] Modules linked in: xt_multiport xt_set ipt_rpfilter ip_set_hash_ip ip_set_hash_net ipip tunnel4 ip_tunnel bpf_preload rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel veth nf_conntrack_netlink xt_addrtype xt_statistic xt_nat ipt_REJECT ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_MASQUERADE xt_mark xt_conntrack xt_comment nft_compat overlay nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink sunrpc vfat fat intel_rapl_msr intel_rapl_common kvm_intel xfs kvm irqbypass rapl i2c_piix4 joydev tcp_bbr sch_fq nvme_tcp nvme_fabrics nvme_core nvme_common loop zram crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic bochs virtio_net drm_vram_helper drm_ttm_helper ghash_clmulni_intel net_failover ttm sha512_ssse3
[ 188.151590] virtio_console failover virtio_scsi serio_raw ata_generic pata_acpi fuse qemu_fw_cfg
[ 188.154797] ---[ end trace 0000000000000000 ]---
[ 188.155218] RIP: 0010:usercopy_abort+0x75/0x77
[ 188.155566] Code: 6f a8 51 48 0f 45 d6 48 89 c1 49 c7 c3 b8 80 71 a8 41 52 48 c7 c6 84 87 6f a8 48 c7 c7 58 80 71 a8 49 0f 45 f3 e8 2f 51 ff ff <0f> 0b 48 89 f1 49 89 e8 44 89 e2 31 f6 48 c7 c7 02 81 71 a8 e8 72
[ 188.156309] RSP: 0018:ffffb23d4ba8fa18 EFLAGS: 00010286
[ 188.156693] RAX: 0000000000000059 RBX: ffff8e67058b8000 RCX: 0000000000000000
[ 188.157058] RDX: 0000000000000001 RSI: ffffffffa8749b33 RDI: 00000000ffffffff
[ 188.157508] RBP: 0000000000006000 R08: 0000000000000000 R09: ffffb23d4ba8f8c8
[ 188.158090] R10: 0000000000000003 R11: ffffffffa9147448 R12: 0000000000000001
[ 188.158881] R13: ffff8e67058be000 R14: 0000000000000018 R15: ffff8e66448c4ed0
[ 188.159290] FS: 00007f9ab0f30640(0000) GS:ffff8e6e8f880000(0000) knlGS:0000000000000000
[ 188.159648] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 188.160020] CR2: 00005635efc7a008 CR3: 00000003647f0004 CR4: 0000000000170ee0
This crash is in lcore-worker-2,
and if I'm understanding correctly, that is part of io-engine.
Same happens with kernel 6.1.15.
~The usercopy issue does not seem to be triggered on 6.0.7.~ It just triggered.
Hi @Szpadel, I don't think we've ever seen this, but I also don't think we've tested with kernel v6; maybe @blaisedias can confirm. @Szpadel, would you be able to test with a v5 kernel?
@tiagolobocastro yes, we haven't tested with v6 kernels.
@tiagolobocastro it's now running on 5.15.97. Since I have no idea how to trigger this usercopy bug, I'll report back if it occurs.
Looks like the usercopy bug does not happen on 5.15.97.
Thanks @Szpadel for reporting this. I've just observed the same kernel panic on Talos 1.3.5 with kernel 5.15.94.
@laibe could you please share how you were able to hit this on v5? @blaisedias what specific kernel versions do we test with?
Unfortunately I did not have kernel log delivery set up, so I only saw the last couple of lines via IPMI (see below), since Talos AFAIK does not keep dmesg logs from previous boots. Essentially the whole node crashed and I had to do a hard reset.
@laibe in my case the logs were always cut off before the panic happened, so I caught them by keeping an SSH session open with `dmesg -w`.
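For anyone else trying to catch the tail of the kernel log before a panic, a minimal sketch of that approach (the node name and SSH user are placeholders, adjust to your environment):

```shell
#!/usr/bin/env bash
# Hypothetical node name; adjust to your environment.
NODE=node1

# Follow the kernel ring buffer over SSH and mirror it to a local file,
# so the last lines survive even if the node panics and hard-resets.
capture_kernel_log() {
  ssh "root@$NODE" 'dmesg --follow' | tee "dmesg-$NODE.log"
}

# Run it inside tmux or screen so the capture outlives your terminal,
# e.g.: tmux new-session -d -s dmesg-capture 'ssh root@node1 dmesg --follow | tee dmesg-node1.log'
```

`tee` writes each line to disk as it arrives, so the local log keeps everything received up to the moment the SSH connection dies with the node.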
In my case it happened during the night and I was not expecting it; otherwise I would have logged dmesg. I'd like to do some further testing with the latest Talos release (kernel 5.15.102).
Given that you've now also reproduced this on v5: I'll note that we don't really test on Talos, mostly just Ubuntu AFAIK. Maybe the bug is simply more easily reproducible on Talos? Anyway, someone from our team is getting up to speed with Talos, so hopefully we'll be able to do some testing of our own and try to reproduce these :crossed_fingers:
Bare-metal clusters with Talos are relatively easy to set up, so it might even turn out to be a convenient option for your test setups :slightly_smiling_face:. I don't have a bare-metal cluster to play around with at the moment, otherwise I could assist with further debugging.
@datacore-tilangovan maybe @laibe could help with any issues you may find, if he's willing and able of course, no pressure :)
I've pinged @datacore-bolt-ci in the Mayastor Discord. As mentioned, unfortunately I don't have a bare-metal cluster available for debugging at the moment, but I'm happy to help with setting up Talos, and Mayastor on Talos.
@blaisedias can we run tests on kernel v6?
On Talos 1.6.7 - Kernel 6.1.82 - https://github.com/siderolabs/talos/releases/tag/v1.6.7
Every time I declare a DiskPool, I get a kernel panic:
10.0.0.125: kern: err: [2024-03-24T16:39:46.213509249Z]: rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
10.0.0.125: kern: err: [2024-03-24T16:39:46.214733249Z]: rcu: 1-...0: (1 GPs behind) idle=bb1c/1/0x4000000000000000 softirq=11407/11411 fqs=10527
10.0.0.125: kern: warning: [2024-03-24T16:39:46.216561249Z]: (detected by 2, t=52518 jiffies, g=16029, q=263999 ncpus=16)
10.0.0.125: kern: info: [2024-03-24T16:39:46.217919249Z]: Sending NMI from CPU 2 to CPUs 1:
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218811249Z]: NMI backtrace for cpu 1
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218813249Z]: CPU: 1 PID: 5624 Comm: io-engine Not tainted 6.1.82-talos #1
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218819249Z]: Hardware name: Intel(R) Client Systems NUC12WSHi5/NUC12WSBi5, BIOS WSADL357.0085.2022.0718.1739 07/18/2022
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218820249Z]: RIP: 0010:swiotlb_tbl_map_single+0x25c/0x660
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218828249Z]: Code: ed 75 75 4d 85 f6 75 02 0f 0b 4d 8d 56 ff 4c 89 f1 4c 89 64 24 20 4c 21 d1 48 89 4c 24 30 4c 8b 5c 24 30 4d 85 db 75 e0 89 e9 <4c> 8b 64 24 08 49 8d 14 09 4c 21 d2 4c 01 e2 49 39 d6 72 1e 48 8d
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218829249Z]: RSP: 0018:ffff9d95c0a03878 EFLAGS: 00000046
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218832249Z]: RAX: 00000000000006e1 RBX: 0000000000000004 RCX: 0000000000000ee1
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218833249Z]: RDX: 000000022a51d600 RSI: 000000000000004c RDI: 0000000000000800
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218833249Z]: RBP: 0000000000000ee1 R08: 0000000000000800 R09: 000000000007ed3a
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218834249Z]: R10: 00000000001fffff R11: 0000000000000000 R12: 000000003f69d000
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218834249Z]: R13: 0000000215b10e00 R14: 0000000000200000 R15: ffffffffb56a1f00
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218835249Z]: FS: 00007fd88ec13dc0(0000) GS:ffff9ac9d7640000(0000) knlGS:0000000000000000
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218836249Z]: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218837249Z]: CR2: 000055a195861070 CR3: 00000002588fe000 CR4: 0000000000750ee0
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218837249Z]: PKRU: 55555554
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218838249Z]: Call Trace:
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218840249Z]: <NMI>
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218843249Z]: ? nmi_cpu_backtrace.cold+0x32/0x68
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218847249Z]: ? nmi_cpu_backtrace_handler+0xd/0x20
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218850249Z]: ? nmi_handle+0x53/0xf0
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218853249Z]: ? default_do_nmi+0x40/0x120
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218854249Z]: ? exc_nmi+0xfe/0x130
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218855249Z]: ? end_repeat_nmi+0x16/0x67
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218860249Z]: ? swiotlb_tbl_map_single+0x25c/0x660
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218862249Z]: ? swiotlb_tbl_map_single+0x25c/0x660
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218863249Z]: ? swiotlb_tbl_map_single+0x25c/0x660
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218865249Z]: </NMI>
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218865249Z]: <TASK>
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218866249Z]: iommu_dma_map_page+0x18c/0x230
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218871249Z]: iommu_dma_map_sg+0x219/0x410
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218873249Z]: __dma_map_sg_attrs+0x26/0x90
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218875249Z]: dma_map_sgtable+0x19/0x30
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218876249Z]: nvme_map_data+0xd7/0x860
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218881249Z]: nvme_queue_rqs+0xce/0x280
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218884249Z]: blk_mq_flush_plug_list.part.0+0x1f8/0x2a0
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218887249Z]: __blk_flush_plug+0xf1/0x150
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218889249Z]: blk_finish_plug+0x25/0x40
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218890249Z]: blkdev_write_iter+0x12a/0x1a0
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218893249Z]: aio_write+0x15b/0x270
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218897249Z]: ? io_submit_one+0x467/0x7b0
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218899249Z]: io_submit_one+0x467/0x7b0
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218901249Z]: __x64_sys_io_submit+0xa9/0x170
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218903249Z]: do_syscall_64+0x59/0x90
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218905249Z]: entry_SYSCALL_64_after_hwframe+0x64/0xce
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218908249Z]: RIP: 0033:0x7fd88ed61d3d
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218910249Z]: Code: 0c eb d3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 20 0f 00 f7 d8 64 89 01 48
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218911249Z]: RSP: 002b:00007ffe548f77d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218912249Z]: RAX: ffffffffffffffda RBX: 00007fd88ec126b8 RCX: 00007fd88ed61d3d
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218913249Z]: RDX: 00007ffe548f7828 RSI: 0000000000000001 RDI: 00007fd88c16c000
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218913249Z]: RBP: 00007fd88c16c000 R08: 0000000000000008 R09: 000055a195fdff50
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218914249Z]: R10: 000020001a51e7c0 R11: 0000000000000246 R12: 0000000000000001
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218914249Z]: R13: 0000000000000000 R14: 00007ffe548f7828 R15: 0000000000000000
10.0.0.125: kern: warning: [2024-03-24T16:39:46.218915249Z]: </TASK>
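For context, the DiskPool declaration that triggers this is nothing unusual. A representative manifest looks roughly like the following; the pool name, node name, disk path, and `apiVersion` are placeholders and may differ between Mayastor releases:

```yaml
apiVersion: "openebs.io/v1beta2"   # may differ depending on your Mayastor version
kind: DiskPool
metadata:
  name: pool-on-node-1             # placeholder pool name
  namespace: mayastor              # assumption: default Mayastor namespace
spec:
  node: node-1                     # Kubernetes node that owns the disk
  disks: ["/dev/sdb"]              # placeholder device path
```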
This seems different; it seems to have happened whilst creating the disk pool!? I'm on 6.1.81 and I don't have this issue. I'll check if I can upgrade to 6.1.82 and see what happens... although I'm not on Talos..
**Describe the bug** When io-engine dies for whatever reason, nothing is logged before the crash. All app nodes using the mounted volume crash (not only the one where io-engine was running).
**To Reproduce** I do not know the exact trigger, but I was able to reproduce it by deleting the pod with io-engine.
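The reproduction step above can be sketched as follows. The namespace and pod label are assumptions based on a typical Mayastor install; adjust them to your deployment:

```shell
#!/usr/bin/env bash
# Assumed namespace for a typical Mayastor install.
NAMESPACE=mayastor

# Delete the io-engine pod running on the given node; the DaemonSet
# recreates it, which was enough to trigger the crash described above.
kill_io_engine() {
  kubectl -n "$NAMESPACE" delete pod \
    -l app=io-engine \
    --field-selector "spec.nodeName=$1"
}

# Usage: kill_io_engine <storage-node-name>
```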
**Expected behavior** This could be a kernel bug, but I don't know the implementation details of Mayastor well enough to report it there.
**Screenshots** Reproduced by killing the pod:
Caught happening organically:
**OS info (please complete the following information):**
**Additional context** Not sure if this is relevant: there are 3 nodes running on Proxmox, 1 master and 2 worker nodes; each worker node is pinned to one physical CPU; only one node is handling storage.
Let me know if you need any other info or if you have any ideas on how to debug this further. Each worker node is bount