Open pckroon opened 3 months ago
kernel:[ 300.153358] usercopy: Kernel memory exposure attempt detected from page alloc (offset 2076672, size 29224)!
~I'm starting to develop a gut feeling that it is related to the VM I'm trying to run using kubevirt.~
Second guess: it's I/O-load related. When I try to pv-migrate a 1TB volume, it somewhat reliably triggers the kernel panic on the node hosting the pv-migrate pod.
Hi @pckroon, we've never seen this; I wonder if it's related to the loop device. Would you mind using the files directly? A Mayastor pool can be created with a file directly, without having to set up the loop device.
Is there a larger trace from dmesg with more information? Also, this could simply be a kernel bug; would you be able to try a newer kernel version?
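(For reference, a file-backed pool is declared by pointing the DiskPool's `aio://` URI at a regular file instead of `/dev/loop0`. A rough sketch follows; the CRD version, file path, and names are assumptions, so check the docs for your release:)

```yaml
apiVersion: openebs.io/v1beta2   # CRD version varies between Mayastor releases
kind: DiskPool
metadata:
  name: zix-file-pool            # illustrative name
  namespace: openebs
spec:
  node: zix                      # node hosting the backing file
  disks:
    - aio:///var/local/openebs/pool.img   # plain file, no loop device needed
```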
Hello hello! Here's the dmesg traceback, fresh from this morning.
Note that this is still with the loopback device, I'll see if I can swap things around.
I updated the kernel to 6.1.0-23, which is the newest available for Debian stable.
I tried to delete the existing diskpool for one of the nodes and switch it for a file mounted directly into the io-engine pod (as a HostPath). The existing diskpool gets stuck in Terminating, though. To properly clean up after myself I emptied the backing file, and the io-engine does recognize this: it cannot import the loop-based device (insufficient space, since I removed the loop device), then fails to import the file-backed diskpool, recognizes it's empty, and reinitializes it.
I also noticed that the io-engine spews a lot of `ERROR io_engine::bdev::nvmx::handle:handle.rs:387] I/O completed with PI error` messages right before the kernel panic.
Any more help/advice would be much appreciated!
Ok, I managed to remove the old diskpool using the instructions in #1656. It seemed to be a bit more stable, but as soon as I rescaled my postgresql server to 1 replica I got another kernel panic.
Further update: I moved my postgresql data back to the jiva storageclass, and everything seems stable for now.
Further-further update: running VMs using kubevirt backed by mayastor also makes the systems unstable.
My conclusion for now is that any application that uses hugepages can/will cause a kernel panic on the node it's running on. I'll still try running VMs backed by jiva storage, but that may have to wait until after my holidays.
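(For context on how a kubevirt VM ends up consuming hugepages: the VM spec can ask for its guest RAM to be backed by huge pages. A minimal fragment of a VirtualMachineInstance spec, with illustrative sizes, looks roughly like:)

```yaml
# Fragment of a KubeVirt VirtualMachineInstance spec; guest size and page size are illustrative.
spec:
  domain:
    memory:
      guest: 4Gi
      hugepages:
        pageSize: "2Mi"   # back the guest RAM with 2MiB huge pages
```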
A similar issue has been reported with SPDK; would you be able to try kernel 6.7? Otherwise, would you be able to share steps to reproduce this so we can try it on our systems?
I've tested a pv-migrate of 400GiB volumes without issue on 6.1.87.
Thanks for investigating! 6.1.0 is the newest kernel for Debian stable, so I'm not eager to switch to a higher version. For me, a surefire way to trigger this seems to be starting the postgresql server when it's backed by a mayastor pvc. I installed it using the bitnami/postgresql helm chart:
NAME        NAMESPACE   REVISION  UPDATED                                   STATUS    CHART               APP VERSION
postgresql  postgresql  11        2024-07-12 17:11:59.889804984 +0200 CEST  deployed  postgresql-15.5.16  16.3.0
With the following values:
global:
  storageClass: mayastor-3
image:
  tag: 15-debian-12
  debug: true
tls:
  enabled: true
  autoGenerated: true
primary:
  pgHbaConfiguration: |-
    local all all trust
    host all all localhost trust
    host all all 10.0.0.0/8 md5
    hostssl all all 192.168.0.0/16 md5
  extendedConfiguration: |-
    huge_pages = off
  resourcesPreset: "medium"
  networkPolicy:
    enabled: false
  service:
    type: LoadBalancer
    externalTrafficPolicy: Local
  persistence:
    size: 200Gi
volumePermissions:
  enabled: true
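(The `mayastor-3` storage class referenced above is presumably a Mayastor class with three replicas. A typical definition, sketched loosely after the Mayastor docs, looks roughly like the following; the parameter values are assumptions, not taken from this thread:)

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mayastor-3
provisioner: io.openebs.csi-mayastor
parameters:
  repl: "3"        # three synchronous data replicas
  protocol: nvmf   # volumes exported over NVMe-oF (TCP)
```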
My kubevirt VMs seem to run stably when backed by a jiva pvc; so my final diagnosis is that a pod which uses hugepages and a mayastor pvc will cause a kernel panic on its k8s node. Whether this is a mayastor bug, a kubernetes bug, a cgroups issue, or a kernel bug, I have no clue...
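(To make that diagnosis concrete: the suspected trigger is any pod combining a hugepages resource request with a Mayastor-backed PVC. A hypothetical minimal repro, with all names and sizes invented for illustration, could look like:)

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hugepages-plus-mayastor     # hypothetical repro pod
spec:
  containers:
    - name: app
      image: postgres:16            # any hugepages-capable application
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
        - name: hugepage
          mountPath: /dev/hugepages
      resources:
        requests:
          memory: 1Gi
          hugepages-2Mi: 512Mi      # hugepages requests must equal limits
        limits:
          memory: 1Gi
          hugepages-2Mi: 512Mi
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-pvc         # assumed to be provisioned by a Mayastor StorageClass
    - name: hugepage
      emptyDir:
        medium: HugePages           # hugetlbfs-backed volume for the app
```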
@pckroon so you're saying we just need to install postgresql? We don't even need to run any application against it?
I'm not 100% sure, of course, but for me it crashes as soon as the psql database starts. If that doesn't do it, maybe create a small database with some funny data and open a connection?
Just tried to set this up, but it's failing with:
`Bus error (core dumped)`
Probably related to hugepages.
Ok got it running:
global:
  storageClass: mayastor-nvmf-3
image:
  tag: 15-debian-12
  debug: true
tls:
  enabled: true
  autoGenerated: true
primary:
  extendedConfiguration: |-
    huge_pages = off
  extraVolumeMounts:
    - name: pg-sample-config
      mountPath: /opt/bitnami/postgresql/share/postgresql.conf.sample
      subPath: postgresql.conf.sample
  extraVolumes:
    - configMap:
        name: pg-sample-config
      name: pg-sample-config
  resourcesPreset: "medium"
  networkPolicy:
    enabled: false
  service:
    type: LoadBalancer
    externalTrafficPolicy: Local
  persistence:
    size: 9Gi
extraDeploy:
  - apiVersion: v1
    kind: ConfigMap
    metadata:
      name: pg-sample-config
    data:
      postgresql.conf.sample: |-
        huge_pages = off
volumePermissions:
  enabled: true
No crash seen, though this is a smaller volume... This was on Ubuntu 22.04 with kernel 6.2.0.
Btw similar issue reported in SPDK: https://github.com/spdk/spdk/issues/2993#issuecomment-1619829992
Thanks again for digging into this. The linked SPDK issue seems relevant, but I'm not sure what to do with the information there. Maybe it's just an unlucky combination of Debian (with hardened usercopy enabled) and the kernel version. Either way, it seems... undesirable that applications that require/want huge pages can't run on mayastor storage.
I'll give postgres a go with the huge_pages configmap, but that'll have to wait until after my holidays I'm afraid. I'll get back to you at the end of August.
Yes, at the moment it seems like a kernel bug that we may not be able to fix from our side (other than perhaps trying that configmap). A future version may allow running mayastor without hugepages, which would be another kind of solution to this.
That's great, thanks @pckroon, enjoy your holidays!
I hope you had an excellent summer. I can confirm that I can run my psql database backed by mayastor with the configmap you suggested. That said, I'm a little sad that mayastor is not completely application agnostic. Not sure how to proceed from here, though.
Hello hello! This allows me to run my postgresql server at least, but it seems I have more applications that use hugepages. This issue makes it really hard for me to use mayastor :(
Investigation scoped for v4.3. This needs to be tested on the specified Debian version with hardened_usercopy enabled in the kernel.
Hello world. I'm encountering the following kernel panics after switching from jiva to mayastor, which cause all sorts of chaos on my k8s cluster:
I'm not able to reliably reproduce the issue, which makes debugging harder.
OS info (please complete the following information):
$ helm list -n openebs
NAME     NAMESPACE  REVISION  UPDATED                                   STATUS    CHART          APP VERSION
openebs  openebs    2         2024-07-12 14:03:17.931253448 +0200 CEST  deployed  openebs-4.1.0  4.1.0
ID                        DISKS                                                        MANAGED  NODE        STATUS  CAPACITY  ALLOCATED  AVAILABLE  COMMITTED
rohan2013-hostpath-pool   aio:///dev/loop0?uuid=0cb2997e-39b5-4bbb-a831-5ee245d75e5c   true     rohan2013   Online  2.4TiB    228.7GiB   2.2TiB     1.5TiB
zix-hostpath-pool         aio:///dev/loop0?uuid=36610c71-f54d-4835-a10d-c5b912cf05e2   true     zix         Online  2.4TiB    12.9GiB    2.4TiB     1.9TiB
gondor2013-hostpath-pool  aio:///dev/loop0?uuid=4d1474bf-5a6e-479e-853e-0996dc9c9c53   true     gondor2013  Online  2.4TiB    44.1GiB    2.4TiB     1.7TiB
jul 16 11:31:53 zix kernel: usercopy: Kernel memory exposure attempt detected from page alloc (offset 0, size 16936)!
jul 16 11:31:53 zix kernel: ------------[ cut here ]------------
jul 16 11:31:53 zix kernel: kernel BUG at mm/usercopy.c:101!
jul 16 11:31:53 zix kernel: invalid opcode: 0000 [#1] PREEMPT SMP PTI
jul 16 11:31:53 zix kernel: CPU: 1 PID: 23165 Comm: io-engine Not tainted 6.1.0-22-amd64 #1 Debian 6.1.94-1
jul 16 11:31:53 zix kernel: Hardware name: Dell Inc. PowerEdge R720xd/0W7JN5, BIOS 2.2.2 01/16/2014
jul 16 11:31:53 zix kernel: RIP: 0010:usercopy_abort+0x75/0x77
jul 16 11:31:53 zix kernel: Code: d5 90 51 48 0f 45 d6 48 89 c1 49 c7 c3 28 41 d7 90 41 52 48 c7 c6 e7 94 d5 90 48 c7 c7 c8 40 d7 90 49 0f 45 f3 e8 da 54 ff ff <0f> 0b 48 89 f1 49 89 e8 44 89 e2 31 f6 48 c7 c7 72 41 d7 90 e8 72
jul 16 11:31:53 zix kernel: RSP: 0018:ffffb72eae827770 EFLAGS: 00010246
jul 16 11:31:53 zix kernel: RAX: 0000000000000059 RBX: ffff9ce746ee8000 RCX: 0000000000000000
jul 16 11:31:53 zix kernel: RDX: 0000000000000000 RSI: ffff9d053f8203a0 RDI: ffff9d053f8203a0
jul 16 11:31:53 zix kernel: RBP: 0000000000004228 R08: 0000000000000000 R09: ffffb72eae827608
jul 16 11:31:53 zix kernel: R10: 0000000000000003 R11: ffff9d057ff0f260 R12: 0000000000000001
jul 16 11:31:53 zix kernel: R13: ffff9ce746eec228 R14: 0000000000008117 R15: ffff9ce9261b1700
jul 16 11:31:53 zix kernel: FS: 00007fc0fdce7dc0(0000) GS:ffff9d053f800000(0000) knlGS:0000000000000000
jul 16 11:31:53 zix kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jul 16 11:31:53 zix kernel: CR2: 00007ffd893e0e80 CR3: 0000001284c72003 CR4: 00000000001706e0
jul 16 11:31:53 zix kernel: Call Trace:
jul 16 11:31:53 zix kernel: <TASK>
jul 16 11:31:53 zix kernel: ? die_body.cold+0x1a/0x1f
jul 16 11:31:53 zix kernel: ? die+0x2a/0x50
jul 16 11:31:53 zix kernel: ? do_trap+0xc5/0x110
jul 16 11:31:53 zix kernel: ? usercopy_abort+0x75/0x77
jul 16 11:31:53 zix kernel: ? do_error_trap+0x6a/0x90
jul 16 11:31:53 zix kernel: ? usercopy_abort+0x75/0x77
jul 16 11:31:53 zix kernel: ? exc_invalid_op+0x4c/0x60
jul 16 11:31:53 zix kernel: ? usercopy_abort+0x75/0x77
jul 16 11:31:53 zix kernel: ? asm_exc_invalid_op+0x16/0x20
jul 16 11:31:53 zix kernel: ? usercopy_abort+0x75/0x77
jul 16 11:31:53 zix kernel: __check_object_size.cold+0x17/0xcb
jul 16 11:31:53 zix kernel: simple_copy_to_iter+0x25/0x40
jul 16 11:31:53 zix kernel: __skb_datagram_iter+0x19e/0x2f0
jul 16 11:31:53 zix kernel: ? skb_free_datagram+0x10/0x10
jul 16 11:31:53 zix kernel: skb_copy_datagram_iter+0x30/0x90
jul 16 11:31:53 zix kernel: tcp_recvmsg_locked+0x5ce/0x940
jul 16 11:31:53 zix kernel: tcp_recvmsg+0x83/0x1f0
jul 16 11:31:53 zix kernel: inet_recvmsg+0x52/0x130
jul 16 11:31:53 zix kernel: sock_read_iter+0x92/0x100
jul 16 11:31:53 zix kernel: do_iter_readv_writev+0x13c/0x150
jul 16 11:31:53 zix kernel: do_iter_read+0xe8/0x1e0
jul 16 11:31:53 zix kernel: vfs_readv+0xa7/0xe0
jul 16 11:31:53 zix kernel: do_readv+0xfa/0x160
jul 16 11:31:53 zix kernel: do_syscall_64+0x55/0xb0
jul 16 11:31:53 zix kernel: ? do_readv+0x117/0x160
jul 16 11:31:53 zix kernel: ? exit_to_user_mode_prepare+0x44/0x1f0
jul 16 11:31:53 zix kernel: ? syscall_exit_to_user_mode+0x1e/0x40
jul 16 11:31:53 zix kernel: ? do_syscall_64+0x61/0xb0
jul 16 11:31:53 zix kernel: ? __x64_sys_epoll_wait+0x6f/0x110
jul 16 11:31:53 zix kernel: ? exit_to_user_mode_prepare+0x44/0x1f0
jul 16 11:31:53 zix kernel: ? syscall_exit_to_user_mode+0x1e/0x40
jul 16 11:31:53 zix kernel: ? do_syscall_64+0x61/0xb0
jul 16 11:31:53 zix kernel: ? __fget_light+0x9d/0x100
jul 16 11:31:53 zix kernel: ? __fget_light+0x9d/0x100
jul 16 11:31:53 zix kernel: ? do_epoll_wait+0xb2/0x7d0
jul 16 11:31:53 zix kernel: ? __x64_sys_epoll_wait+0x6f/0x110
jul 16 11:31:53 zix kernel: ? exit_to_user_mode_prepare+0x44/0x1f0
jul 16 11:31:53 zix kernel: ? syscall_exit_to_user_mode+0x1e/0x40
jul 16 11:31:53 zix kernel: ? do_syscall_64+0x61/0xb0
jul 16 11:31:53 zix kernel: ? exit_to_user_mode_prepare+0x44/0x1f0
jul 16 11:31:53 zix kernel: ? syscall_exit_to_user_mode+0x1e/0x40
jul 16 11:31:53 zix kernel: ? do_syscall_64+0x61/0xb0
jul 16 11:31:53 zix kernel: ? do_syscall_64+0x61/0xb0
jul 16 11:31:53 zix kernel: ? do_syscall_64+0x61/0xb0
jul 16 11:31:53 zix kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8
jul 16 11:31:53 zix kernel: RIP: 0033:0x7fc0fde32367
jul 16 11:31:53 zix kernel: Code: 77 51 c3 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 1b 0d f8 ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 13 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 39 44 89 c7 48 89 44 24 08 e8 74 0d f8 ff 48
jul 16 11:31:53 zix kernel: RSP: 002b:00007ffff3d5e970 EFLAGS: 00000293 ORIG_RAX: 0000000000000013
jul 16 11:31:53 zix kernel: RAX: ffffffffffffffda RBX: 000000000000024b RCX: 00007fc0fde32367
jul 16 11:31:53 zix kernel: RDX: 0000000000000002 RSI: 00007ffff3d5e9a0 RDI: 000000000000024b
jul 16 11:31:53 zix kernel: RBP: 00007ffff3d5e9a0 R08: 0000000000000000 R09: 0000000000000000
jul 16 11:31:53 zix kernel: R10: 0000000000000080 R11: 0000000000000293 R12: 0000000000000002
jul 16 11:31:53 zix kernel: R13: 00007ffff3d5ea00 R14: 00007ffff3d5e9a0 R15: 0000000000008240
jul 16 11:31:53 zix kernel: </TASK>
jul 16 11:31:53 zix kernel: Modules linked in: tcp_diag udp_diag inet_diag vhost_net vhost vhost_iotlb tap tun blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs iscsi_tcp libiscsi_tcp libis>
jul 16 11:31:53 zix kernel: ghash_clmulni_intel sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd rapl intel_cstate mgag200 intel_uncore iTCO_wdt intel_pmc_bxt iTCO_vendor_support watchdog meime drm>
jul 16 11:31:53 zix kernel: ---[ end trace 0000000000000000 ]---
jul 16 11:31:53 zix kernel: RIP: 0010:usercopy_abort+0x75/0x77
jul 16 11:31:53 zix kernel: Code: d5 90 51 48 0f 45 d6 48 89 c1 49 c7 c3 28 41 d7 90 41 52 48 c7 c6 e7 94 d5 90 48 c7 c7 c8 40 d7 90 49 0f 45 f3 e8 da 54 ff ff <0f> 0b 48 89 f1 49 89 e8 44 89 e2 31 f6 48 c7 c7 72 41 d7 90 e8 72
jul 16 11:31:53 zix kernel: RSP: 0018:ffffb72eae827770 EFLAGS: 00010246
jul 16 11:31:53 zix kernel: RAX: 0000000000000059 RBX: ffff9ce746ee8000 RCX: 0000000000000000
jul 16 11:31:53 zix kernel: RDX: 0000000000000000 RSI: ffff9d053f8203a0 RDI: ffff9d053f8203a0
jul 16 11:31:53 zix kernel: RBP: 0000000000004228 R08: 0000000000000000 R09: ffffb72eae827608
jul 16 11:31:53 zix kernel: R10: 0000000000000003 R11: ffff9d057ff0f260 R12: 0000000000000001
jul 16 11:31:53 zix kernel: R13: ffff9ce746eec228 R14: 0000000000008117 R15: ffff9ce9261b1700
jul 16 11:31:53 zix kernel: FS: 00007fc0fdce7dc0(0000) GS:ffff9d053f800000(0000) knlGS:0000000000000000
jul 16 11:31:53 zix kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jul 16 11:31:53 zix kernel: CR2: 00007ffd893e0e80 CR3: 0000001284c72003 CR4: 00000000001706e0
jul 16 11:31:53 zix kernel: Kernel panic - not syncing: Fatal exception
Message from syslogd@zix at Jul 16 11:44:25 ... kernel:[ 480.982446] usercopy: Kernel memory exposure attempt detected from page alloc (offset 20480, size 12816)!