Open aaronjwood opened 4 years ago
FWIW when this happens it is very hard to shutdown/reboot the system cleanly. It takes an extremely long time and hangs at various spots here:
EDIT: full log trace of when reboot was triggered:
Jul 29 16:13:18 vault kernel: sr 0:0:0:1: [sr0] tag#746 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 29 16:13:18 vault kernel: sr 0:0:0:1: [sr0] tag#746 Sense Key : Not Ready [current]
Jul 29 16:13:18 vault kernel: sr 0:0:0:1: [sr0] tag#746 Add. Sense: Medium not present - tray closed
Jul 29 16:13:18 vault kernel: sr 0:0:0:1: [sr0] tag#746 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
Jul 29 16:13:18 vault kernel: blk_update_request: I/O error, dev sr0, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0
Jul 29 16:13:18 vault kernel: sr 0:0:0:1: [sr0] tag#747 unaligned transfer
Jul 29 16:13:18 vault kernel: blk_update_request: I/O error, dev sr0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jul 29 16:13:18 vault kernel: Buffer I/O error on dev sr0, logical block 0, async page read
Jul 29 16:13:19 vault kernel: sr 0:0:0:1: [sr0] tag#744 unaligned transfer
Jul 29 16:13:19 vault kernel: blk_update_request: I/O error, dev sr0, sector 1 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jul 29 16:13:19 vault kernel: Buffer I/O error on dev sr0, logical block 1, async page read
Jul 29 16:13:19 vault kernel: sr 0:0:0:1: [sr0] tag#745 unaligned transfer
Jul 29 16:13:19 vault kernel: blk_update_request: I/O error, dev sr0, sector 2 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jul 29 16:13:19 vault kernel: Buffer I/O error on dev sr0, logical block 2, async page read
Jul 29 16:13:19 vault kernel: sr 0:0:0:1: [sr0] tag#385 unaligned transfer
Jul 29 16:13:19 vault kernel: blk_update_request: I/O error, dev sr0, sector 3 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jul 29 16:13:19 vault kernel: Buffer I/O error on dev sr0, logical block 3, async page read
Jul 29 16:13:19 vault kernel: sr 0:0:0:1: [sr0] tag#279 unaligned transfer
Jul 29 16:13:19 vault kernel: blk_update_request: I/O error, dev sr0, sector 4 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jul 29 16:13:19 vault kernel: Buffer I/O error on dev sr0, logical block 4, async page read
Jul 29 16:13:19 vault kernel: sr 0:0:0:1: [sr0] tag#266 unaligned transfer
Jul 29 16:13:19 vault kernel: blk_update_request: I/O error, dev sr0, sector 5 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jul 29 16:13:19 vault kernel: Buffer I/O error on dev sr0, logical block 5, async page read
Jul 29 16:13:19 vault kernel: sr 0:0:0:1: [sr0] tag#267 unaligned transfer
Jul 29 16:13:19 vault kernel: blk_update_request: I/O error, dev sr0, sector 6 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jul 29 16:13:19 vault kernel: Buffer I/O error on dev sr0, logical block 6, async page read
Jul 29 16:13:19 vault kernel: sr 0:0:0:1: [sr0] tag#268 unaligned transfer
Jul 29 16:13:20 vault kernel: blk_update_request: I/O error, dev sr0, sector 7 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jul 29 16:13:20 vault kernel: Buffer I/O error on dev sr0, logical block 7, async page read
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#866 unaligned transfer
Jul 29 16:13:20 vault kernel: blk_update_request: I/O error, dev sr0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jul 29 16:13:20 vault kernel: Buffer I/O error on dev sr0, logical block 0, async page read
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#10 unaligned transfer
Jul 29 16:13:20 vault kernel: Buffer I/O error on dev sr0, logical block 1, async page read
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#11 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#12 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#13 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#14 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#15 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#16 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#17 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#18 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#19 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#20 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#21 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#22 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#23 unaligned transfer
Jul 29 16:13:20 vault kernel: sr 0:0:0:1: [sr0] tag#24 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#25 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#26 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#27 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#28 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#29 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#30 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#31 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#32 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#33 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#34 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#35 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#36 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#37 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#38 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#39 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#40 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#41 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#42 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#43 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#44 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#45 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#46 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#47 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#48 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#49 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#50 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#51 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#52 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#53 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#54 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#55 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#56 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#57 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#58 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#59 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#60 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#61 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#62 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#63 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#0 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#1 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#2 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#3 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#4 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#5 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#6 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#7 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#8 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#9 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#10 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#11 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#12 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#13 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#14 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#15 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#16 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#17 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#18 unaligned transfer
Jul 29 16:13:21 vault kernel: sr 0:0:0:1: [sr0] tag#19 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#20 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#21 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#22 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#23 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#24 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#25 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#26 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#27 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#28 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#29 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#30 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#31 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#32 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#33 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#34 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#35 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#36 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#37 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#38 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#39 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#40 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#41 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#42 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#43 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#44 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#45 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#46 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#47 unaligned transfer
Jul 29 16:13:22 vault kernel: sr 0:0:0:1: [sr0] tag#48 unaligned transfer
Jul 29 16:20:54 vault kernel: docker0: port 2(veth37ec875) entered disabled state
Jul 29 16:20:54 vault kernel: veth53d3c1a: renamed from eth0
Jul 29 16:20:54 vault kernel: docker0: port 2(veth37ec875) entered disabled state
Jul 29 16:20:54 vault kernel: device veth37ec875 left promiscuous mode
Jul 29 16:20:54 vault kernel: docker0: port 2(veth37ec875) entered disabled state
Jul 29 16:20:54 vault kernel: vethe05f9b2: renamed from eth0
Jul 29 16:20:54 vault kernel: docker0: port 3(veth95bb255) entered disabled state
Jul 29 16:20:54 vault kernel: Unable to locate Target Portal Group on iqn.2020-04.com.aaronjwood.lan.vault
Jul 29 16:20:54 vault kernel: iSCSI Login negotiation failed.
Jul 29 16:20:54 vault kernel: vetha214a1d: renamed from eth0
Jul 29 16:20:55 vault kernel: br-641b512ac453: port 1(veth21b5a89) entered disabled state
Jul 29 16:20:55 vault kernel: docker0: port 3(veth95bb255) entered disabled state
Jul 29 16:20:55 vault kernel: device veth95bb255 left promiscuous mode
Jul 29 16:20:55 vault kernel: docker0: port 3(veth95bb255) entered disabled state
Jul 29 16:20:55 vault kernel: br-641b512ac453: port 1(veth21b5a89) entered disabled state
Jul 29 16:20:55 vault kernel: device veth21b5a89 left promiscuous mode
Jul 29 16:20:55 vault kernel: br-641b512ac453: port 1(veth21b5a89) entered disabled state
Jul 29 16:20:55 vault kernel: docker0: port 1(vethfa797b5) entered disabled state
Jul 29 16:20:55 vault kernel: veth6aaf00e: renamed from eth0
Jul 29 16:20:55 vault kernel: docker0: port 1(vethfa797b5) entered disabled state
Jul 29 16:20:55 vault kernel: device vethfa797b5 left promiscuous mode
Jul 29 16:20:55 vault kernel: docker0: port 1(vethfa797b5) entered disabled state
Jul 29 16:20:57 vault kernel: docker0: port 4(veth0911cc9) entered disabled state
Jul 29 16:20:57 vault kernel: vethdff1a34: renamed from eth0
Jul 29 16:20:57 vault kernel: docker0: port 4(veth0911cc9) entered disabled state
Jul 29 16:20:57 vault kernel: device veth0911cc9 left promiscuous mode
Jul 29 16:20:57 vault kernel: docker0: port 4(veth0911cc9) entered disabled state
Jul 29 16:20:58 vault kernel: vethc523ecd: renamed from eth0
Jul 29 16:20:58 vault kernel: docker0: port 5(veth258ecf7) entered disabled state
Jul 29 16:20:58 vault kernel: docker0: port 5(veth258ecf7) entered disabled state
Jul 29 16:20:58 vault kernel: device veth258ecf7 left promiscuous mode
Jul 29 16:20:58 vault kernel: docker0: port 5(veth258ecf7) entered disabled state
Jul 29 16:20:58 vault kernel: docker0: port 6(vethd8c1225) entered disabled state
Jul 29 16:20:58 vault kernel: veth44f9777: renamed from eth0
Jul 29 16:20:58 vault kernel: docker0: port 6(vethd8c1225) entered disabled state
Jul 29 16:20:58 vault kernel: device vethd8c1225 left promiscuous mode
Jul 29 16:20:58 vault kernel: docker0: port 6(vethd8c1225) entered disabled state
Jul 29 16:20:58 vault kernel: docker0: port 8(veth4e16f4d) entered disabled state
Jul 29 16:20:58 vault kernel: veth2cd99ec: renamed from eth0
Jul 29 16:20:58 vault kernel: docker0: port 8(veth4e16f4d) entered disabled state
Jul 29 16:20:58 vault kernel: device veth4e16f4d left promiscuous mode
Jul 29 16:20:58 vault kernel: docker0: port 8(veth4e16f4d) entered disabled state
Jul 29 16:21:01 vault kernel: docker0: port 7(veth328cb31) entered disabled state
Jul 29 16:21:01 vault kernel: vethb0388d6: renamed from eth0
Jul 29 16:21:02 vault kernel: docker0: port 7(veth328cb31) entered disabled state
Jul 29 16:21:02 vault kernel: device veth328cb31 left promiscuous mode
Jul 29 16:21:02 vault kernel: docker0: port 7(veth328cb31) entered disabled state
Jul 29 16:21:05 vault systemd-shutdown[1]: Syncing filesystems and block devices.
Jul 29 16:21:45 vault systemd-shutdown[1]: Syncing filesystems and block devices - timed out, issuing SIGKILL to PID 1070391.
Jul 29 16:21:45 vault systemd-shutdown[1]: Sending SIGTERM to remaining processes...
I believe I have the same bug -- also triggered by Plex Media Server (consistently, every few weeks; the server runs a lot more than just Plex but the crash is always triggered by Plex).
Nov 3 15:45:14 callisto kernel: [4229866.738020] BUG: kernel NULL pointer dereference, address: 0000000000000008
Nov 3 15:45:14 callisto kernel: [4229866.746633] #PF: supervisor write access in kernel mode
Nov 3 15:45:14 callisto kernel: [4229866.753246] #PF: error_code(0x0002) - not-present page
Nov 3 15:45:14 callisto kernel: [4229866.759760] PGD 8000000894e8c067 P4D 8000000894e8c067 PUD 894e8d067 PMD 0
Nov 3 15:45:14 callisto kernel: [4229866.768221] Oops: 0002 [#1] SMP PTI
Nov 3 15:45:14 callisto kernel: [4229866.772960] CPU: 2 PID: 1835283 Comm: Plex Media Serv Tainted: P O 5.4.0-47-generic #51-Ubuntu
Nov 3 15:45:14 callisto kernel: [4229866.784867] Hardware name: HP ProLiant DL380e Gen8, BIOS P73 05/24/2019
Nov 3 15:45:14 callisto kernel: [4229866.793026] RIP: 0010:avl_rotation.isra.0+0x211/0x240 [zavl]
Nov 3 15:45:14 callisto kernel: [4229866.800136] Code: 83 c8 01 49 89 43 10 48 85 db 75 27 48 8b 45 d0 4c 89 18 48 83 c4 10 b8 01 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 63 ff <4c> 89 14 fb e9 e5 fe ff ff 48 63 ff b8 01 00 00 00 4c 89 1c fb 48
Nov 3 15:45:14 callisto kernel: [4229866.823054] RSP: 0018:ffff992388e7fb70 EFLAGS: 00010202
Nov 3 15:45:14 callisto kernel: [4229866.829798] RAX: 000000000000000a RBX: 0000000000000008 RCX: 0000000000000000
Nov 3 15:45:14 callisto kernel: [4229866.838681] RDX: 0000000000000001 RSI: ffff8c9aa23c1a80 RDI: 0000000000000000
Nov 3 15:45:14 callisto kernel: [4229866.847553] RBP: ffff992388e7fba8 R08: 0000000000000000 R09: 0000000000000008
Nov 3 15:45:14 callisto kernel: [4229866.856431] R10: ffff8ca42694e808 R11: 0000000000000000 R12: ffff8ca42694e810
Nov 3 15:45:14 callisto kernel: [4229866.865317] R13: 0000000000000000 R14: 00000000ffffffff R15: ffff8c9aa23c1a80
Nov 3 15:45:14 callisto kernel: [4229866.874194] FS: 00007f6f6ffff700(0000) GS:ffff8c9f67480000(0000) knlGS:0000000000000000
Nov 3 15:45:14 callisto kernel: [4229866.884146] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 3 15:45:14 callisto kernel: [4229866.891551] CR2: 0000000000000008 CR3: 0000000a4c554002 CR4: 00000000000606e0
Nov 3 15:45:14 callisto kernel: [4229866.900507] Call Trace:
Nov 3 15:45:14 callisto kernel: [4229866.904301] avl_insert+0xa9/0xb0 [zavl]
Nov 3 15:45:14 callisto kernel: [4229866.909794] zfs_rangelock_add_reader+0x14c/0x1e0 [zfs]
Nov 3 15:45:14 callisto kernel: [4229866.916696] zfs_rangelock_enter_reader+0xf1/0x1c0 [zfs]
Nov 3 15:45:14 callisto kernel: [4229866.923705] zfs_rangelock_enter+0xed/0xf0 [zfs]
Nov 3 15:45:14 callisto kernel: [4229866.929945] zfs_get_data+0x158/0x350 [zfs]
Nov 3 15:45:14 callisto kernel: [4229866.935699] zil_lwb_commit+0x1c6/0x360 [zfs]
Nov 3 15:45:14 callisto kernel: [4229866.941639] zil_process_commit_list+0xf1/0x210 [zfs]
Nov 3 15:45:14 callisto kernel: [4229866.948667] zil_commit_writer.isra.0+0xa3/0xb0 [zfs]
Nov 3 15:45:14 callisto kernel: [4229866.955352] zil_commit_impl+0x59/0xa0 [zfs]
Nov 3 15:45:14 callisto kernel: [4229866.961147] zil_commit+0x40/0x60 [zfs]
Nov 3 15:45:14 callisto kernel: [4229866.966436] zfs_fsync+0x7a/0xe0 [zfs]
Nov 3 15:45:14 callisto kernel: [4229866.971591] zpl_fsync+0x5c/0x90 [zfs]
Nov 3 15:45:14 callisto kernel: [4229866.976677] ? do_fcntl+0x323/0x560
Nov 3 15:45:14 callisto kernel: [4229866.981469] vfs_fsync_range+0x49/0x80
Nov 3 15:45:14 callisto kernel: [4229866.986565] ? __fget_light+0x57/0x70
Nov 3 15:45:14 callisto kernel: [4229866.991554] do_fsync+0x3d/0x70
Nov 3 15:45:14 callisto kernel: [4229866.995974] __x64_sys_fsync+0x14/0x20
Nov 3 15:45:14 callisto kernel: [4229867.001061] do_syscall_64+0x57/0x190
Nov 3 15:45:14 callisto kernel: [4229867.006052] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 3 15:45:14 callisto kernel: [4229867.012582] RIP: 0033:0x7f6fca57a8db
Nov 3 15:45:14 callisto kernel: [4229867.017481] Code: 4a 00 00 00 0f 05 48 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 13 f7 ff ff 8b 7c 24 0c 41 89 c0 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2f 44 89 c7 89 44 24 0c e8 51 f7 ff ff 8b 44
Nov 3 15:45:14 callisto kernel: [4229867.039992] RSP: 002b:00007f6f6fffd960 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Nov 3 15:45:14 callisto kernel: [4229867.049362] RAX: ffffffffffffffda RBX: 0000000002f0bdc8 RCX: 00007f6fca57a8db
Nov 3 15:45:14 callisto kernel: [4229867.058269] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 000000000000000c
Nov 3 15:45:14 callisto kernel: [4229867.067186] RBP: 0000000002efb328 R08: 0000000000000000 R09: 00007f6ed806f760
Nov 3 15:45:14 callisto kernel: [4229867.076244] R10: 00007f6ed80008d0 R11: 0000000000000293 R12: 0000000000000000
Nov 3 15:45:14 callisto kernel: [4229867.085155] R13: 0000000002f0bd38 R14: 0000000000000002 R15: 0000000000000000
Nov 3 15:45:14 callisto kernel: [4229867.094062] Modules linked in: cpuid sctp tcp_diag inet_diag ipt_REJECT nf_reject_ipv4 xt_multiport rpcsec_gss_krb5 nfsv4 nfs fscache aufs overlay ip6table_filter ip6table_nat ip6_tables xt_tcpudp xt_state xt_conntrack iptable_filter xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c bpfilter bonding intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper rapl intel_cstate serio_raw cdc_acm hpilo ioatdma sch_fq_codel ipmi_ssif mac_hid ipmi_si ipmi_devintf ipmi_msghandler coretemp parport_pc ppdev lp parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlua(PO) uas usb_storage mgag200 drm_vram_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt crc32_pclmul mpt3sas fb_sys_fops igb psmouse ahci nvme raid_class libahci dca drm lpc_ich
Nov 3 15:45:14 callisto kernel: [4229867.094110] scsi_transport_sas i2c_algo_bit nvme_core netxen_nic
Nov 3 15:45:14 callisto kernel: [4229867.205779] CR2: 0000000000000008
Nov 3 15:45:14 callisto kernel: [4229867.210729] ---[ end trace db0707df8cd3a864 ]---
Nov 3 15:45:14 callisto kernel: [4229867.224840] ERST: [Firmware Warn]: Firmware does not respond in time.
Nov 3 15:45:14 callisto kernel: [4229867.242613] RIP: 0010:avl_rotation.isra.0+0x211/0x240 [zavl]
Nov 3 15:45:14 callisto kernel: [4229867.250019] Code: 83 c8 01 49 89 43 10 48 85 db 75 27 48 8b 45 d0 4c 89 18 48 83 c4 10 b8 01 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 63 ff <4c> 89 14 fb e9 e5 fe ff ff 48 63 ff b8 01 00 00 00 4c 89 1c fb 48
Nov 3 15:45:14 callisto kernel: [4229867.272919] RSP: 0018:ffff992388e7fb70 EFLAGS: 00010202
Nov 3 15:45:14 callisto kernel: [4229867.279884] RAX: 000000000000000a RBX: 0000000000000008 RCX: 0000000000000000
Nov 3 15:45:14 callisto kernel: [4229867.288954] RDX: 0000000000000001 RSI: ffff8c9aa23c1a80 RDI: 0000000000000000
Nov 3 15:45:14 callisto kernel: [4229867.298021] RBP: ffff992388e7fba8 R08: 0000000000000000 R09: 0000000000000008
Nov 3 15:45:14 callisto kernel: [4229867.307086] R10: ffff8ca42694e808 R11: 0000000000000000 R12: ffff8ca42694e810
Nov 3 15:45:14 callisto kernel: [4229867.316156] R13: 0000000000000000 R14: 00000000ffffffff R15: ffff8c9aa23c1a80
Nov 3 15:45:14 callisto kernel: [4229867.325353] FS: 00007f6f6ffff700(0000) GS:ffff8c9f67480000(0000) knlGS:0000000000000000
Nov 3 15:45:14 callisto kernel: [4229867.335547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 3 15:45:14 callisto kernel: [4229867.343103] CR2: 0000000000000008 CR3: 0000000a4c554002 CR4: 00000000000606e0
Apart from running the same Ubuntu version and kernel (I've experienced the crash on several different 5.4.0-generic Ubuntu builds), my environment is not particularly similar to the original reporter's: not a VM, just Ubuntu 20.04 on bare metal; Xeon E5-2407 CPU; LSI SAS2308 HBA.
@mas90 is that also zfs 0.8.3 ?
@mas90 is that also zfs 0.8.3 ?
Yes - 0.8.3-1ubuntu12.4.
I have the same crash running Linux 5.4.0-58-generic #64-Ubuntu SMP, zfs 0.8.3-1ubuntu12.5 and with a Plex workload. Although for me it causes the server to stop responding to network traffic.
We had the same issue:
[155116.495329] BUG: kernel NULL pointer dereference, address: 0000000000000008
[155116.495375] #PF: supervisor write access in kernel mode
[155116.495399] #PF: error_code(0x0002) - not-present page
[155116.495423] PGD 0 P4D 0
[155116.495440] Oops: 0002 [#1] PREEMPT SMP NOPTI
[155116.495464] CPU: 18 PID: 83956 Comm: nfsd Kdump: loaded Tainted: P W OE 5.4.0-26-lowlatency #30mw
[155116.495508] Hardware name: Supermicro SYS-1028U-E1CRTP+/X10DRU-i+, BIOS 2.0b 08/09/2016
[155116.495548] RIP: 0010:avl_rotation.isra.0+0x211/0x240 [zavl]
[155116.495580] Code: 83 c8 01 49 89 43 10 48 85 db 75 27 48 8b 45 d0 4c 89 18 48 83 c4 10 b8 01 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 63 ff <4c> 89 14 fb e9 e5 fe ff ff 48 63
ff b8 01 00 00 00 4c 89 1c fb 48
[155116.495660] RSP: 0018:ffffa98d851bf9c0 EFLAGS: 00010202
[155116.495689] RAX: 000000000000000a RBX: 0000000000000008 RCX: 0000000000000000
[155116.495722] RDX: 0000000000000001 RSI: ffff9339c9443d00 RDI: 0000000000000000
[155116.495755] RBP: ffffa98d851bf9f8 R08: 0000000000000000 R09: 0000000000000008
[155116.495791] R10: ffff9348a778bd08 R11: 0000000000000000 R12: ffff9348a778bd10
[155116.495823] R13: 0000000000000000 R14: 00000000ffffffff R15: ffff9339c9443d00
[155116.495857] FS: 0000000000000000(0000) GS:ffff934cffa80000(0000) knlGS:0000000000000000
[155116.495899] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[155116.495926] CR2: 0000000000000008 CR3: 0000002c40694004 CR4: 00000000001606e0
[155116.495959] Call Trace:
[155116.495980] avl_insert+0xa9/0xb0 [zavl]
[155116.496104] zfs_rangelock_add_reader+0x14c/0x1e0 [zfs]
[155116.496189] zfs_rangelock_enter_reader+0xf1/0x1c0 [zfs]
[155116.496272] zfs_rangelock_enter+0xed/0xf0 [zfs]
[155116.496352] zfs_get_data+0x158/0x350 [zfs]
[155116.496429] zil_lwb_commit+0x1c6/0x360 [zfs]
[155116.496504] zil_process_commit_list+0x107/0x230 [zfs]
[155116.496580] zil_commit_writer.isra.0+0xa3/0xb0 [zfs]
[155116.496658] zil_commit_impl+0x59/0xa0 [zfs]
[155116.496730] zil_commit+0x40/0x60 [zfs]
[155116.496803] zfs_fsync+0x7a/0xe0 [zfs]
[155116.496873] zpl_commit_metadata+0x4c/0x70 [zfs]
[155116.496905] commit_inode_metadata+0x1f/0x30 [nfsd]
[155116.496935] nfsd_create_setattr+0x5c/0x80 [nfsd]
[155116.496963] nfsd_create_locked+0x10c/0x2c0 [nfsd]
[155116.496992] nfsd_create+0x10b/0x170 [nfsd]
[155116.497019] nfsd3_proc_mkdir+0xda/0x170 [nfsd]
[155116.497047] nfsd_dispatch+0xd6/0x220 [nfsd]
[155116.497083] svc_process_common+0x3af/0x710 [sunrpc]
[155116.497118] ? svc_sock_secure_port+0x16/0x30 [sunrpc]
[155116.497147] ? nfsd_svc+0x2f0/0x2f0 [nfsd]
[155116.497177] svc_process+0xd9/0x110 [sunrpc]
[155116.497203] nfsd+0xf2/0x150 [nfsd]
[155116.498056] kthread+0x104/0x140
[155116.498951] ? nfsd_destroy+0x70/0x70 [nfsd]
[155116.499790] ? kthread_park+0x90/0x90
[155116.500597] ret_from_fork+0x1f/0x40
I am seeing the same thing here. Sporadic misbehaviour on one of my LXD containers that are backed by a very trivial ZFS pool that runs Plex. Every couple of weeks, load increases though no specific process can be blamed for the load. Unable to manipulate the container in anyway. Reboots take forever due to issues shutting down the ZFS pool I have.
From dmesg
:
[Thu Aug 26 04:21:50 2021] BUG: kernel NULL pointer dereference, address: 0000000000000008
[Thu Aug 26 04:21:50 2021] #PF: supervisor write access in kernel mode
[Thu Aug 26 04:21:50 2021] #PF: error_code(0x0002) - not-present page
[Thu Aug 26 04:21:50 2021] PGD 0 P4D 0
[Thu Aug 26 04:21:50 2021] Oops: 0002 [#1] SMP PTI
[Thu Aug 26 04:21:50 2021] CPU: 7 PID: 3369243 Comm: Plex Media Serv Tainted: P O 5.4.0-81-generic #91-Ubuntu
[Thu Aug 26 04:21:50 2021] Hardware name: Dell Inc. PowerEdge T320/0FDT3J, BIOS 2.4.2 01/29/2015
[Thu Aug 26 04:21:50 2021] RIP: 0010:avl_rotation.isra.0+0x211/0x240 [zavl]
[Thu Aug 26 04:21:50 2021] Code: 83 c8 01 49 89 43 10 48 85 db 75 27 48 8b 45 d0 4c 89 18 48 83 c4 10 b8 01 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 63 ff <4c> 89 14 fb e9 e5 fe ff ff 48 63 ff b8 01 00 00 00 4c 89 1c fb 48
[Thu Aug 26 04:21:50 2021] RSP: 0018:ffff96f845bbbb70 EFLAGS: 00010202
[Thu Aug 26 04:21:50 2021] RAX: 000000000000000a RBX: 0000000000000008 RCX: 0000000000000000
[Thu Aug 26 04:21:50 2021] RDX: 0000000000000001 RSI: ffff89f7d753b460 RDI: 0000000000000000
[Thu Aug 26 04:21:50 2021] RBP: ffff96f845bbbba8 R08: 0000000000000000 R09: 0000000000000008
[Thu Aug 26 04:21:50 2021] R10: ffff89f432dfad08 R11: 0000000000000000 R12: ffff89f432dfad10
[Thu Aug 26 04:21:50 2021] R13: 0000000000000000 R14: 00000000ffffffff R15: ffff89f7d753b460
[Thu Aug 26 04:21:50 2021] FS: 00007f80f47d7b38(0000) GS:ffff89fb0f1c0000(0000) knlGS:0000000000000000
[Thu Aug 26 04:21:50 2021] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Thu Aug 26 04:21:50 2021] CR2: 0000000000000008 CR3: 000000066fd64004 CR4: 00000000001606e0
[Thu Aug 26 04:21:50 2021] Call Trace:
[Thu Aug 26 04:21:50 2021] avl_insert+0xa9/0xb0 [zavl]
[Thu Aug 26 04:21:50 2021] zfs_rangelock_add_reader+0x14c/0x1e0 [zfs]
[Thu Aug 26 04:21:50 2021] zfs_rangelock_enter_reader+0xf1/0x1c0 [zfs]
[Thu Aug 26 04:21:50 2021] zfs_rangelock_enter+0xed/0xf0 [zfs]
[Thu Aug 26 04:21:50 2021] zfs_get_data+0x158/0x350 [zfs]
[Thu Aug 26 04:21:50 2021] zil_lwb_commit+0x1c6/0x360 [zfs]
[Thu Aug 26 04:21:50 2021] zil_process_commit_list+0xf1/0x210 [zfs]
[Thu Aug 26 04:21:50 2021] zil_commit_writer.isra.0+0xa3/0xb0 [zfs]
[Thu Aug 26 04:21:50 2021] zil_commit_impl+0x59/0xa0 [zfs]
[Thu Aug 26 04:21:50 2021] zil_commit+0x40/0x60 [zfs]
[Thu Aug 26 04:21:50 2021] zfs_fsync+0x7a/0xe0 [zfs]
[Thu Aug 26 04:21:50 2021] zpl_fsync+0x5c/0x90 [zfs]
[Thu Aug 26 04:21:50 2021] vfs_fsync_range+0x49/0x80
[Thu Aug 26 04:21:50 2021] ? __fget_light+0x57/0x70
[Thu Aug 26 04:21:50 2021] do_fsync+0x3d/0x70
[Thu Aug 26 04:21:50 2021] __x64_sys_fsync+0x14/0x20
[Thu Aug 26 04:21:50 2021] do_syscall_64+0x57/0x190
[Thu Aug 26 04:21:50 2021] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Thu Aug 26 04:21:50 2021] RIP: 0033:0x7f810138b739
[Thu Aug 26 04:21:50 2021] Code: c0 0f 85 24 00 00 00 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> e9 d5 cb ff ff 41 57 41 56 53 48 81 ec 90 00 00 00 49 89 f6 b8
[Thu Aug 26 04:21:50 2021] RSP: 002b:00007f80f47d4508 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
[Thu Aug 26 04:21:50 2021] RAX: ffffffffffffffda RBX: 000000000000004a RCX: 00007f810138b739
[Thu Aug 26 04:21:50 2021] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
[Thu Aug 26 04:21:50 2021] RBP: 00007f80f47d4580 R08: 0000000000000000 R09: 0000000000000000
[Thu Aug 26 04:21:50 2021] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f80f47d7b38
[Thu Aug 26 04:21:50 2021] R13: 0000000000000000 R14: 0000000000000000 R15: 00007f80f47d7b74
[Thu Aug 26 04:21:50 2021] Modules linked in: unix_diag hid_plantronics uas usb_storage ufs qnx4 hfsplus hfs minix ntfs msdos jfs nf_conntrack_netlink xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE xfrm_user xfrm_algo xt_addrtype overlay nfs lockd grace fscache sunrpc binfmt_misc veth ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_raw iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter nf_tables nfnetlink vhost_vsock vmw_vsock_virtio_transport_common vhost vsock bridge stp llc xfs nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm cp210x usbserial mei_me rapl dcdbas ipmi_si intel_cstate joydev input_leds ipmi_devintf mei ipmi_msghandler mac_hid acpi_power_meter sch_fq_codel ip_tables x_tables autofs4 btrfs
[Thu Aug 26 04:21:50 2021] zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic usbhid hid raid6_pq libcrc32c raid1 raid0 multipath linear mgag200 drm_vram_helper i2c_algo_bit ttm crct10dif_pclmul crc32_pclmul drm_kms_helper syscopyarea ghash_clmulni_intel sysfillrect aesni_intel sysimgblt fb_sys_fops crypto_simd cryptd ahci glue_helper drm lpc_ich libahci tg3 megaraid_sas wmi
[Thu Aug 26 04:21:50 2021] CR2: 0000000000000008
[Thu Aug 26 04:21:50 2021] ---[ end trace d0c848bce097565a ]---
[Thu Aug 26 04:21:50 2021] RIP: 0010:avl_rotation.isra.0+0x211/0x240 [zavl]
[Thu Aug 26 04:21:50 2021] Code: 83 c8 01 49 89 43 10 48 85 db 75 27 48 8b 45 d0 4c 89 18 48 83 c4 10 b8 01 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 63 ff <4c> 89 14 fb e9 e5 fe ff ff 48 63 ff b8 01 00 00 00 4c 89 1c fb 48
[Thu Aug 26 04:21:50 2021] RSP: 0018:ffff96f845bbbb70 EFLAGS: 00010202
[Thu Aug 26 04:21:50 2021] RAX: 000000000000000a RBX: 0000000000000008 RCX: 0000000000000000
[Thu Aug 26 04:21:50 2021] RDX: 0000000000000001 RSI: ffff89f7d753b460 RDI: 0000000000000000
[Thu Aug 26 04:21:50 2021] RBP: ffff96f845bbbba8 R08: 0000000000000000 R09: 0000000000000008
[Thu Aug 26 04:21:50 2021] R10: ffff89f432dfad08 R11: 0000000000000000 R12: ffff89f432dfad10
[Thu Aug 26 04:21:50 2021] R13: 0000000000000000 R14: 00000000ffffffff R15: ffff89f7d753b460
[Thu Aug 26 04:21:50 2021] FS: 00007f80f47d7b38(0000) GS:ffff89fb0f1c0000(0000) knlGS:0000000000000000
[Thu Aug 26 04:21:50 2021] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Thu Aug 26 04:21:50 2021] CR2: 0000000000000008 CR3: 000000066fd64004 CR4: 00000000001606e0
ZFS pool info:
djwhyte@server1:~$ zpool status
pool: default
state: ONLINE
scan: scrub repaired 0B in 0 days 00:16:44 with 0 errors on Sun Aug 8 00:40:46 2021
config:
NAME STATE READ WRITE CKSUM
default ONLINE 0 0 0
sdc ONLINE 0 0 0
errors: No known data errors
djwhyte@server1:~$
Version info:
djwhyte@server1:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.2 LTS
Release: 20.04
Codename: focal
djwhyte@server1:~$ uname -a
Linux server1 5.4.0-81-generic #91-Ubuntu SMP Thu Jul 15 19:09:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
djwhyte@server1:~$ apt list zfsutils-linux
Listing... Done
zfsutils-linux/focal-updates 0.8.3-1ubuntu12.12 amd64 [upgradable from: 0.8.3-1ubuntu12.11]
N: There are 3 additional versions. Please use the '-a' switch to see them.
Related thread in the LXD forums - https://discuss.linuxcontainers.org/t/single-lxd-container-gets-in-a-bad-state-need-a-host-restart-to-resolve/11751
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
I also use ZFS and a Plex server, I was looking at my email alerts from a few days ago:
It automatically restarted, I didn't notice the downtime but I got the alert
# journalctl --since "2024-07-03 02:20:24"
jul 03 02:20:24 bserver kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008
jul 03 02:20:24 bserver kernel: #PF: supervisor write access in kernel mode
jul 03 02:20:24 bserver kernel: #PF: error_code(0x0002) - not-present page
jul 03 02:20:24 bserver kernel: PGD 0 P4D 0
jul 03 02:20:24 bserver kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
jul 03 02:20:24 bserver kernel: CPU: 0 PID: 2559 Comm: CPU 0/KVM Tainted: P O 6.5.0-41-generic #41~22.04.2-Ubuntu
jul 03 02:20:24 bserver kernel: Hardware name: BIOSTAR Group B450MH/B450MH, BIOS 5.17 03/04/2024
jul 03 02:20:24 bserver kernel: RIP: 0010:_raw_spin_lock+0x13/0x60
jul 03 02:20:24 bserver kernel: Code: 31 db e9 00 18 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 65 ff 05 8c 12 ef 51 31 c0 ba 01 00 00 00 <3e> 0f b1 17 75 1b 31 c0 31 d2 31 c9 31>
jul 03 02:20:24 bserver kernel: RSP: 0018:ffffadf986603c90 EFLAGS: 00010046
jul 03 02:20:24 bserver kernel: RAX: 0000000000000000 RBX: 0000000000000020 RCX: 0000000000000000
jul 03 02:20:24 bserver kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000008
jul 03 02:20:24 bserver kernel: RBP: ffffadf986603cc8 R08: 0000000000000000 R09: 0000000000000000
jul 03 02:20:24 bserver kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
jul 03 02:20:24 bserver kernel: R13: 0000000000000092 R14: ffff9730f101fb50 R15: 0226800000000000
jul 03 02:20:24 bserver kernel: FS: 00007e293e071640(0000) GS:ffff9730f1000000(0000) knlGS:0000000000000000
jul 03 02:20:24 bserver kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jul 03 02:20:24 bserver kernel: CR2: 0000000000000008 CR3: 000000013e782000 CR4: 00000000003506f0
jul 03 02:20:24 bserver kernel: Call Trace:
jul 03 02:20:24 bserver kernel: <TASK>
jul 03 02:20:24 bserver kernel: ? show_regs+0x6d/0x80
jul 03 02:20:24 bserver kernel: ? __die+0x24/0x80
jul 03 02:20:24 bserver kernel: ? page_fault_oops+0x99/0x1b0
jul 03 02:20:24 bserver kernel: ? do_user_addr_fault+0x31d/0x6b0
jul 03 02:20:24 bserver kernel: ? srso_return_thunk+0x5/0x10
jul 03 02:20:24 bserver kernel: ? kvm_set_lapic_tscdeadline_msr+0x50/0xa0 [kvm]
jul 03 02:20:24 bserver kernel: ? exc_page_fault+0x83/0x1b0
jul 03 02:20:24 bserver kernel: ? asm_exc_page_fault+0x27/0x30
jul 03 02:20:24 bserver kernel: ? _raw_spin_lock+0x13/0x60
jul 03 02:20:24 bserver kernel: ? speculation_ctrl_update+0xda/0x1e0
jul 03 02:20:24 bserver kernel: x86_virt_spec_ctrl+0x61/0x70
jul 03 02:20:24 bserver kernel: svm_vcpu_run+0x59a/0x860 [kvm_amd]
jul 03 02:20:24 bserver kernel: vcpu_enter_guest+0x456/0xf00 [kvm]
jul 03 02:20:24 bserver kernel: ? srso_return_thunk+0x5/0x10
jul 03 02:20:24 bserver kernel: ? kvm_apic_local_deliver+0xa1/0xd0 [kvm]
jul 03 02:20:24 bserver kernel: vcpu_run+0x46/0x290 [kvm]
jul 03 02:20:24 bserver kernel: kvm_arch_vcpu_ioctl_run+0x1d4/0x590 [kvm]
jul 03 02:20:24 bserver kernel: ? srso_return_thunk+0x5/0x10
jul 03 02:20:24 bserver kernel: ? rseq_ip_fixup+0x90/0x1f0
jul 03 02:20:24 bserver kernel: kvm_vcpu_ioctl+0x297/0x800 [kvm]
jul 03 02:20:24 bserver kernel: ? srso_return_thunk+0x5/0x10
jul 03 02:20:24 bserver kernel: ? __seccomp_filter+0x37b/0x560
jul 03 02:20:24 bserver kernel: ? srso_return_thunk+0x5/0x10
jul 03 02:20:24 bserver kernel: ? __fget_light+0xa5/0x120
jul 03 02:20:24 bserver kernel: __x64_sys_ioctl+0xa3/0xf0
jul 03 02:20:24 bserver kernel: x64_sys_call+0x1198/0x20b0
jul 03 02:20:24 bserver kernel: do_syscall_64+0x55/0x90
jul 03 02:20:24 bserver kernel: ? srso_return_thunk+0x5/0x10
jul 03 02:20:24 bserver kernel: ? do_syscall_64+0x61/0x90
jul 03 02:20:24 bserver kernel: ? common_interrupt+0x54/0xb0
jul 03 02:20:24 bserver kernel: entry_SYSCALL_64_after_hwframe+0x73/0xdd
jul 03 02:20:24 bserver kernel: RIP: 0033:0x7e2945af194f
jul 03 02:20:24 bserver kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44>
jul 03 02:20:24 bserver kernel: RSP: 002b:00007e293e070460 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
jul 03 02:20:24 bserver kernel: RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007e2945af194f
jul 03 02:20:24 bserver kernel: RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000015
jul 03 02:20:24 bserver kernel: RBP: 00005dd2075f2b30 R08: 00005dd204e43f10 R09: 0000000000000000
jul 03 02:20:24 bserver kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
jul 03 02:20:24 bserver kernel: R13: 0000000000000001 R14: 000000000000c040 R15: 0000000000000000
jul 03 02:20:24 bserver kernel: </TASK>
jul 03 02:20:24 bserver kernel: Modules linked in: macvtap macvlan vhost_net vhost vhost_iotlb tap rpcsec_gss_krb5 nfsv4 nfs fscache netfs xt_CHECKSUM xt_conntrack ipt_REJECT nf_reject_ipv4 bridge stp llc>
jul 03 02:20:24 bserver kernel: libcrc32c raid1 raid0 multipath linear hid_generic crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic usbhid ghash_clmulni_intel sha256_ssse3 hid sha1_ssse3 aes>
jul 03 02:20:24 bserver kernel: CR2: 0000000000000008
jul 03 02:20:24 bserver kernel: ---[ end trace 0000000000000000 ]---
jul 03 02:20:24 bserver kernel: pstore: backend (efi_pstore) writing error (-5)
jul 03 02:20:24 bserver kernel: RIP: 0010:_raw_spin_lock+0x13/0x60
jul 03 02:20:24 bserver kernel: Code: 31 db e9 00 18 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 65 ff 05 8c 12 ef 51 31 c0 ba 01 00 00 00 <3e> 0f b1 17 75 1b 31 c0 31 d2 31 c9 31>
jul 03 02:20:24 bserver kernel: RSP: 0018:ffffadf986603c90 EFLAGS: 00010046
jul 03 02:20:24 bserver kernel: RAX: 0000000000000000 RBX: 0000000000000020 RCX: 0000000000000000
jul 03 02:20:24 bserver kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000008
jul 03 02:20:24 bserver kernel: RBP: ffffadf986603cc8 R08: 0000000000000000 R09: 0000000000000000
jul 03 02:20:24 bserver kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
jul 03 02:20:24 bserver kernel: R13: 0000000000000092 R14: ffff9730f101fb50 R15: 0226800000000000
jul 03 02:20:24 bserver kernel: FS: 00007e293e071640(0000) GS:ffff9730f1000000(0000) knlGS:0000000000000000
jul 03 02:20:24 bserver kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jul 03 02:20:24 bserver kernel: CR2: 0000000000000008 CR3: 000000013e782000 CR4: 00000000003506f0
jul 03 02:20:24 bserver kernel: note: CPU 0/KVM[2559] exited with irqs disabled
jul 03 02:20:24 bserver kernel: note: CPU 0/KVM[2559] exited with preempt_count 2
Pool is working as expected:
# zpool status apool
pool: apool
state: ONLINE
scan: scrub repaired 0B in 00:02:52 with 0 errors on Sat Jul 6 23:36:38 2024
config:
NAME STATE READ WRITE CKSUM
apool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sdb ONLINE 0 0 0
sdc ONLINE 0 0 0
errors: No known data errors
Version info:
~ ❯ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy
~ ❯ uname -a
Linux bserver 6.5.0-41-generic #41~22.04.2-Ubuntu SMP PREEMPT_DYNAMIC Mon Jun 3 11:32:55 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
~ ❯ apt list zfsutils-linux
Listing... Done
zfsutils-linux/jammy-updates,jammy-security,now 2.1.5-1ubuntu6~22.04.4 amd64 [installed]
System information
Describe the problem you're observing
After a few days (sometimes longer) there is strange behavior that is noticeable in the apps that are served up to the network which are backed by ZFS datasets. Looking at
dmesg
shows the crash. This never causes the OS to completely crash and after this happens ZFS seems to work, but certain workloads against my datasets seem to hang.Describe how to reproduce the problem
Not quite sure how the scenario is triggered. I see that after running my server for a few days to maybe 1.5 weeks this can happen.
Include any warning/errors/backtraces from the system logs
As you can see I'm inside a VM here. Some details about my setup:
25f9:00:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)
Guest CPU details:
Guest memory ranges: