openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Kernel oops-> BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 #13479

Open constantmanish opened 2 years ago

constantmanish commented 2 years ago

We regularly see the following call trace, which causes the kernel to hang.

May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.208813] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.208897] IP: buf_hash_remove+0x6b/0xc0 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.208916] PGD 0 P4D 0
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.208928] Oops: 0000  SMP PTI
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.208942] Modules linked in: tcp_diag inet_diag ip6table_security ip6table_raw ip6table_mangle ip6table_nat nf_nat_ipv6 nf_log_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables iptable_security xt_CT iptable_raw iptable_mangle iptable_nat nf_nat_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp xt_set xt_hashlimit ip_set_hash_net ip_set iptable_filter ip_tables xt_conntrack x_tables nf_nat bridge stp llc nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo aufs binfmt_misc ipmi_devintf ipmi_msghandler vmw_vsock_vmci_transport vsock ppdev sb_edac vmw_balloon crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_rapl_perf joydev input_leds serio_raw i2c_piix4 shpchp vmw_vmci parport_pc parport mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.209195]  nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack sunrpc autofs4 zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear vmwgfx ttm mptspi mptscsih drm_kms_helper aesni_intel syscopyarea sysfillrect aes_x86_64 sysimgblt crypto_simd fb_sys_fops cryptd glue_helper psmouse mptbase vmxnet3 drm scsi_transport_spi pata_acpi floppy scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.209364] CPU: 2 PID: 2903 Comm: txg_sync Tainted: P           O     4.15.0-54-generic #58~16.04.1-Ubuntu
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.209393] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.209468] RIP: 0010:buf_hash_remove+0x6b/0xc0 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.209485] RSP: 0018:ffffa18c20fb7748 EFLAGS: 00010282
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.209501] RAX: 00000000007bc2b5 RBX: 0000000000000001 RCX: 0000000000000000
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.209522] RDX: 0000000000000000 RSI: ffff9471e57ef720 RDI: ffff9471e57ef710
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.209543] RBP: ffffa18c20fb7748 R08: 0000000001b543fe R09: 377ca0ac887efd50
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.209564] R10: ffffa18c20fb7630 R11: 0000000000000010 R12: ffffffffc07ca040
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.209584] R13: ffff9471e57ef710 R14: 0000000000000001 R15: ffffffffc07c9f80
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.209606] FS:  0000000000000000(0000) GS:ffff94727fc80000(0000) knlGS:0000000000000000
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.209630] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.209647] CR2: 0000000000000020 CR3: 00000000b340a003 CR4: 00000000001606e0
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.210338] Call Trace:
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.211002]  arc_change_state.isra.22+0x2fb/0x3a0 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.211677]  arc_release+0x5e1/0x720 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.212299]  ? taskq_dispatch_ent+0x55/0x160 [spl]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.212937]  ? zio_reexecute+0x3a0/0x3a0 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.213560]  dbuf_write.isra.19+0xa4/0x460 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.214196]  ? zio_taskq_dispatch+0x8d/0x90 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.214832]  ? zio_issue_async+0x12/0x20 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.215460]  ? zio_nowait+0xbc/0x150 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.216058]  dbuf_sync_indirect+0xc2/0x1b0 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.216645]  dbuf_sync_list+0xcb/0xf0 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.217216]  dbuf_sync_indirect+0xfd/0x1b0 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.217775]  dbuf_sync_list+0xcb/0xf0 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.218327]  dbuf_sync_indirect+0xfd/0x1b0 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.218856]  dbuf_sync_list+0xcb/0xf0 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.219367]  dbuf_sync_indirect+0xfd/0x1b0 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.219865]  dbuf_sync_list+0xcb/0xf0 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.220347]  dbuf_sync_indirect+0xfd/0x1b0 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.220817]  dbuf_sync_list+0xcb/0xf0 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.221280]  dbuf_sync_indirect+0xfd/0x1b0 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.221719]  ? arc_write_children_ready+0x30/0x30 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.222150]  dbuf_sync_list+0xcb/0xf0 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.222570]  dnode_sync+0x404/0x830 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.222975]  ? dmu_objset_sync+0x17a/0x460 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.223388]  dmu_objset_sync+0x1a2/0x460 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.223774]  dsl_dataset_sync+0x6d/0x2e0 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.224145]  dsl_pool_sync+0x9f/0x420 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.224514]  spa_sync+0x41d/0xda0 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.224883]  txg_sync_thread+0x2d4/0x4a0 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.225254]  ? txg_quiesce_thread+0x3f0/0x3f0 [zfs]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.225596]  thread_generic_wrapper+0x74/0x90 [spl]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.225932]  kthread+0x105/0x140
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.226264]  ? __thread_exit+0x20/0x20 [spl]
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.226599]  ? kthread_destroy_worker+0x50/0x50
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.226925]  ret_from_fork+0x35/0x40
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.227241] Code: 89 ca 48 c1 ea 08 4c 31 c2 48 31 d0 48 23 05 5d fe 17 00 48 8b 15 5e fe 17 00 48 8d 14 c2 48 8b 0a 48 39 cf 75 05 eb 10 48 89 d1 <48> 8b 51 20 48 39 d7 75 f4 48 8d 51 20 48 8b 4f 20 48 89 0a 48
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.228289] RIP: buf_hash_remove+0x6b/0xc0 [zfs] RSP: ffffa18c20fb7748
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.228640] CR2: 0000000000000020
May  5 00:24:42 CPSACL-AT-LPS01 kernel: [19976.228988] ---[ end trace 93b979e134c6dd03 ]---
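
For readers triaging a similar trace, the key fields can be pulled out mechanically. The following is a minimal sketch, not part of the original report: `summarize_oops` and `SAMPLE` are hypothetical names, and the regular expressions assume the x86-64 oops layout shown above. In this trace the marked instruction bytes `<48> 8b 51 20` decode to a 64-bit load from `RCX+0x20`, and with `RCX` equal to zero that matches the fault address `0x20` in `CR2`. This is consistent with a NULL pointer being followed inside `buf_hash_remove`, though it does not by itself identify the root cause.

```python
import re

# Excerpt of the oops above with the syslog prefix stripped.
# Replace with your own log text; prefixed lines also work.
SAMPLE = """\
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: buf_hash_remove+0x6b/0xc0 [zfs]
RAX: 00000000007bc2b5 RBX: 0000000000000001 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9471e57ef720 RDI: ffff9471e57ef710
CR2: 0000000000000020 CR3: 00000000b340a003 CR4: 00000000001606e0
"""

def summarize_oops(text):
    """Extract the faulting symbol, fault address, and zero-valued registers."""
    # Matches both "IP: sym+0x.." and "RIP: 0010:sym+0x.." forms.
    ip = re.search(
        r"\bR?IP: (?:[0-9a-f]{4}:)?([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", text
    )
    # General-purpose registers (RAX, R08, ...) plus CR2, the faulting address.
    pairs = re.findall(r"\b(R[A-Z0-9]{1,2}|CR2): ([0-9a-f]{16})\b", text)
    regs = {name: int(value, 16) for name, value in pairs}
    fault_addr = regs.pop("CR2", None)
    return {
        "symbol": ip.group(1) if ip else None,
        "offset": int(ip.group(2), 16) if ip else None,
        "fault_addr": fault_addr,
        # Registers holding 0 are candidates for the NULL base pointer.
        "null_registers": [name for name, value in regs.items() if value == 0],
    }

if __name__ == "__main__":
    info = summarize_oops(SAMPLE)
    print(info["symbol"], hex(info["fault_addr"]), info["null_registers"])
```

A small fault address such as `0x20` usually means a structure field was read through a NULL base pointer, so the list of zero-valued registers narrows down which pointer to look at in the disassembly.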

system info: Linux kmlog11 4.15.0-54-generic #58~16.04.1-Ubuntu SMP Mon Jun 24 13:21:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

zfs: libzfs2linux 0.6.5.6-0ubuntu24

constantmanish commented 2 years ago

@rincebrain Any comments on this?

rincebrain commented 2 years ago

Sure, stop running a version from 2016?

At the moment, I think only 2.1.X (first released July 2021) is still getting fixes, so I'd advise trying that, since 4.15.X is well within the supported kernel version range.

If you don't want to upgrade for whatever reason, then I'd suggest you report a bug to Ubuntu, as I would place the odds of the project making a bugfix and a new release for 0.6.5.X at this point at slim to none - 0.6.5.11, the last in the 0.6.5.X line, was July 2017.

constantmanish commented 2 years ago

@rincebrain Thank you for the information

stale[bot] commented 1 year ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.