Closed twarberg closed 6 years ago
@twarberg thanks for reporting this issue. Based on the stacks you provided it appears that somehow one of the global kmalloc-* slabs (as shown by `slabtop`) has been damaged. This resulted in the BUG which both zfs and ext4 encountered, as well as several other processes on the system.
Unfortunately anything running in kernel space could potentially have caused this issue, and we don't have any clear evidence pointing to a culprit. All we know is that the l2arc was the first victim. After inspecting the l2arc code I don't see how it could have caused this, but since you have good suspicions, let me suggest two things you could try to help us narrow this down.
1) If possible, disable the l2arc entirely in your environment to rule it out as a cause, or
2) Continue to use the l2arc and set the `spl_kmem_cache_slab_limit` module option to `0`. This will move the memory allocations from the kernel slab implementation to the SPL slab implementation, which should give us some additional debug information if the issue repeats.
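For reference, a module option like this can usually be set either at load time via a modprobe config or, on builds where the parameter is exported writable, at runtime through sysfs. A sketch (the file name `spl.conf` is arbitrary; verify the sysfs path exists on your system before relying on it):

```shell
# Persistent: applied the next time the spl module is loaded
echo "options spl spl_kmem_cache_slab_limit=0" | sudo tee /etc/modprobe.d/spl.conf

# Runtime, only if the parameter is writable on your build:
echo 0 | sudo tee /sys/module/spl/parameters/spl_kmem_cache_slab_limit

# Confirm the value currently in effect
cat /sys/module/spl/parameters/spl_kmem_cache_slab_limit
```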
@behlendorf Not sure what you mean by disabling l2arc. If I don't have the NVMe device attached as a cache on the pool[1], there are no issues (sorry if I was unclear on that). Unless I misunderstand, that covers no. 1?
I'll try out no. 2, but as described I probably won't have anything to report for 10 days or so.
Clarification on my suspicion: the Postgres server crashed 3 times, with approx. 1 week between crashes. The first time was one week after the server was provisioned. Before the second time, Google had replaced the boot kernel with a newer build version and I updated to 0.7.2 hoping that would fix it. Between the 2nd and 3rd crashes I read an issue here that seemed similar to what I'd seen; it suggested the problem was L2ARC related, that a fix was coming in 0.7.3, and that disabling L2ARC should work around it in the meantime. When it happened the 3rd time I removed the cache drive and haven't had any issues since (that server is still on 0.7.2). I then decided to give it a new try on 0.7.3 when building a new MySQL server, and after 10 days the crash above happened. This time I made sure I copied the console output above.
[1] The NVMe is still attached to both instances because it can't be removed while the instance is running.
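For anyone wanting to run the same ruling-out experiment: an L2ARC cache device can be detached from and later re-attached to a live pool. A sketch, where the pool name `tank` and the device path are placeholders for your own:

```shell
# Placeholders: "tank" is the pool name, /dev/nvme0n1 the cache device.
zpool remove tank /dev/nvme0n1     # drop the L2ARC device from the pool
zpool status tank                  # verify the cache section is gone
zpool add tank cache /dev/nvme0n1  # re-add it later when retesting
```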
@twarberg thanks for the clarification, definitely let us know if you observe it again.
Been running for 24 days on 0.7.4 with spl_kmem_cache_slab_limit=0. Will upgrade to 0.7.5 without spl_kmem_cache_slab_limit=0 to verify whether it was 0.7.4 or spl_kmem_cache_slab_limit that made the difference.
I believe I'm also affected by this issue. My system information:
| Type | Version/Name |
|---|---|
| Distribution Name | CentOS |
| Distribution Version | 7.4 |
| Linux Kernel | 3.10.0-693.11.6.el7.x86_64 |
| Architecture | x86_64 |
| ZFS Version | 0.7.5-1 |
| SPL Version | 0.7.5-1 |
I have a NVMe SSD as L2ARC in my pool.
The issue manifests itself after 4 to 6 hours of running `pgbench` (in my case I was benchmarking storage options for KVM VMs) in on-disk heavy-contention mode. I have been able to reproduce this crash several times; each time it occurred within 6 hours.
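For others trying to reproduce, a pgbench run along the following lines should generate comparable sustained load. The scale factor, client count, and database name are illustrative, not the exact values used in my benchmark:

```shell
# Initialize a dataset large enough to spill past ARC into the L2ARC
pgbench -i -s 1000 bench

# Sustained heavy contention: 32 clients, 4 worker threads, 6 hours
pgbench -c 32 -j 4 -T 21600 bench
```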
dmesg output:
[39351.772947] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[39351.773005] IP: [<ffffffff811e14d4>] kmem_cache_alloc+0x74/0x1e0
[39351.773036] PGD 0
[39351.773048] Oops: 0000 [#1] SMP
[39351.773065] Modules linked in: cfg80211 rfkill xt_nat veth vhost_net vhost macvtap macvlan xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink tun xt_addrtype br_netfilter overlay(T) target_core_user uio target_core_pscsi target_core_file target_core_iblock nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi
[39351.773437] ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core osst st ch ses enclosure zfs(POE) zunicode(POE) zavl(POE) icp(POE) intel_powerclamp coretemp kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd zcommon(POE) znvpair(POE) iTCO_wdt spl(OE) mpt2sas iTCO_vendor_support gpio_ich raid_class scsi_transport_sas sg i2c_i801 pcspkr i7core_edac shpchp ioatdma lpc_ich edac_core acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c mlx4_en sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci mlx4_core drm libahci igb libata crct10dif_pclmul crct10dif_common crc32c_intel
[39351.773857] serio_raw ptp nvme pps_core dca nvme_core i2c_algo_bit devlink i2c_core
[39351.773919] CPU: 22 PID: 767 Comm: arc_reclaim Tainted: P OE ------------ T 3.10.0-693.11.6.el7.x86_64 #1
[39351.773962] Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1 03/17/2012
[39351.773988] task: ffff88060e563f40 ti: ffff8800b8f64000 task.ti: ffff8800b8f64000
[39351.774015] RIP: 0010:[<ffffffff811e14d4>] [<ffffffff811e14d4>] kmem_cache_alloc+0x74/0x1e0
[39351.774049] RSP: 0018:ffff8800b8f67c50 EFLAGS: 00010282
[39351.774069] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000e46a7bf
[39351.774095] RDX: 000000000e46a7be RSI: 0000000000004230 RDI: ffff88017fc03a00
[39351.774122] RBP: ffff8800b8f67c80 R08: 000000000001b920 R09: ffffffffc0553319
[39351.774148] R10: 0000000034b27301 R11: ffffea0014d2c980 R12: 0000000000000008
[39351.774174] R13: 0000000000004230 R14: ffff88017fc03a00 R15: ffff88017fc03a00
[39351.774200] FS: 0000000000000000(0000) GS:ffff880627580000(0000) knlGS:0000000000000000
[39351.774230] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[39351.774251] CR2: 0000000000000008 CR3: 0000000334814000 CR4: 00000000000227e0
[39351.774278] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[39351.774313] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[39351.774340] Call Trace:
[39351.774368] [<ffffffffc0553319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
[39351.774398] [<ffffffffc0553319>] spl_kmem_cache_alloc+0x99/0x150 [spl]
[39351.774473] [<ffffffffc07f8b21>] arc_hdr_realloc+0x31/0x260 [zfs]
[39351.774532] [<ffffffffc07fdb76>] arc_evict_state+0x546/0x880 [zfs]
[39351.774556] [<ffffffffc07fdf57>] arc_adjust_impl.constprop.33+0x37/0x50 [zfs]
[39351.774572] [<ffffffffc07fe1ab>] arc_adjust+0x23b/0x4a0 [zfs]
[39351.774588] [<ffffffffc07ff17d>] arc_reclaim_thread+0xad/0x290 [zfs]
[39351.774603] [<ffffffffc07ff0d0>] ? arc_shrink+0xc0/0xc0 [zfs]
[39351.774610] [<ffffffffc0553fa1>] thread_generic_wrapper+0x71/0x80 [spl]
[39351.774615] [<ffffffffc0553f30>] ? __thread_exit+0x20/0x20 [spl]
[39351.774619] [<ffffffff810b252f>] kthread+0xcf/0xe0
[39351.774620] [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40
[39351.774623] [<ffffffff816b8798>] ret_from_fork+0x58/0x90
[39351.774624] [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40
[39351.774635] Code: fc e2 7e 49 8b 50 08 4d 8b 20 49 8b 40 10 4d 85 e4 0f 84 20 01 00 00 48 85 c0 0f 84 17 01 00 00 49 63 46 20 48 8d 4a 01 4d 8b 06 <49> 8b 1c 04 4c 89 e0 65 49 0f c7 08 0f 94 c0 84 c0 74 ba 49 63
[39351.774636] RIP [<ffffffff811e14d4>] kmem_cache_alloc+0x74/0x1e0
[39351.774637] RSP <ffff8800b8f67c50>
[39351.774637] CR2: 0000000000000008
After setting the `spl_kmem_cache_slab_limit` option to `0`, the system was stable and the benchmark ran for at least 48 hours.
However, after some more time the system became unresponsive. I'm not sure whether this is related to this issue or not:
```
Jan 14 10:57:56.741418 kernel: ------------[ cut here ]------------
Jan 14 10:57:56.778146 kernel: WARNING: CPU: 2 PID: 1046 at lib/list_debug.c:36 __list_add+0x8a/0xc0
Jan 14 10:57:56.780472 kernel: list_add double add: new=ffff880149d1e1c8, prev=ffff880149d1e020, next=ffff880149d1e1c8.
Jan 14 10:57:56.780543 kernel: Modules linked in: xt_nat veth vhost_net vhost macvtap macvlan xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netlink xt_addrtype br_netfilter overlay(T) target_core_user uio target_core_pscsi target_core_file target_core_iblock nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi
Jan 14 10:57:56.788423 kernel: ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core osst zfs(POE) zunicode(POE) zavl(POE) icp(POE) intel_powerclamp coretemp kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support gpio_ich zcommon(POE) znvpair(POE) ses enclosure spl(OE) st ch ioatdma pcspkr i2c_i801 sg lpc_ich i7core_edac edac_core shpchp acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mlx4_en mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx4_core ahci libahci drm igb mpt2sas libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw ptp pps_core
Jan 14 10:57:56.791461 kernel: nvme raid_class dca nvme_core i2c_algo_bit scsi_transport_sas devlink i2c_core
Jan 14 10:57:56.791614 kernel: CPU: 2 PID: 1046 Comm: z_wr_iss Tainted: P OE ------------ T 3.10.0-693.11.6.el7.x86_64 #1
Jan 14 10:57:56.794159 kernel: Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1 03/17/2012
Jan 14 10:57:56.796269 kernel: Call Trace:
Jan 14 10:57:56.800152 kernel: [ ] dump_stack+0x19/0x1b
Jan 14 10:57:56.800214 kernel: [ ] __warn+0xd8/0x100
Jan 14 10:57:56.800275 kernel: [ ] warn_slowpath_fmt+0x5f/0x80
Jan 14 10:57:56.803681 kernel: [ ] ? sched_clock+0x9/0x10
Jan 14 10:57:56.805661 kernel: [ ] __list_add+0x8a/0xc0
Jan 14 10:57:56.807628 kernel: [ ] __spl_cache_flush+0xb5/0x150 [spl]
Jan 14 10:57:56.810723 kernel: [ ] spl_cache_flush+0x36/0x50 [spl]
Jan 14 10:57:56.810784 kernel: [ ] spl_kmem_cache_free+0x1c5/0x1e0 [spl]
Jan 14 10:57:56.810833 kernel: [ ] arc_hdr_destroy+0x76/0x1c0 [zfs]
Jan 14 10:57:56.810885 kernel: [ ] arc_freed+0x69/0xc0 [zfs]
Jan 14 10:57:56.810937 kernel: [ ] zio_free_sync+0x45/0x140 [zfs]
Jan 14 10:57:56.812111 kernel: [ ] ? dbuf_rele+0x36/0x40 [zfs]
Jan 14 10:57:56.812171 kernel: [ ] ? dbuf_destroy+0x280/0x370 [zfs]
Jan 14 10:57:56.816312 kernel: [ ] zio_free+0xab/0x110 [zfs]
Jan 14 10:57:56.818306 kernel: [ ] dsl_free+0x11/0x20 [zfs]
Jan 14 10:57:56.818361 kernel: [ ] dsl_dataset_block_kill+0x267/0x4c0 [zfs]
Jan 14 10:57:56.818429 kernel: [ ] dbuf_write_done+0x15a/0x1a0 [zfs]
Jan 14 10:57:56.818491 kernel: [ ] arc_write_done+0xa1/0x400 [zfs]
Jan 14 10:57:56.821424 kernel: [ ] zio_done+0x321/0xcf0 [zfs]
Jan 14 10:57:56.824922 kernel: [ ] ? mutex_unlock+0x1b/0x20
Jan 14 10:57:56.826754 kernel: [ ] ? zio_ready+0x237/0x3f0 [zfs]
Jan 14 10:57:56.826813 kernel: [ ] ? zio_write_compress+0x33e/0x6a0 [zfs]
Jan 14 10:57:56.826878 kernel: [ ] zio_execute+0x9c/0x100 [zfs]
Jan 14 10:57:56.826935 kernel: [ ] taskq_thread+0x2a7/0x4f0 [spl]
Jan 14 10:57:56.826993 kernel: [ ] ? wake_up_state+0x20/0x20
Jan 14 10:57:56.828255 kernel: [ ] ? zio_taskq_member.isra.7.constprop.10+0x80/0x80 [zfs]
Jan 14 10:57:56.828320 kernel: [ ] ? taskq_thread_spawn+0x60/0x60 [spl]
Jan 14 10:57:56.828381 kernel: [ ] kthread+0xcf/0xe0
Jan 14 10:57:56.828432 kernel: [ ] ? insert_kthread_work+0x40/0x40
Jan 14 10:57:56.828491 kernel: [ ] ret_from_fork+0x58/0x90
Jan 14 10:57:56.828552 kernel: [ ] ? insert_kthread_work+0x40/0x40
Jan 14 10:57:56.829594 kernel: ---[ end trace 0ab1c41144b59ce1 ]---
Jan 14 11:38:22.902202 kernel: NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [l2arc_feed:832]
Jan 14 11:38:22.914494 kernel: Modules linked in: xt_nat veth vhost_net vhost macvtap macvlan xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netlink xt_addrtype br_netfilter overlay(T) target_core_user uio target_core_pscsi target_core_file target_core_iblock nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi
Jan 14 11:38:22.921368 kernel: ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core osst zfs(POE) zunicode(POE) zavl(POE) icp(POE) intel_powerclamp coretemp kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support gpio_ich zcommon(POE) znvpair(POE) ses enclosure spl(OE) st ch ioatdma pcspkr i2c_i801 sg lpc_ich i7core_edac edac_core shpchp acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mlx4_en mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx4_core ahci libahci drm igb mpt2sas libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw ptp pps_core
Jan 14 11:38:22.922724 kernel: nvme raid_class dca nvme_core i2c_algo_bit scsi_transport_sas devlink i2c_core
Jan 14 11:38:22.922774 kernel: CPU: 8 PID: 832 Comm: l2arc_feed Tainted: P W OE ------------ T 3.10.0-693.11.6.el7.x86_64 #1
Jan 14 11:38:22.922826 kernel: Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1 03/17/2012
Jan 14 11:38:22.922876 kernel: task: ffff88062062af70 ti: ffff8806214b0000 task.ti: ffff8806214b0000
Jan 14 11:38:22.922922 kernel: RIP: 0010:[ ] [ ] spl_slab_reclaim+0x17c/0x220 [spl]
Jan 14 11:38:22.922968 kernel: RSP: 0018:ffff8806214b3c48 EFLAGS: 00000246
Jan 14 11:38:22.923015 kernel: RAX: ffff880149d1e1c8 RBX: ffffffff816b9c6f RCX: ffff880149d1e1b0
Jan 14 11:38:22.923062 kernel: RDX: 0000000000000020 RSI: ffff880149d1e1b0 RDI: ffff880622f510b8
Jan 14 11:38:22.923106 kernel: RBP: ffff8806214b3ca8 R08: ffff880149d1f2c8 R09: 0000000000000000
Jan 14 11:38:22.923148 kernel: R10: ffff88060d5ca420 R11: ffff8801685a70b8 R12: ffffffff816b9bff
Jan 14 11:38:22.923213 kernel: R13: ffffffff816b9c06 R14: ffffffff816b9c0d R15: ffffffff816b9c14
Jan 14 11:38:22.923290 kernel: FS: 0000000000000000(0000) GS:ffff880627200000(0000) knlGS:0000000000000000
Jan 14 11:38:22.923367 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jan 14 11:38:22.925319 kernel: CR2: 0000000004452000 CR3: 00000000019fa000 CR4: 00000000000227e0
Jan 14 11:38:22.925366 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 14 11:38:22.925432 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 14 11:38:22.925505 kernel: Call Trace:
Jan 14 11:38:22.925566 kernel: [ ] spl_kmem_cache_free+0x158/0x1e0 [spl]
Jan 14 11:38:22.925647 kernel: [ ] arc_hdr_destroy+0x76/0x1c0 [zfs]
Jan 14 11:38:22.925722 kernel: [ ] l2arc_evict+0x287/0x310 [zfs]
Jan 14 11:38:22.925786 kernel: [ ] l2arc_feed_thread+0x3a4/0xb80 [zfs]
Jan 14 11:38:22.925840 kernel: [ ] ? l2arc_evict+0x310/0x310 [zfs]
Jan 14 11:38:22.925891 kernel: [ ] thread_generic_wrapper+0x71/0x80 [spl]
Jan 14 11:38:22.925942 kernel: [ ] ? __thread_exit+0x20/0x20 [spl]
Jan 14 11:38:22.926022 kernel: [ ] kthread+0xcf/0xe0
Jan 14 11:38:22.926103 kernel: [ ] ? insert_kthread_work+0x40/0x40
Jan 14 11:38:22.926168 kernel: [ ] ret_from_fork+0x58/0x90
Jan 14 11:38:22.926233 kernel: [ ] ? insert_kthread_work+0x40/0x40
Jan 14 11:38:22.926291 kernel: Code: c1 02 48 d3 e0 49 89 c4 48 8b 4d c0 4c 8d 73 50 48 8b 01 4c 39 e9 48 8d 71 e8 4c 8d 78 e8 75 22 eb 3d 0f 1f 44 00 00 49 8b 47 18 <49> 8d 57 18 4c 39 ea 48 8d 48 e8 74 27 48 8b 53 50 4c 89 fe 49
```
None of these happen if I remove the cache device from the pool. So I believe the issue is still present in 0.7.5; for me, setting `spl_kmem_cache_slab_limit` to `0` makes the system more stable, but another (possibly unrelated) issue still manifests itself.
@twarberg @adam-dej would you mind posting the capacity of the cache device you were using and the total system memory?
@behlendorf Sure thing:
- MySQL system: 52GB mem, 375GB cache device
- Postgres system: 118GB mem, 375GB cache device
Thanks. You might try keeping an eye on the `l2_hdr_size` value reported in `/proc/spl/kstat/zfs/arcstats`. The larger the l2arc device, the more memory it takes to manage it. It looks like your systems have more than enough memory for this not to be a problem, but it is a thing to keep an eye on and might help explain this.
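One way to pull that value out, assuming the usual `name type data` column layout of the arcstats kstat file (the MiB conversion is my addition):

```shell
# Print the l2arc header overhead in MiB; arcstats rows are "name type data"
awk '$1 == "l2_hdr_size" { printf "%.1f MiB\n", $3 / 1048576 }' \
    /proc/spl/kstat/zfs/arcstats
```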
Haven't seen any issues for months. Will close for now.
System information
Describe the problem you're observing
We're running on Google Cloud and are trying to use a local NVMe SSD for L2ARC, but run into the crash after 7-12 days. We've seen it on 2 different systems, a MySQL and a Postgres server. It happened on the Postgres server with both 0.7.1 and 0.7.2 at roughly one-week intervals (2x on 0.7.1 and once on 0.7.2). Unfortunately we do not have console logs for those crashes, which is why we haven't reported it until now. We decided to give 0.7.3 a try after upgrading the MySQL server and got a crash after 10 days. Both systems are busy 24/7, with Postgres being write-heavy and MySQL read-heavy; Postgres does at least 2x the disk activity. The Postgres server has been running stable without L2ARC for months.
Common settings
Describe how to reproduce the problem
Run a busy MySQL or Postgres on Google Cloud instance with local NVMe SSD L2arc for 7-12 days.
Include any warning/errors/backtraces from the system logs