wtf.
(posting to track this)
Just an update...
arc_adapt has still not learned the error of its ways. It has taken up permanent residence on one of my CPU cores (see below).
Any suggestions about how to find out what it is doing will be gratefully received.
top - 14:14:30 up 4 days, 19:48, 4 users, load average: 1.21, 1.28, 1.33
Tasks: 266 total, 2 running, 264 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1%us, 25.1%sy, 0.0%ni, 74.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8177736k total, 6361036k used, 1816700k free, 102212k buffers
Swap: 16777212k total, 720580k used, 16056632k free, 359960k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
443 root 0 -20 0 0 0 R 100 0.0 6197:11 arc_adapt
1525 root 20 0 0 0 0 S 0 0.0 0:00.97 kworker/1:1
2378 mark 20 0 3168m 123m 4556 S 0 1.5 13:03.37 java
A further update... after another random load spike, the server became completely unresponsive yesterday morning.
The stack trace from the kernel was revealing, in that it had nothing to do with ZFS. It was complaining about the Linux md driver, which supports the software RAID1 array I use to contain my OS install.
So, quite possibly, the load spike issue is not relevant, and the question is simply why is arc_adapt spinning?
Also, since I rebooted the server, arc_adapt has already returned to the top of the CPU utilisation list, i.e. the problem resurfaced in under 24 hours.
Kernel version is 3.2.0-25-generic (from Ubuntu Precise)
My best guess for what's going on here is that your working metadata set size exceeds the zfs_arc_meta_limit=536870912
you manually set. This results in the arc_adapt thread being woken up repeatedly to try to free some of that metadata. Notice how arc_meta_used exceeds arc_meta_limit.
arc_prune                       4    37149
arc_meta_used                   4    2064458416
arc_meta_limit                  4    536870912
arc_meta_max                    4    2066623440
First I'd try reverting to the default zfs_arc_meta_limit
and see if that helps; alternatively, you could try increasing it further. We'll also want to put together a patch to handle this case better when, for whatever reason, the memory can't be reclaimed.
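For reference, a rough sketch of both options (the 1 GiB figure below is just an example, not a recommendation): the override normally lives in /etc/modprobe.d/zfs.conf, and I believe the parameter can also be changed on a running system via sysfs.

# /etc/modprobe.d/zfs.conf -- delete this line to revert to the default,
# or raise the value (in bytes), e.g. to 1 GiB:
options zfs zfs_arc_meta_limit=1073741824

# Adjust the loaded module without a reboot:
echo 1073741824 > /sys/module/zfs/parameters/zfs_arc_meta_limit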
Thank you for taking the time to look into this bizarre issue.
I have removed my user-specified zfs_arc_meta_limit.
However, I also moved to CentOS 6, because I simply could not tolerate the instability of SATA drivers provided in Ubuntu. This has been an issue in the past, and it was disappointing to discover that it had resurfaced. Also, the constant goalpost moving of ever-newer kernel versions was not helping matters.
In summary...
Good news: Everything seems stable and arc_adapt is not spinning
Bad news: I have changed two variables, which is bad experimentation
Oh the irony!
After I posted my last update, I checked dmesg, and found this. Note that it does not appear to have caused any instability.
z_wr_iss/1: page allocation failure. order:0, mode:0x20
Pid: 1359, comm: z_wr_iss/1 Tainted: P ---------------- 2.6.32-220.23.1.el6.x86_64 #1
Call Trace:
<IRQ> [<ffffffff8112415f>] ? __alloc_pages_nodemask+0x77f/0x940
[<ffffffff8115e152>] ? kmem_getpages+0x62/0x170
[<ffffffff8115ed6a>] ? fallback_alloc+0x1ba/0x270
[<ffffffff8115e7bf>] ? cache_grow+0x2cf/0x320
[<ffffffff8115eae9>] ? ____cache_alloc_node+0x99/0x160
[<ffffffff814220aa>] ? __alloc_skb+0x7a/0x180
[<ffffffff8115f9af>] ? kmem_cache_alloc_node_notrace+0x6f/0x130
[<ffffffff8115fbeb>] ? __kmalloc_node+0x7b/0x100
[<ffffffff814220aa>] ? __alloc_skb+0x7a/0x180
[<ffffffff81422226>] ? __netdev_alloc_skb+0x36/0x60
[<ffffffffa050c1b2>] ? e1000_clean_rx_irq+0x392/0x530 [e1000e]
[<ffffffffa050b5fe>] ? e1000_clean+0x7e/0x2b0 [e1000e]
[<ffffffff8105679c>] ? scheduler_tick+0xcc/0x260
[<ffffffff81431013>] ? net_rx_action+0x103/0x2f0
[<ffffffff81072291>] ? __do_softirq+0xc1/0x1d0
[<ffffffff810d9740>] ? handle_IRQ_event+0x60/0x170
[<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
[<ffffffff8100de85>] ? do_softirq+0x65/0xa0
[<ffffffff81072075>] ? irq_exit+0x85/0x90
[<ffffffff814f5515>] ? do_IRQ+0x75/0xf0
[<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11
<EOI> [<ffffffffa0378323>] ? fletcher_4_native+0x23/0x60 [zcommon]
[<ffffffffa0429210>] ? zio_checksum_compute+0xd0/0x160 [zfs]
[<ffffffffa0423ac0>] ? zio_checksum_generate+0x30/0x60 [zfs]
[<ffffffffa0427959>] ? zio_execute+0x99/0xf0 [zfs]
[<ffffffffa02c37b2>] ? taskq_thread+0x212/0x590 [spl]
[<ffffffff814ed250>] ? thread_return+0x4e/0x76e
[<ffffffff8105ea30>] ? default_wake_function+0x0/0x20
[<ffffffffa02c35a0>] ? taskq_thread+0x0/0x590 [spl]
[<ffffffff810909c6>] ? kthread+0x96/0xa0
[<ffffffff8100c14a>] ? child_rip+0xa/0x20
[<ffffffff81090930>] ? kthread+0x0/0xa0
[<ffffffff8100c140>] ? child_rip+0x0/0x20
That's good news. As for the stack you posted, that's a known issue which is being worked on, and it's harmless. In your exact case it resulted in a network packet being dropped, but TCP will resend, so there was no harm done.
Indeed it is!
Also, there is a conspicuous absence of ever-increasing memory utilisation.
Given that I was on a 3.2 kernel, I'm guessing that was related to either #466 or #618
Ok so I'm pretty sure I just ran into this on my Ubuntu fileserver running kernel 3.2.0-26-generic #41-Ubuntu SMP.
I'm running the ZFS daily PPA, which I update regularly. I had the same problem with arc_adapt spinning at 100% utilization, and it looks like it was probably related to arc_meta_limit (which I've left at the default). After a reboot (with all the usual things running) the meta usage is sitting close to the limit:
arc_meta_used 4 33538896
arc_meta_limit 4 33554432
arc_meta_max 4 44593832
I'm guessing I should just bump up arc_meta_limit, although I don't actually know what it does.
The arc_meta_limit attempts to put a bound on the amount of cached metadata in the ARC. Basically this is anything that's not bulk file data but still needs to be cached, such as inodes. Anyway, the limit is there to keep a reasonable balance between file data and metadata in the cache. If you're hitting this limit regularly, you likely have a large working set of files and it may make sense to increase it.
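If you want to check whether you're actually bumping into it, a quick sketch (same kstat names as in the output you posted):

grep -E 'arc_meta_(used|limit|max)' /proc/spl/kstat/zfs/arcstats

If arc_meta_used sits at or above arc_meta_limit most of the time, raising the limit is the thing to try.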
Ok I just had this bug lock up my server again. It happened right as I ran Unison against a fairly large directory full of large files. The behavior was the same - arc_adapt started spinning at 100+% in top, and filled up the memory until it hit the swap at which point the server became unresponsive.
I have zfs_arc_max=0x8000000
in zfs.conf under modprobe.d, which should be a very low amount of memory (128 MiB), but it doesn't seem to be respected at all. At any rate, I don't know how it managed to balloon from that figure to over 8 GB.
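For reference, this is the exact line I have, and (assuming the standard sysfs parameter path) how I understand one can confirm what the module actually loaded:

# /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=0x8000000

# should print the value the module is actually using (in decimal)
cat /sys/module/zfs/parameters/zfs_arc_max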
Is there something I can do to help debug this? It doesn't seem reasonable that metadata operations should lock up the entire server; I'd expect simply slower file listings and searches.
I'm currently seeing this on a system running Ubuntu 12.04 64-bit, kernel 3.2.0-27-generic, zfs 0.6.0.65. It occurred after an rsync and a snapshot. The system has 2 CPUs, 2 GB RAM, and a 16 GB USB flash L2ARC. I'll leave it spinning for a couple of days if anyone wants me to run anything.
Looking at a "perf" callgraph it seems to be spinning inside iterate_supers_type(), not even calling zpl_prune_sb() from what I can tell (pasted below). It's as if the fs_supers list has become looped. In the 3.3 kernel the commit dabe0dc194d5d56d379a8994fff47392744b6491 looks like it fixes that kind of problem - maybe that's the issue? (I think there are some related previous commits too).
"perf record -g -p <pid_of_arc_adapt>":
# Overhead Command Shared Object Symbol
# ........ ......... ................. .....................
#
27.05% arc_adapt [kernel.kallsyms] [k] down_read
|
--- down_read
|
|--91.69%-- iterate_supers_type
| zpl_prune_sbs
| arc_adjust_meta
| arc_adapt_thread
| thread_generic_wrapper
| kthread
| kernel_thread_helper
|
--8.31%-- zpl_prune_sbs
arc_adjust_meta
arc_adapt_thread
thread_generic_wrapper
kthread
kernel_thread_helper
24.63% arc_adapt [kernel.kallsyms] [k] up_read
|
--- up_read
|
|--65.94%-- iterate_supers_type
| zpl_prune_sbs
| arc_adjust_meta
| arc_adapt_thread
| thread_generic_wrapper
| kthread
| kernel_thread_helper
|
--34.06%-- zpl_prune_sbs
arc_adjust_meta
arc_adapt_thread
thread_generic_wrapper
kthread
kernel_thread_helper
24.57% arc_adapt [kernel.kallsyms] [k] __ticket_spin_lock
|
--- __ticket_spin_lock
|
|--68.54%-- _raw_spin_lock
| iterate_supers_type
| zpl_prune_sbs
| arc_adjust_meta
| arc_adapt_thread
| thread_generic_wrapper
| kthread
| kernel_thread_helper
|
--31.46%-- iterate_supers_type
zpl_prune_sbs
arc_adjust_meta
arc_adapt_thread
thread_generic_wrapper
kthread
kernel_thread_helper
10.50% arc_adapt [kernel.kallsyms] [k] iterate_supers_type
|
--- iterate_supers_type
zpl_prune_sbs
arc_adjust_meta
arc_adapt_thread
thread_generic_wrapper
kthread
kernel_thread_helper
4.49% arc_adapt [kernel.kallsyms] [k] __put_super
|
--- __put_super
|
|--64.70%-- iterate_supers_type
| zpl_prune_sbs
| arc_adjust_meta
| arc_adapt_thread
| thread_generic_wrapper
| kthread
| kernel_thread_helper
|
--35.30%-- zpl_prune_sbs
arc_adjust_meta
arc_adapt_thread
thread_generic_wrapper
kthread
kernel_thread_helper
3.86% arc_adapt [kernel.kallsyms] [k] _raw_spin_lock
|
--- _raw_spin_lock
|
|--61.70%-- iterate_supers_type
| zpl_prune_sbs
| arc_adjust_meta
| arc_adapt_thread
| thread_generic_wrapper
| kthread
| kernel_thread_helper
|
--38.30%-- zpl_prune_sbs
arc_adjust_meta
arc_adapt_thread
thread_generic_wrapper
kthread
kernel_thread_helper
2.81% arc_adapt [kernel.kallsyms] [k] _cond_resched
|
--- _cond_resched
|
|--58.07%-- down_read
| iterate_supers_type
| zpl_prune_sbs
| arc_adjust_meta
| arc_adapt_thread
| thread_generic_wrapper
| kthread
| kernel_thread_helper
|
--41.93%-- iterate_supers_type
zpl_prune_sbs
arc_adjust_meta
arc_adapt_thread
thread_generic_wrapper
kthread
kernel_thread_helper
2.09% arc_adapt [kernel.kallsyms] [k] __ticket_spin_unlock
|
--- __ticket_spin_unlock
|
|--72.47%-- zpl_prune_sbs
| arc_adjust_meta
| arc_adapt_thread
| thread_generic_wrapper
| kthread
| kernel_thread_helper
|
--27.53%-- iterate_supers_type
zpl_prune_sbs
arc_adjust_meta
arc_adapt_thread
thread_generic_wrapper
kthread
kernel_thread_helper
0.00% arc_adapt [kernel.kallsyms] [k] native_write_msr_safe
|
--- native_write_msr_safe
intel_pmu_enable_all
x86_pmu_enable
perf_pmu_enable
x86_pmu_commit_txn
group_sched_in
__perf_event_enable
remote_function
generic_smp_call_function_single_interrupt
smp_call_function_single_interrupt
call_function_single_interrupt
_raw_spin_lock
iterate_supers_type
0xffffffffa02cdc9f
arc_adjust_meta
arc_adapt_thread
0xffffffffa01970b8
kthread
kernel_thread_helper
Interesting, the additional profiling makes this look like a duplicate of #861, which was recently filed. Somehow we're getting caught on the sb_lock while iterating over the super blocks. The kernel commit you referenced certainly looks relevant; we'll need to review it carefully.
In the meantime, if you need a workaround you could #undef HAVE_SHRINK in zfs_config.h and rebuild zfs until we determine exactly what's going on here.
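Roughly like this (a sketch only; zfs_config.h is generated by ./configure, and the exact define line may differ between releases):

# in the zfs source tree, after running ./configure
sed -i 's/^#define HAVE_SHRINK 1$/#undef HAVE_SHRINK/' zfs_config.h
make && sudo make install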
Looking at my system, it seems the bug occurred while unmounting a .zfs/snapshot dir (it's still mounted, but nothing is using it), matching #861.
I think there's still a bug in iterate_supers_type() in head kernels. generic_shutdown_super() calls hlist_del_init() on s_instances, but I don't think there's anything preventing it from being the current iterator in iterate_supers_type().
Possibly the deletion of s_instances needs to be moved to __put_super(), like s_list (with corresponding changes to the places that check whether s_instances is empty).
I'm still experiencing this issue. It starts overnight, when I leave the computer turned on but no user is logged in. I'm on Ubuntu 12.04 64-bit with 8 GB RAM, and I boot from ZFS as well. I've seen this from 0.6.0.82-rc11 through 0.6.0.85-rc11. I did some profiling; the results are:
- 28.81% arc_adapt [kernel.kallsyms] [k] __ticket_spin_lock
  - __ticket_spin_lock
    - 96.26% _raw_spin_lock
      - 99.99% iterate_supers_type
          zpl_prune_sbs
          arc_adjust_meta
          arc_adapt_thread
          thread_generic_wrapper
          kthread
          kernel_thread_helper
    + 3.72% iterate_supers_type
- 23.54% arc_adapt [kernel.kallsyms] [k] down_read
  - down_read
    - 95.26% iterate_supers_type
        zpl_prune_sbs
        arc_adjust_meta
        arc_adapt_thread
        thread_generic_wrapper
        kthread
        kernel_thread_helper
    + 4.74% zpl_prune_sbs
- 21.83% arc_adapt [kernel.kallsyms] [k] up_read
  - up_read
    - 95.47% iterate_supers_type
        zpl_prune_sbs
        arc_adjust_meta
        arc_adapt_thread
        thread_generic_wrapper
        kthread
        kernel_thread_helper
    + 4.53% zpl_prune_sbs
This will be fixed with proper page cache integration. Until then I'd suggest making your .zfs snapshot directory invisible so it's less likely to be traversed by processes walking the file system. This will cut down on the number of mount/umounts and make the issue less likely.
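Concretely, that's the snapdir dataset property (the dataset name below is just a placeholder):

zfs set snapdir=hidden tank/fs
zfs get snapdir tank/fs

With snapdir=hidden the .zfs directory is still reachable by explicit path; it just no longer shows up in directory listings, so casual filesystem walkers won't descend into it.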
Is this issue fixed?
@Rudd-O No, but you can likely work around it by building zfs with HAVE_SHRINK undefined.
@Rudd-O Pull request #1595 contains a fix if you still need one. However, upon closer inspection this bug only impacts 3.0-3.2 kernels and since I know you're usually running something newer I'm surprised you hit it. Can you verify you were using one of the impacted kernels when you hit this?
I'm running ZFS 0.6.0.65 (kernel module) on Ubuntu 12.04 64-bit.
With this version of ZFS, and previous versions, I have the same issue, which I shall try to describe...
A few hours after booting the server (usually the next day), the load average reported will be insanely high (3600 is my personal best) for a period of time.
This situation resolves itself without any intervention from me. However, it leaves behind an arc_adapt process, which consumes 100% of one core, until I reboot the server.
Having read that arc_adapt doesn't have much responsibility these days, I am at a loss to explain this behaviour.
Annoyingly, I am unable to attach gdb or strace to the arc_adapt kernel thread in order to diagnose the issue further.
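The closest kernel-side substitutes I'm aware of (a sketch; <pid> would be arc_adapt's PID from top):

cat /proc/<pid>/stack        # current kernel stack of the thread
echo t > /proc/sysrq-trigger # dump all task states to the kernel log (see dmesg)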
Here is some further information:
RAM: 8GB
ZFS configuration: RAIDZ2 comprised of 8x2TB and 8x1TB drives