cwedgwood closed this issue 11 years ago
This might be more meaningful:
[ 208.481594] BUG: soft lockup - CPU#9 stuck for 23s! [z_wr_iss/9:2337]
[ 208.488405] Modules linked in: zfs(O)
[ 208.491582] BUG: soft lockup - CPU#10 stuck for 23s! [z_wr_iss/10:2338]
[ 208.491583] Modules linked in: zfs(O) zunicode(O) zavl(O) zcommon(O) znvpair(O) spl(O) sg coretemp kvm_intel i2c_i801 lpc_ich mfd_core kvm i2c_core mpt2sas ghash_clmulni_intel microcode i7core_edac
[ 208.491596] CPU 10 Modules linked in: zfs(O) zunicode(O) zavl(O) zcommon(O) znvpair(O) spl(O) sg coretemp kvm_intel i2c_i801 lpc_ich mfd_core kvm i2c_core mpt2sas ghash_clmulni_intel microcode i7core_edac
[ 208.491605]
[ 208.491607] Pid: 2338, comm: z_wr_iss/10 Tainted: G O 3.5.3-cw3 #5 Supermicro X8DTL/X8DTL
[ 208.491610] RIP: 0010:[
That's a new one. What version of the SPL/ZFS were you using?
commit 2b2861362f7dd09cc3167df8fddb6e2cb475018a
Author: Brian Behlendorf <behlendorf1@llnl.gov>
Date:   Wed Aug 29 11:52:47 2012 -0700
Should I git pull and retest? It's as simple as create a pool with a single (in this case 1TB SATA) disk, create a zvol then copy data to it.
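For concreteness, that recipe reads like this as a script (a sketch; the pool name, device, and data path are placeholders I'm assuming, and the commands are printed rather than executed since they would wipe the target disk):

```shell
# Hypothetical repro sketch; 'tank', '/dev/sdX', and the sizes are
# assumptions, not taken from the report. Printed instead of run,
# because the commands would destroy the target disk.
repro=$(cat <<'EOF'
zpool create tank /dev/sdX              # single 1TB SATA disk
zfs create -V 800G tank/vol0            # create a zvol
cat /path/to/data > /dev/zvol/tank/vol0 # copy data onto it
EOF
)
echo "$repro"
```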
I just wanted to make sure it was something current. This may be an issue with Linux 3.5.3; we'll have to investigate further, but this code has been working well with older kernels for a while now.
Should I retest with 3.4.x or older still?
Yes, 3.4 would be a good start. I regularly test with zvols and this kernel, so if it doesn't work then we'll have to figure out what's going on in your environment.
3.4.10 works flawlessly under the same conditions. I guess this is a 3.5.x issue :-(
That at least narrows down where to look. There must have been another API change in the 3.5 kernel.
On 3.4.10 this happens as well:
[ 555.735918] Pid: 3422, comm: z_wr_iss/1 Tainted: G O 3.4.10-cw4-00001-g23bfbbf #7 Supermicro X8DTL/X8DTL
[ 555.735922] RIP: 0010:[<ffffffffa001a0ae>] [<ffffffffa001a0ae>] spl_kmem_cache_alloc+0x19a/0xa9d [spl]
[ 555.735934] RSP: 0018:ffff88062eadbca0 EFLAGS: 00000246
[ 555.735936] RAX: 0000000000000000 RBX: ffffffff814e129d RCX: ffff88062f896df8
[ 555.735938] RDX: ffff88062f896de8 RSI: 0000000000000230 RDI: ffff88062f896e18
[ 555.735940] RBP: ffff88062eadbd80 R08: 0000000000000000 R09: ffff8803301172b8
[ 555.735942] R10: 0000000000000000 R11: ffff880330117200 R12: ffff88062eadbc18
[ 555.735944] R13: 0000000000000046 R14: ffff88062eada000 R15: 0000000000000000
[ 555.735947] FS: 0000000000000000(0000) GS:ffff880333c20000(0000) knlGS:0000000000000000
[ 555.735949] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 555.735951] CR2: ffffffffff600400 CR3: 000000000174d000 CR4: 00000000000007e0
[ 555.735953] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 555.735955] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 555.735957] Process z_wr_iss/1 (pid: 3422, threadinfo ffff88062eada000, task ffff88062eae0000)
[ 555.735959] Stack:
[ 555.735960] ffff88062eadbcd0 ffffffffa01c39ce ffff88031558fa60 0000000000020000
[ 555.735965] ffff880330d7d800 0000000000000000 ffff88062eadbd10 ffff88062eae0000
[ 555.735969] ffff88062f896df8 ffff88062eadbd40 0000000000000206 ffff88062f896d60
[ 555.735973] Call Trace:
[ 555.736020] [<ffffffffa01c39ce>] ? zio_vdev_io_done+0x3a/0x143 [zfs]
[ 555.736046] [<ffffffffa0197644>] ? vdev_config_sync+0x137/0x137 [zfs]
[ 555.736071] [<ffffffffa018c4b5>] ? spa_config_enter+0xc4/0xde [zfs]
[ 555.736100] [<ffffffffa01c463d>] zio_buf_alloc+0x22/0x24 [zfs]
[ 555.736131] [<ffffffffa01c4826>] zio_write_bp_init+0x19b/0x3da [zfs]
[ 555.736161] [<ffffffffa01c412d>] zio_execute+0xb8/0xe0 [zfs]
[ 555.736168] [<ffffffffa001c1a2>] taskq_thread+0x2aa/0x4d2 [spl]
[ 555.736174] [<ffffffff8105e46c>] ? finish_task_switch+0x82/0xad
[ 555.736177] [<ffffffff81062757>] ? try_to_wake_up+0x1d4/0x1d4
[ 555.736184] [<ffffffffa001bef8>] ? task_done+0x110/0x110 [spl]
[ 555.736189] [<ffffffff8105496a>] kthread+0x8b/0x93
[ 555.736194] [<ffffffff814e8b74>] kernel_thread_helper+0x4/0x10
[ 555.736198] [<ffffffff814e129d>] ? retint_restore_args+0x13/0x13
[ 555.736201] [<ffffffff810548df>] ? kthread_worker_fn+0x149/0x149
[ 555.736204] [<ffffffff814e8b70>] ? gs_change+0x13/0x13
As before it's so bad I have to reset the machine hard.
@behlendorf it seems this exists in 3.4.x as well, though 3.5.x triggers it more easily
i'm guessing it's a low memory / memory reclaim issue
to trigger this you need to write more data than there is memory in the system (1.5x suffices in my case)
cat path/to/vm/image > /dev/zvol/...
if you avoid memory pressure by doing:
cat path/to/vm/image | dd of=/dev/zvol/... iflag=fullblock bs=64M oflag=direct
or
echo 1 > /proc/sys/vm/drop_caches
(during writeback) the problem never seems to trigger
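To size a write that reliably exceeds the 1.5x threshold mentioned above, a quick back-of-the-envelope sketch (mem_kb is hard-coded from the MemTotal value reported later in this thread; the dd target is a placeholder):

```shell
# Sketch: compute 1.5x RAM in MiB, the write size said to trigger the
# issue. mem_kb is copied from the MemTotal line in this thread; on a
# live system read it from /proc/meminfo instead.
mem_kb=24751420
write_mb=$(( mem_kb * 3 / 2 / 1024 ))
echo "need to write ~${write_mb} MiB"
echo "dd if=/path/to/vm/image of=/dev/zvol/tank/vol0 bs=1M count=${write_mb}"
```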
if the arc size is limited to under half the total memory of the system, things behave much better; the suggestion by @byteharmony in issue #944 (https://github.com/zfsonlinux/zfs/issues/944#issuecomment-8373994) seems to help greatly here
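A sketch of what that cap looks like in practice, computing zfs_arc_max as a third of RAM (mem_kb hard-coded from the meminfo dump in this thread; /etc/modprobe.d/zfs.conf is the conventional place for the option):

```shell
# Sketch: cap the ARC at ~1/3 of system RAM via the zfs_arc_max module
# parameter (value in bytes). mem_kb is copied from the MemTotal
# reported in this thread; read /proc/meminfo on a live box.
mem_kb=24751420
arc_max=$(( mem_kb * 1024 / 3 ))   # bytes
echo "options zfs zfs_arc_max=${arc_max}"   # e.g. in /etc/modprobe.d/zfs.conf
```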
i wonder if the logic that the arc can be 1/2 the system memory is a bit aggressive on a dual-socket (numa) system, because it's possible (for whatever reason) to exhaust all the memory in a node; it might be more reasonable to limit the arc size to some fraction of the smallest node
if you're testing this on a large system and can't make it show up, try allocating a bunch of large (2M) pages in each node to artificially constrain the amount of generically available memory
e.g.:
# reserve 32GB on each node, 64GB in total on a 2-socket system
echo $((32*512)) > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
echo $((32*512)) > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
(adjust as needed)
@cwedgwood Interesting. Have you looked at the maximum number of emergency objects that kicked in? You can check this in the last few columns of /proc/spl/kmem/slab. I did see some rough spots where the system would appear to hang when there's a sudden massive demand for these objects and memory is already low. However, in all my testing it did manage to work itself out in a few minutes.
There is a lot of info in that proc file. What are we looking for there?
My KVM test / devel environment has yet to fail on anything (Hyper-V didn't make it out of the gates and physical servers appear to be second best). I've now got serial configured for the physical boxes. I have one I'm collecting crash info on for you right now. (Need to do it again, the 200 line limit isn't quite long enough ;).
BK
In particular I'm interested in the last three columns of the zio_buf_131072 and zio_data_buf_131072 rows. They will show the worst-case usage for these large allocations. I'm wondering if the need for them spiked at some point and we have several hundred or a thousand outstanding.
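A quick way to watch just those rows and their trailing columns is an awk filter like this (a sketch; here it is fed sample lines copied from later in this thread rather than the live proc file):

```shell
# Sketch: print the name plus the last three columns of the 128K zio
# slab rows. Input is sample data from this thread; on a live system
# feed in /proc/spl/kmem/slab instead.
rows=$(awk '/^zio(_data)?_buf_131072 / { print $1, $(NF-2), $(NF-1), $NF }' <<'EOF'
zio_buf_131072 0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_data_buf_131072 0x00040 0 0 4194304 131072 0 0 1 0 0 8 0 0
EOF
)
echo "$rows"
```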
@behlendorf updated to rc11 and it still happens
i have:
I get 1000s of traces like:
[ 529.026866] [
that crush the machine and require a reset
watching the spl kmem stats until this happens (one second intervals):
zio_buf_131072      0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_data_buf_131072 0x00040 0 0 4194304 131072 0 0 1 0 0 8 0 0
zio_buf_131072      0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_data_buf_131072 0x00040 0 0 4194304 131072 0 0 1 0 0 8 0 0
zio_buf_131072      0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_data_buf_131072 0x00040 0 0 4194304 131072 0 0 1 0 0 8 0 0
zio_buf_131072      0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_data_buf_131072 0x00040 0 0 4194304 131072 0 0 1 0 0 8 0 0
zio_buf_131072      0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_data_buf_131072 0x00040 0 0 4194304 131072 0 0 1 0 0 8 0 0
zio_buf_131072      0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_data_buf_131072 0x00040 0 0 4194304 131072 0 0 1 0 0 8 0 0
zio_buf_131072      0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_data_buf_131072 0x00040 0 0 4194304 131072 0 0 1 0 0 8 0 0
zio_buf_131072      0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_data_buf_131072 0x00040 0 0 4194304 131072 0 0 1 0 0 8 0 0
zio_buf_131072      0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_data_buf_131072 0x00040 0 0 4194304 131072 0 0 1 0 0 8 0 0
zio_buf_131072      0x00040 4194304 131072 4194304 131072 1 1 97 31 1 3001 0 0
zio_data_buf_131072 0x00040 0 0 4194304 131072 0 0 1 0 0 8 0 0
Adding to this.
It's trivial to break things with a local XFS filesystem on a ZVOL and some moderate write pressure. Typically things break in seconds, minutes at most.
This actually isn't even a contrived situation; I use ZVOLs for testing/development often and have basically had to abandon this as it's unusable.
@cwedgwood Have you tried explicitly setting both the ZVOL block size and XFS block size to 4k? This should result in the easiest workload for the Linux VM and VFS. If the block sizes are mismatched it will result in lots of extra read+modify+write operations which might cause performance problems.
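Spelled out, the matched-4K configuration would be something like the following (a sketch; the pool and volume names are placeholders, and the commands are printed rather than executed since mkfs would format the device):

```shell
# Sketch: matching 4K block sizes on the zvol and the XFS filesystem
# layered on it. 'tank/vol0' is a placeholder. Printed, not executed,
# since these commands would format a device.
setup=$(cat <<'EOF'
zfs create -V 50G -o volblocksize=4096 tank/vol0
mkfs.xfs -b size=4096 /dev/zvol/tank/vol0
EOF
)
echo "$setup"
```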
@behlendorf yes, I explicitly use 4K blocks in all cases
@behlendorf @ryao
testing with numa disabled (that is disabled in the chipset, so the ram is interleaved between nodes on cacheline boundaries, there is no SRAT, etc.. everything is flat)
it still breaks, hard, exactly the same way
that is, when memory is low it wedges up solid
watching /proc/meminfo, the last output i get before it dies:
Wed Oct 3 09:49:09 PDT 2012
----
MemTotal: 24751420 kB
MemFree: 236368 kB
Buffers: 8804 kB
Cached: 15989516 kB
SwapCached: 0 kB
Active: 56876 kB
Inactive: 15963808 kB
Active(anon): 22016 kB
Inactive(anon): 1824 kB
Active(file): 34860 kB
Inactive(file): 15961984 kB
with most stack traces in the form of:
[ 727.598257] Call Trace:
[ 727.598259] [<ffffffffa0b6ed56>] ? zio_vdev_io_done+0x3a/0x143 [zfs]
[ 727.598309] [<ffffffffa0b42824>] ? vdev_config_sync+0x134/0x134 [zfs]
[ 727.598335] [<ffffffffa0b375fd>] ? spa_config_enter+0xc4/0xde [zfs]
[ 727.598360] [<ffffffffa0b6f9c5>] zio_buf_alloc+0x22/0x24 [zfs]
[ 727.598390] [<ffffffffa0b6fbae>] zio_write_bp_init+0x19b/0x3d9 [zfs]
[ 727.598421] [<ffffffffa0b6f4b5>] zio_execute+0xb8/0xe0 [zfs]
[ 727.598451] [<ffffffffa080d467>] taskq_thread+0x2aa/0x4d2 [spl]
[ 727.598458] [<ffffffff8105fe03>] ? finish_task_switch+0x82/0xad
[ 727.598463] [<ffffffff8106408a>] ? try_to_wake_up+0x1d4/0x1d4
[ 727.598468] [<ffffffffa080d1bd>] ? task_done+0x110/0x110 [spl]
[ 727.598474] [<ffffffff81055c61>] kthread+0x6f/0x77
[ 727.598478] [<ffffffff814f4934>] kernel_thread_helper+0x4/0x10
[ 727.598483] [<ffffffff814ecfdd>] ? retint_restore_args+0x13/0x13
[ 727.598487] [<ffffffff81055bf2>] ? kthread_freezable_should_stop+0x43/0x43
[ 727.598489] [<ffffffff814f4930>] ? gs_change+0x13/0x13
@behlendorf given how easy this is to break, would it be useful to reproduce this in a kvm vm/instance which i can attach to the nearest pigeon heading your way?
similar/related issue on the list:
https://groups.google.com/a/zfsonlinux.org/forum/?fromgroups=#!topic/zfs-discuss/o26bGiNQ2z0
@cwedgwood OK, I give. I'll see if I can recreate it. If I can do that, it'll be a lot easier to kill. Can you post a trivial recipe for recreating this including:
Try echo 3 > /proc/sys/vm/drop_caches. It works for me on Xen dom0. I use zvols for Xen domU, but sometimes ZFS lockups happen.
I'm also having this problem (I believe); you can see my conversation with Fajar in zfs-discuss about it here:
https://groups.google.com/a/zfsonlinux.org/forum/?fromgroups=#!topic/zfs-discuss/o26bGiNQ2z0
I see this problem in Gentoo (running Linux 3.4.9) and CentOS 6.3 (running Linux 2.6.32-279.9.1.el6) in (at least) versions 0.6.0-rc10 and 0.6.0-rc11 of spl/zfs. With any level of zfs_arc_max restriction (as much as 50% of RAM; 4GB in one case, 16GB in another, and down to 512MB), persistent writing to zvols drives the memory usage up until the system softlocks. The CentOS machine managed to kill arc_adapt and blkid at one point, there's a log here:
Let me know what else I can give you.
@drukargin The most useful thing you could post would be a trivial reproducer for CentOS 6.3 which I can run in a VM. If I can reproduce the issue with a ZVOL in a VM then there's a good chance we can make short work of it.
As an update:
I'm having trouble reproducing the failure in a CentOS VM. It's possible the failure in my CentOS box was something similar to #1036, as it has been greatly helped since I tweaked zfs_arc_max down to about 1/3 of the RAM after my VM commitments (32 GB total, 8 GB VMs, 8 GB zfs_arc_max).
The Gentoo box still has the problem; I'm going to try a few things on it today, and rebuild my VM with Gentoo to see if I can reproduce it there.
Further update: I tried downgrading the Gentoo box from 3.4.9 to 3.3.8 to 3.2.12; all continued to exhibit the same problem. I booted up a CentOS LiveCD, installed zfs, and ran my test case with no problems. I'll retask my VM to work with Gentoo, and see if I can make you a test case (I'll give you the root OS so you don't need to build it yourself).
Rereading this whole thread I see this is an issue with kernel 3.x. I'm only on 2.6 so I'm not of much help here. When CentOS moves to 3.x I'll be in the game!
I have no issues using dd to move data on zvol with 2.6
BK
@cwedgwood OK, I think I've been able to reproduce this issue and I'm starting to dig in to it. It certainly looks like some sort of thrashing in the VM as you described. However, I'm only able to recreate the problem when I use a small zvol block size, such as the default 8K; when I increase the block size to 128K it's much better behaved. Does this match your experience?
I have identified the same issue, https://groups.google.com/a/zfsonlinux.org/forum/?fromgroups=#!searchin/zfs-discuss/lockups/zfs-discuss/yxi45H3kywQ/mws1rqdOicwJ
@mauricev I'm actually looking at this issue now. And despite @cwedgwood's ability to hit it easily I'm not having much luck reproducing it. I can certainly make the VM work hard and get relatively poor I/O performance, but nothing resembling an actual failure. Can you describe your hardware (cpus, memory, disk), exact test case (zvol block size, workload), and what it looks like when you get a failure?
You can try setting up a Windows system with a hard drive filled with a few hundred gigs. I set up Gentoo Linux (given 4 cores, 8192MB of RAM and kernel 3.4.10, although you can also use 3.6.1) in VMware Fusion, created three 200 GB vmdks split into 2 GB files, and created a raidz pool with them. Then I created a zvol with the most space, 375G, and exported it out using the built-in iSCSI target (with targetcli). Then on the Windows (W2K8 SP1) system, I copy the filled hard drive to the zvol using a program called HDClone. It runs for a short while and then HDClone begins reporting errors and soft lockups start. Unfortunately, the kernel doesn't give any more debugging info and the ssh console is locked up.
My Mac is a Mac Pro with dual quad-core 2.8 GHz CPUs, and the W2K8 is a VM running on ESXi 4.1 (dual quad core, 2.93 GHz). It's assigned 2 cores and 4.5 GB RAM.
Oddly, I just tried a pool of one drive (with volblocksize=128k to avoid the lockup) and the performance was even worse, barely managing 1 MB/second.
So to make sure I understand: you are using ZOL to share out an iSCSI LUN, which is having data pushed onto it via the HDClone program on the Windows machine that mounts the iSCSI LUN.
We do something very similar with Windows backups sent to an iSCSI LUN on CentOS 6.3 using ZOL. No stability problems. It’s possible your problem lies between your iSCSI target software and ZOL.
My .02 BK
Yep, that's it. If I use volblocksize=128K, it runs fine, but very slowly. I don't yet know whether the slowness is due to zfs, iSCSI, or somewhere else in the path.
I’ve attempted to work with the volblocksize in Windows to see what is fastest with different speed tests, but the layers of software don’t allow for effective testing (iSCSI and Windows). I may have a new way to do better testing: connect the ZVOL directly to a KVM Linux-based guest OS. Haven’t done the tests yet though... Will post results when I get around to that.
A few things to consider: NTFS block size. A 64K size would make sense on a 128K block size. Perhaps trying 64K and 64K?
BK
For those chasing this issue you could try running with the following branch. It includes the following patches:

- zpool events -v. This should mostly resolve the console noise issue which only compounds the problem.
- A /proc/spl/kstat/zfs/txg-<pool> file which shows detailed information about the life cycle of each TXG. Useful for seeing what's really going on.

https://github.com/behlendorf/spl/branches/stats
https://github.com/behlendorf/zfs/branches/stats
At the moment I'm unable to recreate anything more serious than sluggish performance on my test system.
After several hours of making copies of the kernel source tree in an xfs file system layered on top of a zvol with 4k block size I was able to make my test system quite laggy. Performance to the pool was very bad and I was seeing kswapd and arc_adapt take a large amount of cpu time.
I was never able to produce any "blocked thread" warnings, but I did profile the system (using oprofile) and determined that all the cpu time was going to spl_emergency_free and spl_emergency_alloc. This behavior is consistent with the original backtraces posted by @cwedgwood.
The profile points to an issue which I was originally concerned about when writing the emergency slab object code, commit e2dcc6e2b818d6f996203c9f1b4dfaf449554790. However, at the time I wasn't able to come up with a real workload which caused the problem, so I opted against prematurely optimizing for this case. The issue is that all the emergency objects which get created are tracked on a single list protected by a spin lock. As long as we never create too many of these objects it's not an issue, and really these should only be created under unlikely circumstances. However, in this test case that's just not true.
My system racked up ~150,000 of these objects as arc buf headers, which are quite long lived. That dragged performance way down and could easily lead to starvation the more CPUs you have contending on this lock. The good news is that I put a patch together months ago to improve this by moving the emergency objects to a hash. As I said, I just never merged it because it made the code more complicated and I wasn't 100% sure we needed it.
https://github.com/behlendorf/spl/tree/slab_hash
I've merged that patch into the previous "stats" branches and I expect it will help, but I haven't had a chance to seriously test it just yet. If someone gets a chance to, that would be great. What you want to look for is that the cpu usage drops considerably and the max depth shown in /proc/spl/kmem/slab is some small value, particularly for the arcbuf* slabs.
You might want to hold off testing the slab hash patch. It looks like it needs a little more polish; I managed to crash my system with it twice. I'll try and run down any lingering issues this weekend.
@cwedgwood I have a set of patches which should significantly improve your use case. They resolve a number of performance issues in the slab implementation which were causing the system to thrash. Please try the code on the following branches and let me know.
https://github.com/behlendorf/spl/branches/stats https://github.com/behlendorf/zfs/branches/stats
@behlendorf it seems a lot better, but with enough effort on the serial console:
[ 2504.454915] BUG: soft lockup - CPU#0 stuck for 22s! [z_wr_iss/0:27861]
[ 2504.464903] BUG: soft lockup - CPU#1 stuck for 22s! [z_wr_iss/1:27862]
[ 2504.474888] BUG: soft lockup - CPU#2 stuck for 22s! [z_wr_iss/2:27863]
[ 2504.494857] BUG: soft lockup - CPU#3 stuck for 22s! [z_wr_iss/3:27864]
[ 2504.664594] BUG: soft lockup - CPU#4 stuck for 22s! [z_wr_iss/4:27865]
[ 2504.674576] BUG: soft lockup - CPU#5 stuck for 22s! [z_wr_iss/5:27866]
[ 2504.684565] BUG: soft lockup - CPU#6 stuck for 22s! [z_wr_iss/6:27867]
[ 2504.704533] BUG: soft lockup - CPU#7 stuck for 22s! [z_wr_iss/7:27868]
[ 2504.714516] BUG: soft lockup - CPU#8 stuck for 22s! [z_wr_iss/8:27869]
[ 2504.724502] BUG: soft lockup - CPU#9 stuck for 22s! [z_wr_iss/9:27870]
[ 2504.744471] BUG: soft lockup - CPU#10 stuck for 22s! [z_wr_iss/10:27871]
[ 2504.754456] BUG: soft lockup - CPU#11 stuck for 22s! [z_wr_iss/11:27872]
[ 2504.764444] BUG: soft lockup - CPU#12 stuck for 22s! [z_wr_iss/12:27873]
[ 2504.784413] BUG: soft lockup - CPU#13 stuck for 22s! [z_wr_iss/13:27874]
[ 2504.794398] BUG: soft lockup - CPU#14 stuck for 22s! [z_wr_iss/14:27875]
[ 2504.804379] BUG: soft lockup - CPU#15 stuck for 22s! [z_wr_iss/15:27876]
[ 2532.412063] BUG: soft lockup - CPU#0 stuck for 22s! [z_wr_iss/0:27861]
[ 2532.422044] BUG: soft lockup - CPU#1 stuck for 22s! [z_wr_iss/1:27862]
[ 2532.432029] BUG: soft lockup - CPU#2 stuck for 22s! [z_wr_iss/2:27863]
[ 2532.451997] BUG: soft lockup - CPU#3 stuck for 22s! [z_wr_iss/3:27864]
[ 2532.621740] BUG: soft lockup - CPU#4 stuck for 22s! [z_wr_iss/4:27865]
[ 2532.631726] BUG: soft lockup - CPU#5 stuck for 22s! [z_wr_iss/5:27866]
[ 2532.641709] BUG: soft lockup - CPU#6 stuck for 22s! [z_wr_iss/6:27867]
[ 2532.661679] BUG: soft lockup - CPU#7 stuck for 22s! [z_wr_iss/7:27868]
[ 2532.671666] BUG: soft lockup - CPU#8 stuck for 22s! [z_wr_iss/8:27869]
[ 2532.681649] BUG: soft lockup - CPU#9 stuck for 22s! [z_wr_iss/9:27870]
[ 2532.701615] BUG: soft lockup - CPU#10 stuck for 22s! [z_wr_iss/10:27871]
[ 2532.711599] BUG: soft lockup - CPU#11 stuck for 22s! [z_wr_iss/11:27872]
[ 2532.721585] BUG: soft lockup - CPU#12 stuck for 22s! [z_wr_iss/12:27873]
[ 2532.741557] BUG: soft lockup - CPU#13 stuck for 22s! [z_wr_iss/13:27874]
[ 2532.751542] BUG: soft lockup - CPU#14 stuck for 22s! [z_wr_iss/14:27875]
[ 2532.761527] BUG: soft lockup - CPU#15 stuck for 22s! [z_wr_iss/15:27876]
[ 2560.369203] BUG: soft lockup - CPU#0 stuck for 22s! [z_wr_iss/0:27861]
[ 2560.379187] BUG: soft lockup - CPU#1 stuck for 22s! [z_wr_iss/1:27862]
[ 2560.389175] BUG: soft lockup - CPU#2 stuck for 22s! [z_wr_iss/2:27863]
[ 2560.409146] BUG: soft lockup - CPU#3 stuck for 22s! [z_wr_iss/3:27864]
[ 2560.578884] BUG: soft lockup - CPU#4 stuck for 22s! [z_wr_iss/4:27865]
[ 2560.588866] BUG: soft lockup - CPU#5 stuck for 22s! [z_wr_iss/5:27866]
[ 2560.598850] BUG: soft lockup - CPU#6 stuck for 22s! [z_wr_iss/6:27867]
[ 2560.618820] BUG: soft lockup - CPU#7 stuck for 22s! [z_wr_iss/7:27868]
[ 2560.628804] BUG: soft lockup - CPU#8 stuck for 22s! [z_wr_iss/8:27869]
[ 2560.638792] BUG: soft lockup - CPU#9 stuck for 22s! [z_wr_iss/9:27870]
[ 2560.658761] BUG: soft lockup - CPU#10 stuck for 22s! [z_wr_iss/10:27871]
[ 2560.668746] BUG: soft lockup - CPU#11 stuck for 22s! [z_wr_iss/11:27872]
[ 2560.678731] BUG: soft lockup - CPU#12 stuck for 22s! [z_wr_iss/12:27873]
[ 2560.698700] BUG: soft lockup - CPU#13 stuck for 22s! [z_wr_iss/13:27874]
[ 2560.708685] BUG: soft lockup - CPU#14 stuck for 22s! [z_wr_iss/14:27875]
[ 2560.718668] BUG: soft lockup - CPU#15 stuck for 22s! [z_wr_iss/15:27876]
at which point i had to reboot to get the machine back
Well that's some progress at least. What kernel version were you using?
That last test was 3.6.4 but the problem exists in 3.4.x and 3.5.x as well. I had earlier switched to 3.6.x in the hope it would help.
OK, I only ask because there was a serious vmalloc() performance regression between 2.6.31 and 2.6.38 which I suspect is causing some of the slowness in my testing. I'm patching the RHEL kernel now with the upstream fix to see how much it improves things.
torvalds/linux@89699605fe7cfd8611900346f61cb6cbf179b10a
Before your system hangs do you notice the [events/x] threads starting to take a majority of the available CPU?
for reference
NAME          STATE     READ WRITE CKSUM
tank0         ONLINE       0     0     0
  raidz1-0    ONLINE       0     0     0
    sde       ONLINE       0     0     0
    sdf       ONLINE       0     0     0
    sdg       ONLINE       0     0     0
    sdh       ONLINE       0     0     0
    sdi       ONLINE       0     0     0
logs
  sdb1        ONLINE       0     0     0
cache
  sdc         ONLINE       0     0     0
  sdd         ONLINE       0     0     0
log and cache devices don't seem to affect things
ashift=12
sde...sdi are 2TiB AF (4k sector) 'green' devices (ie. slow)
a tiny amount of IO (500MB of small files) ... worked fine, then about 5s later (guessing) the machine wedged up with:
[44412.678135] INFO: rcu_sched self-detected stall on CPU
[44412.678138] INFO: rcu_sched self-detected stall on CPU
[44412.678140] INFO: rcu_sched self-detected stall on CPU
busy processes
28998 root 20 0 0 0 0 D 40 0.0 0:00.69 flush-240:0
660 root 20 0 0 0 0 D 37 0.0 0:01.22 kswapd1
22567 root 39 19 0 0 0 S 19 0.0 0:00.25 zvol/1
22569 root 39 19 0 0 0 D 17 0.0 0:00.23 zvol/3
22573 root 39 19 0 0 0 D 17 0.0 0:00.23 zvol/7
22574 root 39 19 0 0 0 D 17 0.0 0:00.23 zvol/8
22568 root 39 19 0 0 0 D 16 0.0 0:00.22 zvol/2
22570 root 39 19 0 0 0 D 16 0.0 0:00.22 zvol/4
22575 root 39 19 0 0 0 D 16 0.0 0:00.22 zvol/9
22577 root 39 19 0 0 0 D 16 0.0 0:00.23 zvol/11
22571 root 39 19 0 0 0 D 15 0.0 0:00.21 zvol/5
22566 root 39 19 0 0 0 R 14 0.0 0:00.21 zvol/0
22572 root 39 19 0 0 0 R 14 0.0 0:00.19 zvol/6
22576 root 39 19 0 0 0 D 14 0.0 0:00.21 zvol/10
23163 root 20 0 0 0 0 S 13 0.0 0:05.94 kworker/6:4
23801 root 20 0 0 0 0 S 12 0.0 0:03.85 kworker/1:4
27331 root 20 0 0 0 0 S 12 0.0 0:00.23 kworker/4:0
1016 root 20 0 0 0 0 S 11 0.0 0:01.89 kworker/9:1
1076 root 20 0 0 0 0 S 11 0.0 0:02.62 kworker/10:1
22870 root 20 0 0 0 0 D 11 0.0 0:05.53 kworker/7:3
23603 root 20 0 0 0 0 R 11 0.0 0:03.34 kworker/3:5
1099 root 20 0 0 0 0 R 9 0.0 0:01.84 kworker/8:1
1100 root 20 0 0 0 0 S 9 0.0 0:02.40 kworker/11:1
23605 root 20 0 0 0 0 S 9 0.0 0:05.42 kworker/5:4
27869 root 20 0 0 0 0 S 8 0.0 0:00.08 kworker/0:2
wedged processes
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 4 0.0 0.0 0 0 ? D Nov01 0:13 [kworker/0:0]
root 660 0.0 0.0 0 0 ? D Nov01 0:01 [kswapd1]
root 1081 0.0 0.0 0 0 ? D Nov01 0:00 [xfsaild/sda3]
root 22566 0.0 0.0 0 0 ? DN Nov01 0:00 [zvol/0]
root 22568 0.0 0.0 0 0 ? DN Nov01 0:00 [zvol/2]
root 22569 0.0 0.0 0 0 ? DN Nov01 0:00 [zvol/3]
root 22570 0.0 0.0 0 0 ? DN Nov01 0:00 [zvol/4]
root 22571 0.0 0.0 0 0 ? DN Nov01 0:00 [zvol/5]
root 22572 0.0 0.0 0 0 ? DN Nov01 0:00 [zvol/6]
root 22573 0.0 0.0 0 0 ? DN Nov01 0:00 [zvol/7]
root 22574 0.0 0.0 0 0 ? DN Nov01 0:00 [zvol/8]
root 22575 0.0 0.0 0 0 ? DN Nov01 0:00 [zvol/9]
root 22576 0.0 0.0 0 0 ? DN Nov01 0:00 [zvol/10]
root 22577 0.0 0.0 0 0 ? DN Nov01 0:00 [zvol/11]
root 22870 0.0 0.0 0 0 ? D Nov01 0:05 [kworker/7:3]
root 23828 0.0 0.0 0 0 ? DN Nov01 0:02 [z_wr_iss/0]
root 23830 0.0 0.0 0 0 ? DN Nov01 0:02 [z_wr_iss/2]
root 23831 0.0 0.0 0 0 ? DN Nov01 0:02 [z_wr_iss/3]
root 23832 0.0 0.0 0 0 ? DN Nov01 0:02 [z_wr_iss/4]
root 23833 0.0 0.0 0 0 ? DN Nov01 0:02 [z_wr_iss/5]
root 23835 0.0 0.0 0 0 ? DN Nov01 0:03 [z_wr_iss/7]
root 23836 0.0 0.0 0 0 ? DN Nov01 0:03 [z_wr_iss/8]
root 23837 0.0 0.0 0 0 ? DN Nov01 0:03 [z_wr_iss/9]
root 23838 0.0 0.0 0 0 ? DN Nov01 0:02 [z_wr_iss/10]
root 23840 0.0 0.0 0 0 ? DN Nov01 0:03 [z_wr_iss/12]
root 23841 0.0 0.0 0 0 ? DN Nov01 0:03 [z_wr_iss/13]
root 23843 0.0 0.0 0 0 ? DN Nov01 0:03 [z_wr_iss/15]
root 23850 0.0 0.0 0 0 ? DN Nov01 0:00 [z_wr_int/1]
root 23861 0.0 0.0 0 0 ? DN Nov01 0:01 [z_wr_int/12]
root 28998 4.7 0.0 0 0 ? D 10:31 0:00 [flush-240:0]
meminfo
MemTotal: 24751200 kB
MemFree: 167020 kB
Buffers: 3872 kB
Cached: 5918016 kB
SwapCached: 176 kB
Active: 1854856 kB
Inactive: 4093176 kB
Active(anon): 531124 kB
Inactive(anon): 33660 kB
Active(file): 1323732 kB
Inactive(file): 4059516 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 3903788 kB
SwapFree: 3901768 kB
Dirty: 217920 kB
Writeback: 2884 kB
AnonPages: 25276 kB
Mapped: 9852 kB
Shmem: 539296 kB
Slab: 1314196 kB
SReclaimable: 580856 kB
SUnreclaim: 733340 kB
KernelStack: 3784 kB
PageTables: 4816 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 16279388 kB
Committed_AS: 640276 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 10801752 kB
VmallocChunk: 34335371952 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 3648 kB
DirectMap2M: 2084864 kB
DirectMap1G: 23068672 kB
spl slab
--------------------- cache ------------------------------------------------------- ----- slab ------ ---- object ----- --- emergency ---
name flags size alloc slabsize objsize total alloc max total alloc max dlock alloc max
spl_vn_cache 0x00020 0 0 4096 88 0 0 12 0 0 252 0 0 0
spl_vn_file_cache 0x00020 0 0 4096 96 0 0 12 0 0 252 0 0 0
spl_zlib_workspace_cache 0x00140 0 0 8388608 268104 0 0 0 0 0 0 0 0 0
zio_cache 0x00020 13615104 12642560 16384 952 831 830 830 13296 13280 13280 0 0 0
zio_link_cache 0x00020 1261568 663120 4096 48 308 307 307 13860 13815 13815 0 0 0
zio_vdev_cache 0x00040 25165824 17434704 4194304 131088 6 6 9 180 133 256 0 0 0
zio_buf_512 0x00020 1959755776 918073856 32768 512 59807 59807 60219 1854017 1793113 1866789 0 0 0
zio_data_buf_512 0x00020 31555584 15273984 32768 512 963 963 964 29853 29832 29884 0 0 0
zio_buf_1024 0x00040 65536 43008 32768 1024 2 2 788 42 42 16548 0 0 0
zio_data_buf_1024 0x00040 23232512 15173632 32768 1024 709 709 709 14889 14818 14889 0 0 0
zio_buf_1536 0x00040 131072 95232 65536 1536 2 2 195 62 62 6045 0 0 0
zio_data_buf_1536 0x00040 23199744 16723968 65536 1536 354 354 355 10974 10888 11005 0 0 0
zio_buf_2048 0x00040 589824 337920 65536 2048 9 9 129 225 165 3225 0 0 0
zio_data_buf_2048 0x00040 24248320 18700288 65536 2048 370 370 371 9250 9131 9275 0 0 0
zio_buf_2560 0x00040 458752 376320 65536 2560 7 7 98 147 147 2058 0 0 0
zio_data_buf_2560 0x00040 22347776 18142720 65536 2560 341 341 341 7161 7087 7161 0 0 0
zio_buf_3072 0x00040 262144 196608 65536 3072 4 4 84 72 64 1512 0 0 0
zio_data_buf_3072 0x00040 21299200 17774592 65536 3072 325 325 325 5850 5786 5850 0 0 0
zio_buf_3584 0x00040 262144 222208 131072 3584 2 2 66 62 62 2046 0 0 0
zio_data_buf_3584 0x00040 21889024 18464768 131072 3584 167 167 170 5177 5152 5270 0 0 0
zio_buf_4096 0x00040 12058624 4141056 262144 4096 46 40 112 1426 1011 3472 0 0 0
zio_data_buf_4096 0x00040 770703360 373182464 262144 4096 2940 2939 2939 91140 91109 91109 0 0 0
zio_buf_5120 0x00040 524288 430080 131072 5120 4 4 55 84 84 1155 0 0 0
zio_data_buf_5120 0x00040 46923776 38205440 131072 5120 358 358 359 7518 7462 7539 0 0 0
zio_buf_6144 0x00040 1703936 1437696 131072 6144 13 13 46 234 234 828 0 0 0
zio_data_buf_6144 0x00040 43909120 36925440 131072 6144 335 335 338 6030 6010 6084 0 0 0
zio_buf_7168 0x00040 0 0 262144 7168 0 0 23 0 0 713 0 0 0
zio_data_buf_7168 0x00040 43515904 36764672 262144 7168 166 166 169 5146 5129 5239 0 0 0
zio_buf_8192 0x00040 4980736 3268608 262144 8192 19 19 41 399 399 861 0 0 0
zio_data_buf_8192 0x00040 54525952 35627008 262144 8192 208 208 212 4368 4349 4452 0 0 0
zio_buf_10240 0x00040 0 0 262144 10240 0 0 39 0 0 819 0 0 0
zio_data_buf_10240 0x00040 86507520 70635520 262144 10240 330 330 333 6930 6898 6993 0 0 0
zio_buf_12288 0x00040 0 0 524288 12288 0 0 27 0 0 837 0 0 0
zio_data_buf_12288 0x00040 81264640 58294272 524288 12288 155 155 155 4805 4744 4805 0 0 0
zio_buf_14336 0x00040 0 0 524288 14336 0 0 17 0 0 527 0 0 0
zio_data_buf_14336 0x00040 60293120 49846272 524288 14336 115 115 115 3565 3477 3565 0 0 0
zio_buf_16384 0x00040 1598554112 1031274496 524288 16384 3049 3049 3161 76225 62944 79025 0 0 0
zio_data_buf_16384 0x00040 62390272 47955968 524288 16384 119 119 119 2975 2927 2975 0 0 0
zio_buf_20480 0x00040 0 0 524288 20480 0 0 42 0 0 863 0 0 0
zio_data_buf_20480 0x00040 104333312 84254720 524288 20480 199 199 201 4179 4114 4221 0 0 0
zio_buf_24576 0x00040 0 0 524288 24576 0 0 41 0 0 738 0 0 0
zio_data_buf_24576 0x00040 90177536 74907648 524288 24576 172 172 173 3096 3048 3114 0 0 0
zio_buf_28672 0x00040 0 0 1048576 28672 0 0 20 0 0 620 0 0 0
zio_data_buf_28672 0x00040 95420416 77758464 1048576 28672 91 91 94 2821 2712 2914 0 0 0
zio_buf_32768 0x00040 0 0 1048576 32768 0 0 69 0 0 1932 0 0 0
zio_data_buf_32768 0x00040 79691776 65503232 1048576 32768 76 76 77 2128 1999 2136 0 0 0
zio_buf_36864 0x00040 0 0 1048576 36864 0 0 19 0 0 475 0 0 0
zio_data_buf_36864 0x00040 101711872 85708800 1048576 36864 97 97 97 2425 2325 2425 0 0 0
zio_buf_40960 0x00040 0 0 1048576 40960 0 0 21 0 0 483 0 0 0
zio_data_buf_40960 0x00040 126877696 106250240 1048576 40960 121 121 121 2783 2594 2783 0 0 0
zio_buf_45056 0x00040 0 0 1048576 45056 0 0 27 0 0 567 0 0 0
zio_data_buf_45056 0x00040 98566144 81776640 1048576 45056 94 94 94 1974 1815 1974 0 0 0
zio_buf_49152 0x00040 0 0 1048576 49152 0 0 26 0 0 494 0 0 0
zio_data_buf_49152 0x00040 111149056 94027776 1048576 49152 106 106 106 2014 1913 2014 0 0 0
zio_buf_53248 0x00040 0 0 1048576 53248 0 0 23 0 0 414 0 0 0
zio_data_buf_53248 0x00040 72351744 59052032 1048576 53248 69 69 69 1242 1109 1242 0 0 0
zio_buf_57344 0x00040 0 0 1048576 57344 0 0 26 0 0 442 0 0 0
zio_data_buf_57344 0x00040 47185920 42262528 1048576 57344 45 45 45 765 737 765 0 0 0
zio_buf_61440 0x00040 0 0 2097152 61440 0 0 16 0 0 466 0 0 0
zio_data_buf_61440 0x00040 41943040 36311040 2097152 61440 20 20 20 620 591 620 0 0 0
zio_buf_65536 0x00040 0 0 2097152 65536 0 0 14 0 0 420 0 0 0
zio_data_buf_65536 0x00040 35651584 28901376 2097152 65536 17 17 19 510 441 570 0 0 0
zio_buf_69632 0x00040 0 0 2097152 69632 0 0 22 0 0 616 0 0 0
zio_data_buf_69632 0x00040 35651584 31473664 2097152 69632 17 17 17 476 452 476 0 0 0
zio_buf_73728 0x00040 0 0 2097152 73728 0 0 24 0 0 624 0 0 0
zio_data_buf_73728 0x00040 44040192 39297024 2097152 73728 21 21 21 546 533 546 0 0 0
zio_buf_77824 0x00040 0 0 2097152 77824 0 0 27 0 0 675 0 0 0
zio_data_buf_77824 0x00040 54525952 48484352 2097152 77824 26 26 26 650 623 650 0 0 0
zio_buf_81920 0x00040 0 0 2097152 81920 0 0 33 0 0 792 0 0 0
zio_data_buf_81920 0x00040 48234496 44400640 2097152 81920 23 23 23 552 542 552 0 0 0
zio_buf_86016 0x00040 0 0 2097152 86016 0 0 22 0 0 506 0 0 0
zio_data_buf_86016 0x00040 41943040 38019072 2097152 86016 20 20 20 460 442 460 0 0 0
zio_buf_90112 0x00040 0 0 2097152 90112 0 0 18 0 0 396 0 0 0
zio_data_buf_90112 0x00040 37748736 32169984 2097152 90112 18 18 20 396 357 440 0 0 0
zio_buf_94208 0x00040 0 0 2097152 94208 0 0 17 0 0 357 0 0 0
zio_data_buf_94208 0x00040 25165824 21950464 2097152 94208 12 12 12 252 233 252 0 0 0
zio_buf_98304 0x00040 0 0 2097152 98304 0 0 28 0 0 560 0 0 0
zio_data_buf_98304 0x00040 27262976 23887872 2097152 98304 13 13 13 260 243 260 0 0 0
zio_buf_102400 0x00040 0 0 2097152 102400 0 0 25 0 0 475 0 0 0
zio_data_buf_102400 0x00040 25165824 20172800 2097152 102400 12 12 12 228 197 228 0 0 0
zio_buf_106496 0x00040 0 0 2097152 106496 0 0 16 0 0 288 0 0 0
zio_data_buf_106496 0x00040 31457280 24387584 2097152 106496 15 15 15 270 229 270 0 0 0
zio_buf_110592 0x00040 0 0 2097152 110592 0 0 19 0 0 325 0 0 0
zio_data_buf_110592 0x00040 33554432 28311552 2097152 110592 16 16 16 288 256 288 0 0 0
zio_buf_114688 0x00040 0 0 2097152 114688 0 0 19 0 0 311 0 0 0
zio_data_buf_114688 0x00040 25165824 20529152 2097152 114688 12 12 12 204 179 204 0 0 0
zio_buf_118784 0x00040 0 0 2097152 118784 0 0 13 0 0 221 0 0 0
zio_data_buf_118784 0x00040 25165824 20193280 2097152 118784 12 12 12 204 170 204 0 0 0
zio_buf_122880 0x00040 0 0 2097152 122880 0 0 13 0 0 208 0 0 0
zio_data_buf_122880 0x00040 18874368 15605760 2097152 122880 9 9 10 144 127 160 0 0 0
zio_buf_126976 0x00040 0 0 4194304 126976 0 0 11 0 0 341 0 0 0
zio_data_buf_126976 0x00040 20971520 15491072 4194304 126976 5 5 5 155 122 155 0 0 0
zio_buf_131072 0x00040 37748736 9961472 4194304 131072 9 9 47 279 76 1450 0 0 0
zio_data_buf_131072 0x00040 6014631936 4962779136 4194304 131072 1434 1434 1684 44454 37863 52204 0 0 0
sa_cache 0x00020 222527488 143411040 4096 80 54328 54328 54331 1792824 1792638 1792923 0 0 0
spill_cache 0x00040 0 0 4194304 131072 0 0 0 0 0 0 0 0 0
dnode_t 0x00020 1858338816 1684049952 16384 928 113424 113424 113429 1814784 1814709 1814864 0 0 0
dmu_buf_impl_t 0x00020 677863424 574939960 8192 280 82747 82747 83583 2068675 2053357 2089575 0 0 0
arc_buf_hdr_t 0x00020 163201024 140881312 8192 272 19922 19921 19921 517972 517946 517946 0 0 0
arc_buf_t 0x00020 55218176 38631376 4096 112 13481 13481 15871 350506 344923 412646 0 0 0
zil_lwb_cache 0x00020 528384 412800 4096 200 129 129 129 2064 2064 2064 0 0 0
zfs_znode_cache 0x00020 1835827200 1677925080 16384 936 112050 112050 112053 1792800 1792655 1792848 0 0 0
arc stats
name type data
hits 4 10397012
misses 4 483291
demand_data_hits 4 3971244
demand_data_misses 4 77845
demand_metadata_hits 4 3513236
demand_metadata_misses 4 253398
prefetch_data_hits 4 77958
prefetch_data_misses 4 84806
prefetch_metadata_hits 4 2834574
prefetch_metadata_misses 4 67242
mru_hits 4 2385761
mru_ghost_hits 4 92485
mfu_hits 4 5101244
mfu_ghost_hits 4 21572
deleted 4 138
recycle_miss 4 18287
mutex_miss 4 16
evict_skip 4 168
evict_l2_cached 4 3344460288
evict_l2_eligible 4 13664256
evict_l2_ineligible 4 84999168
hash_elements 4 443926
hash_elements_max 4 443926
hash_collisions 4 154100
hash_chains 4 109623
hash_chain_max 4 8
p 4 17759673
c 4 10998040368
c_min 4 2147483648
c_max 4 17179869184
size 4 10998035512
hdr_size 4 180145056
data_size 4 7985608704
other_size 4 2832281752
anon_size 4 307566592
anon_evict_data 4 0
anon_evict_metadata 4 0
mru_size 4 5848043520
mru_evict_data 4 5083303936
mru_evict_metadata 4 0
mru_ghost_size 4 2902842880
mru_ghost_evict_data 4 2750266368
mru_ghost_evict_metadata 4 152576512
mfu_size 4 1829998592
mfu_evict_data 4 1576538112
mfu_evict_metadata 4 0
mfu_ghost_size 4 383837184
mfu_ghost_evict_data 4 20037632
mfu_ghost_evict_metadata 4 363799552
l2_hits 4 111304
l2_misses 4 371924
l2_feeds 4 43424
l2_rw_clash 4 21
l2_read_bytes 4 282380288
l2_write_bytes 4 5715663872
l2_writes_sent 4 1393
l2_writes_done 4 1393
l2_writes_error 4 0
l2_writes_hdr_miss 4 64
l2_evict_lock_retry 4 0
l2_evict_reading 4 0
l2_free_on_write 4 84
l2_abort_lowmem 4 6
l2_cksum_bad 4 0
l2_io_error 4 0
l2_size 4 5660558848
l2_hdr_size 4 0
memory_throttle_count 4 0
memory_direct_count 4 126
memory_indirect_count 4 4057
arc_no_grow 4 0
arc_tempreserve 4 2850816
arc_loaned_bytes 4 0
arc_prune 4 0
arc_meta_used 4 4037600312
arc_meta_limit 4 4294967296
arc_meta_max 4 4297072952
With the latest 'stats SPL I get further, but things still explode:
[ 2899.886775] BUG: soft lockup - CPU#0 stuck for 22s! [z_wr_iss/0:24014]
[ 2899.893659] Modules linked in: zfs(O)[ 2899.896756] BUG: soft lockup - CPU#1 stuck for 22s! [z_wr_iss/1:24015]
[ 2899.896757] Modules linked in: zfs(O) zunicode(O) zavl(O) zcommon(O) znvpair(O) spl(O) loop tcm_loop iscsi_target_mod target_core_mod configfs sg coretemp kvm_intel kvm mpt2sas i2c_i801 ghash_clmulni_intel i2c_core iTCO_wdt iTCO_vendor_support lpc_ich mfd_core microcode i7core_edac [last unloaded: spl]
[ 2899.896775] CPU 1 [ 2899.896777] Pid: 24015, comm: z_wr_iss/1 Tainted: G O 3.6.5.cw1 #25 Supermicro X8DTL/X8DTL
[ 2899.896778] RIP: 0010:[<ffffffffa01670de>] [<ffffffffa01670de>] spl_kmem_cache_alloc+0x191/0xb97 [spl]
[ 2899.896790] RSP: 0018:ffff880316a07bb0 EFLAGS: 00000246
[ 2899.896791] RAX: 0000000000000000 RBX: ffffffff814fa21d RCX: ffff88032a603620
[ 2899.896792] RDX: ffff88032a6035e8 RSI: 0000000000000286 RDI: ffff88032a603610
[ 2899.896793] RBP: ffff880316a07c80 R08: ffff880316a06000 R09: 0000000000000000
[ 2899.896794] R10: 0140000000000000 R11: ffff880316a07c28 R12: ffff880316a07b28
[ 2899.896795] R13: 0000000000000046 R14: ffff880316a06000 R15: 0000000000000000
[ 2899.896796] FS: 0000000000000000(0000) GS:ffff880333c20000(0000) knlGS:0000000000000000
[ 2899.896798] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2899.896799] CR2: 00007f3a0a97ce02 CR3: 0000000001775000 CR4: 00000000000007e0
[ 2899.896800] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2899.896801] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2899.896802] Process z_wr_iss/1 (pid: 24015, threadinfo ffff880316a06000, task ffff8803166e2d40)
[ 2899.896803] Stack:
[ 2899.896803] ffff880316a07c00 0000000000000000 0000000000000202 ffff88032a605960
[ 2899.896807] 0000003000000000 ffff880216249f40 ffff8803166e2d40 ffff88032a6035f8
[ 2899.896811] ffff880316a07c40 0000000016249f40 0000000000000212 ffff88032a603560
[ 2899.896814] Call Trace:
[ 2899.896816] [<ffffffff81059f93>] ? abort_exclusive_wait+0x8f/0x8f
[ 2899.896821] [<ffffffffa035628d>] zio_buf_alloc+0x22/0x24 [zfs]
[ 2899.896864] [<ffffffffa0358dc2>] zio_vdev_io_start+0xda/0x223 [zfs]
[ 2899.896894] [<ffffffffa0357c8e>] zio_nowait+0xdc/0x103 [zfs]
[ 2899.896923] [<ffffffffa0329647>] vdev_mirror_io_start+0x30a/0x327 [zfs]
[ 2899.896947] [<ffffffffa0328fc4>] ? vdev_config_sync+0x134/0x134 [zfs]
[ 2899.896972] [<ffffffffa031dcc5>] ? spa_config_enter+0xc4/0xde [zfs]
[ 2899.896996] [<ffffffffa0358d38>] zio_vdev_io_start+0x50/0x223 [zfs]
[ 2899.897026] [<ffffffffa0355d7d>] zio_execute+0xb8/0xe0 [zfs]
[ 2899.897056] [<ffffffffa0169587>] taskq_thread+0x2aa/0x4d2 [spl]
[ 2899.897062] [<ffffffff8106394b>] ? finish_task_switch+0x48/0xad
[ 2899.897066] [<ffffffff81067b74>] ? try_to_wake_up+0x1d4/0x1d4
[ 2899.897068] [<ffffffffa01692dd>] ? task_done+0x110/0x110 [spl]
[ 2899.897074] [<ffffffff81059601>] kthread+0x6f/0x77
[ 2899.897076] [<ffffffff81501904>] kernel_thread_helper+0x4/0x10
[ 2899.897081] [<ffffffff814fa21d>] ? retint_restore_args+0x13/0x13
[ 2899.897086] [<ffffffff81059592>] ? kthread_freezable_should_stop+0x43/0x43
[ 2899.897088] [<ffffffff81501900>] ? gs_change+0x13/0x13
[ 2899.906742] BUG: soft lockup - CPU#2 stuck for 22s! [z_wr_iss/2:24016]
[ 2899.906742] Modules linked in: zfs(O) zunicode(O) zavl(O) zcommon(O) znvpair(O) spl(O) loop tcm_loop iscsi_target_mod target_core_mod configfs sg coretemp kvm_intel kvm mpt2sas i2c_i801 ghash_clmulni_intel i2c_core iTCO_wdt iTCO_vendor_support lpc_ich mfd_core microcode i7core_edac [last unloaded: spl]
[ 2899.906757] CPU 2 [ 2899.906759] Pid: 24016, comm: z_wr_iss/2 Tainted: G O 3.6.5.cw1 #25 Supermicro X8DTL/X8DTL
[ 2899.906760] RIP: 0010:[<ffffffffa01670de>] [<ffffffffa01670de>] spl_kmem_cache_alloc+0x191/0xb97 [spl]
[ 2899.906767] RSP: 0018:ffff880317aa1ae0 EFLAGS: 00000246
[ 2899.906768] RAX: 0000000000000000 RBX: ffffffff814fa21d RCX: ffff88032a603620
[ 2899.906769] RDX: ffff88032a6035e8 RSI: 0000000000000282 RDI: ffff88032a603610
[ 2899.906770] RBP: ffff880317aa1bb0 R08: ffff880317aa0000 R09: 0000000000000000
[ 2899.906772] R10: 0140000000000000 R11: ffff880317aa1b58 R12: ffff880317aa1a58
[ 2899.906773] R13: 0000000000000046 R14: ffff880317aa0000 R15: 0000000000000000
[ 2899.906774] FS: 0000000000000000(0000) GS:ffff880333c40000(0000) knlGS:0000000000000000
[ 2899.906775] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2899.906776] CR2: 00000000006edbd4 CR3: 0000000001775000 CR4: 00000000000007e0
[ 2899.906777] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2899.906778] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2899.906780] Process z_wr_iss/2 (pid: 24016, threadinfo ffff880317aa0000, task ffff8803166e43e0)
[ 2899.906780] Stack:
[ 2899.906781] 00000000000001cc 0000000000000000 ffff880317aa1b20 ffffffffa0164d0b
[ 2899.906784] ffff880317aa1b50 00000030a0164d0b ffff8803166e43e0 ffff88032a6035f8
[ 2899.906788] ffff880317aa1b70 00000000a0164e8a 0000000000000206 ffff88032a603560
[ 2899.906791] Call Trace:
[ 2899.906792] [<ffffffffa0164d0b>] ? kmalloc_nofail+0x2c/0x3e [spl]
[ 2899.906798] [<ffffffff81059f93>] ? abort_exclusive_wait+0x8f/0x8f
[ 2899.906801] [<ffffffffa035628d>] zio_buf_alloc+0x22/0x24 [zfs]
[ 2899.906833] [<ffffffffa032b978>] vdev_raidz_io_start+0x252/0x538 [zfs]
[ 2899.906859] [<ffffffffa0358ef9>] zio_vdev_io_start+0x211/0x223 [zfs]
[ 2899.906888] [<ffffffffa0357c8e>] zio_nowait+0xdc/0x103 [zfs]
[ 2899.906917] [<ffffffffa0329647>] vdev_mirror_io_start+0x30a/0x327 [zfs]
[ 2899.906942] [<ffffffffa0328fc4>] ? vdev_config_sync+0x134/0x134 [zfs]
[ 2899.906966] [<ffffffffa031dcc5>] ? spa_config_enter+0xc4/0xde [zfs]
[ 2899.906991] [<ffffffffa0358d38>] zio_vdev_io_start+0x50/0x223 [zfs]
[ 2899.907020] [<ffffffffa0355d7d>] zio_execute+0xb8/0xe0 [zfs]
[ 2899.907050] [<ffffffffa0169587>] taskq_thread+0x2aa/0x4d2 [spl]
[ 2899.907056] [<ffffffff81063985>] ? finish_task_switch+0x82/0xad
[ 2899.907059] [<ffffffff81067b74>] ? try_to_wake_up+0x1d4/0x1d4
[ 2899.907061] [<ffffffffa01692dd>] ? task_done+0x110/0x110 [spl]
[ 2899.907067] [<ffffffff81059601>] kthread+0x6f/0x77
[ 2899.907069] [<ffffffff81501904>] kernel_thread_helper+0x4/0x10
[ 2899.907072] [<ffffffff814fa21d>] ? retint_restore_args+0x13/0x13
[ 2899.907076] [<ffffffff81059592>] ? kthread_freezable_should_stop+0x43/0x43
[ 2899.907078] [<ffffffff81501900>] ? gs_change+0x13/0x13
Recent slab details (sampled 1s apart):
zio_buf_4096 0x00040 419430400 57413632 262144 4096 1600 598 3366 49600 14017 104346 0 0 0
zio_data_buf_4096 0x00040 7402684416 3272089600 262144 4096 28239 28077 44967 875409 798850 1393977 0 0 0
zio_buf_4096 0x00040 419430400 15994880 262144 4096 1600 152 3366 49600 3905 104346 0 0 0
zio_data_buf_4096 0x00040 7402684416 3061850112 262144 4096 28239 28017 44967 875409 747522 1393977 0 0 0
zio_buf_4096 0x00040 419430400 129372160 262144 4096 1600 1333 3366 49600 31585 104346 0 0 0
zio_data_buf_4096 0x00040 7402684416 3090554880 262144 4096 28239 28001 44967 875409 754530 1393977 0 0 0
zio_buf_4096 0x00040 419430400 3411968 262144 4096 1600 44 3366 49600 833 104346 0 0 0
zio_data_buf_4096 0x00040 7402684416 3036553216 262144 4096 28239 27999 44967 875409 741346 1393977 0 0 0
zio_buf_4096 0x00040 419430400 3149824 262144 4096 1600 44 3366 49600 769 104346 0 0 0
zio_data_buf_4096 0x00040 7402684416 3198296064 262144 4096 28239 27987 44967 875409 780834 1393977 0 0 0
zio_buf_4096 0x00040 395313152 7737344 262144 4096 1508 360 3366 46748 1889 104346 0 0 0
zio_data_buf_4096 0x00040 7364935680 3056738304 262144 4096 28095 27931 44967 870945 746274 1393977 0 0 0
zio_buf_4096 0x00040 395313152 6295552 262144 4096 1508 142 3366 46748 1537 104346 0 0 0
zio_data_buf_4096 0x00040 7364935680 2998018048 262144 4096 28095 27865 44967 870945 731938 1393977 0 0 0
zio_buf_4096 0x00040 395313152 4853760 262144 4096 1508 154 3366 46748 1185 104346 0 0 0
zio_data_buf_4096 0x00040 7364935680 3104579584 262144 4096 28095 27865 44967 870945 757954 1393977 0 0 0
zio_buf_4096 0x00040 395313152 2625536 262144 4096 1508 119 3366 46748 641 104346 0 0 0
zio_data_buf_4096 0x00040 7364935680 3196985344 262144 4096 28095 27865 44967 870945 780514 1393977 0 0 0
zio_buf_4096 0x00040 395313152 3411968 262144 4096 1508 139 3366 46748 833 104346 0 0 0
zio_data_buf_4096 0x00040 7364935680 3319275520 262144 4096 28095 27865 44967 870945 810370 1393977 0 0 0
zio_buf_4096 0x00040 395313152 2756608 262144 4096 1508 143 3366 46748 673 104346 0 0 0
zio_data_buf_4096 0x00040 7364935680 3342082048 262144 4096 28095 27865 44967 870945 815938 1393977 0 0 0
zio_buf_4096 0x00040 395313152 2887680 262144 4096 1508 104 3366 46748 705 104346 0 0 0
zio_data_buf_4096 0x00040 7364935680 3359514624 262144 4096 28095 27865 44967 870945 820194 1393977 0 0 0
zio_buf_4096 0x00040 315621376 57151488 262144 4096 1204 451 3366 37324 13953 104346 0 0 0
zio_data_buf_4096 0x00040 7431782400 3599507456 262144 4096 28350 28350 44967 878850 878786 1393977 0 0 0
zio_buf_4096 0x00040 315621376 12062720 262144 4096 1204 95 3366 37324 2945 104346 0 0 0
zio_data_buf_4096 0x00040 7464812544 3294253056 262144 4096 28476 28337 44967 882756 804261 1393977 0 0 0
zio_buf_4096 0x00040 315621376 25956352 262144 4096 1204 205 3366 37324 6337 104346 0 0 0
zio_data_buf_4096 0x00040 7464812544 3328724992 262144 4096 28476 28333 44967 882756 812677 1393977 0 0 0
zio_buf_4096 0x00040 315621376 5246976 262144 4096 1204 44 3366 37324 1281 104346 0 0 0
zio_data_buf_4096 0x00040 7464812544 3147714560 262144 4096 28476 28144 44967 882756 768485 1393977 0 0 0
zio_buf_4096 0x00040 277872640 2887680 262144 4096 1060 122 3366 32860 705 104346 0 0 0
zio_data_buf_4096 0x00040 7422869504 3233042432 262144 4096 28316 28142 44967 877796 789317 1393977 0 0 0
@cwedgwood I was able to reproduce a failure in my Fedora 17 VM with the 3.6.3-1 debug kernel. While different from your previous debug messages, it would explain the hard lockup you're seeing. The following GPF, when hit multiple times in kernel context, would absolutely cause real hardware to hang as you've described.
I suspect the GPF may be caused by the condition variable in the range lock being destroyed, and its memory freed, before the various waiters can be notified. We had a similar issue with this exact lock some time ago and I thought I had resolved it, but perhaps there's still a tiny race. This is actually an upstream bug; it just happens that the Solaris condition variable implementation isn't particularly sensitive to this kind of misuse, typically because the condition variable is embedded in a fairly long-lived structure.
No patch yet, but I'll chew on it now that I've found a concrete issue.
general protection fault: 0000 [#1] SMP
CPU 6
Pid: 1388, comm: zvol/14 Tainted: G O 3.6.3-1.fc17.x86_64.debug #1 Bochs Bochs
RIP: 0010:[ ] [ ] __wake_up_common+0x2b/0x90
RSP: 0018:ffff880051809ab0 EFLAGS: 00010082
RAX: 0000000000000286 RBX: ffff88004c423298 RCX: 0000000000000000
RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000003 RDI: ffff88004c423298
RBP: ffff880051809af0 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88004c4232e0
R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000003
FS: 0000000000000000(0000) GS:ffff88007d200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f9e38112000 CR3: 000000001c077000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process zvol/14 (pid: 1388, threadinfo ffff880051808000, task ffff880051800000)
Stack:
 0000000151809af0 0000000000000000 ffffffff810a1172 ffff88004c423298
 0000000000000286 0000000000000003 0000000000000001 0000000000000000
 ffff880051809b30 ffffffff810a1188 ffff880051a8a5b0 ffff88004c423238
Call Trace:
 [ ] ? __wake_up+0x32/0x70
 [ ] __wake_up+0x48/0x70
 [ ] cv_wait_common+0x1b8/0x3d0 [spl]
 [ ] ? wake_up_bit+0x40/0x40
 [ ] __cv_wait+0x13/0x20 [spl]
 [ ] zfs_range_lock+0x4d6/0x620 [zfs]
 [ ] zvol_get_data+0x89/0x150 [zfs]
 [ ] zil_commit+0x5a2/0x770 [zfs]
 [ ] zvol_write+0x1b2/0x480 [zfs]
 [ ] taskq_thread+0x250/0x820 [spl]
 [ ] ? finish_task_switch+0x3f/0x120
 [ ] ? try_to_wake_up+0x340/0x340
 [ ] ? __taskq_create+0x6e0/0x6e0 [spl]
 [ ] kthread+0xb7/0xc0
 [ ] kernel_thread_helper+0x4/0x10
 [ ] ? retint_restore_args+0x13/0x13
 [ ] ? __init_kthread_worker+0x70/0x70
 [ ] ? gs_change+0x13/0x13
Code: 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 18 e8 2a 3c 64 00 89 55 c4 48 8b 57 48 4c 8d 67 48 41 89 f7 41 89 ce 4c 89 45 c8 <4c> 8b 2a 48 8d 42 e8 49 83 ed 18 49 39 d4
RIP [ ] __wake_up_common+0x2b/0x90 RSP
BUG: scheduling while atomic: zvol/14/1388/0x10000002
INFO: lockdep is turned off.
Modules linked in: xfs zfs(O) zcommon(O) zunicode(O) znvpair(O) zavl(O) splat(O) spl(O) zlib_deflate lockd sunrpc bnep bluetooth rfkill ip6t_REJECT nf_conntrack_ipv6 nf_conntrack
Pid: 1388, comm: zvol/14 Tainted: G D O 3.6.3-1.fc17.x86_64.debug #1
Call Trace:
 [ ] __schedule_bug+0x67/0x75
 [ ] __schedule+0x98b/0x9f0
 [ ] __cond_resched+0x2a/0x40
 [ ] _cond_resched+0x30/0x40
 [ ] mutex_lock_nested+0x33/0x390
 [ ] ? exit_fs+0x47/0xa0
 [ ] perf_event_exit_task+0x30/0x220
 [ ] do_exit+0x1d5/0xb00
 [ ] ? kmsg_dump+0x1b8/0x240
 [ ] ? kmsg_dump+0x25/0x240
 [ ] oops_end+0x9d/0xe0
 [ ] die+0x58/0x90
 [ ] do_general_protection+0x162/0x170
 [ ] ? restore_args+0x30/0x30
 [ ] general_protection+0x25/0x30
 [ ] ? __wake_up_common+0x2b/0x90
 [ ] ? __wake_up+0x32/0x70
 [ ] __wake_up+0x48/0x70
 [ ] cv_wait_common+0x1b8/0x3d0 [spl]
 [ ] ? wake_up_bit+0x40/0x40
 [ ] __cv_wait+0x13/0x20 [spl]
 [ ] zfs_range_lock+0x4d6/0x620 [zfs]
 [ ] zvol_get_data+0x89/0x150 [zfs]
 [ ] zil_commit+0x5a2/0x770 [zfs]
 [ ] zvol_write+0x1b2/0x480 [zfs]
 [ ] taskq_thread+0x250/0x820 [spl]
 [ ] ? finish_task_switch+0x3f/0x120
 [ ] ? try_to_wake_up+0x340/0x340
 [ ] ? __taskq_create+0x6e0/0x6e0 [spl]
 [ ] kthread+0xb7/0xc0
 [ ] kernel_thread_helper+0x4/0x10
 [ ] ? retint_restore_args+0x13/0x13
 [ ] ? __init_kthread_worker+0x70/0x70
 [ ] ? gs_change+0x13/0x13
@cwedgwood Another possibility occurred to me which I think fits all the facts. I'm still going to verify the CVs are 100% solid, but the crash might also be the result of a stack overflow.
In the Linux kernel we only have 8k of stack space, which isn't a lot of elbow room. Under Solaris/FreeBSD the default is 24k, and ZFS was originally written with that limit in mind. I've gone through a lot of effort to bring the ZFS stack usage down, and for all the workloads I'm aware of stack overruns never occur. The only remaining overrun I'm positive about occurs when running ZFS over multipath devices (#675). The biggest consumer is usually the stack needed to recursively traverse an object's block pointers.
However, your zvol+xfs workload may just push things over the edge too. Running xfs on a zvol is probably close to the worst case: you have a single object and you're allocating small blocks all over the entire virtual device. Add to that the usual xfs stack overhead (which historically has been substantial) and you may trash the stack.
We can test whether this is the problem by increasing the default Linux stack size to 16k. This can be done safely by recompiling the kernel with THREAD_SIZE_ORDER=2. See the top of arch/x86/include/asm/page_64_types.h: for x86_64 machines, THREAD_SIZE_ORDER just needs to be changed from 1 to 2. The only side effects of this change are a little more memory usage and a possibly slower fork()/exec() rate.
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64
index 320f7bb..8fcc6cb 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -1,7 +1,7 @@
#ifndef _ASM_X86_PAGE_64_DEFS_H
#define _ASM_X86_PAGE_64_DEFS_H
-#define THREAD_SIZE_ORDER 1
+#define THREAD_SIZE_ORDER 2
#define THREAD_SIZE (PAGE_SIZE << THREAD_SIZE_ORDER)
#define CURRENT_MASK (~(THREAD_SIZE - 1))
All the recent 'stats stuff, 16K kernel stacks, and it still wedged hard:
[ 3482.972444] BUG: soft lockup - CPU#0 stuck for 22s! [z_wr_iss/0:5096]
...
[ 3482.982490] Call Trace:
[ 3482.982492] [<ffffffffa026821e>] ? zio_create+0x31f/0x331 [zfs]
[ 3482.982543] [<ffffffffa026728d>] zio_buf_alloc+0x22/0x24 [zfs]
[ 3482.982572] [<ffffffffa0269dc2>] zio_vdev_io_start+0xda/0x223 [zfs]
[ 3482.982601] [<ffffffffa0268c8e>] zio_nowait+0xdc/0x103 [zfs]
[ 3482.982631] [<ffffffffa023a647>] vdev_mirror_io_start+0x30a/0x327 [zfs]
[ 3482.982655] [<ffffffffa0239fc4>] ? vdev_config_sync+0x134/0x134 [zfs]
[ 3482.982679] [<ffffffffa022ecc5>] ? spa_config_enter+0xc4/0xde [zfs]
[ 3482.982703] [<ffffffffa0269d38>] zio_vdev_io_start+0x50/0x223 [zfs]
[ 3482.982732] [<ffffffffa0266d7d>] zio_execute+0xb8/0xe0 [zfs]
[ 3482.982762] [<ffffffffa007e587>] taskq_thread+0x2aa/0x4d2 [spl]
[ 3482.982768] [<ffffffff81063985>] ? finish_task_switch+0x82/0xad
[ 3482.982773] [<ffffffff81067b74>] ? try_to_wake_up+0x1d4/0x1d4
[ 3482.982776] [<ffffffffa007e2dd>] ? task_done+0x110/0x110 [spl]
[ 3482.982782] [<ffffffff81059601>] kthread+0x6f/0x77
[ 3482.982784] [<ffffffff81501984>] kernel_thread_helper+0x4/0x10
[ 3482.982789] [<ffffffff814fa29d>] ? retint_restore_args+0x13/0x13
[ 3482.982794] [<ffffffff81059592>] ? kthread_freezable_should_stop+0x43/0x43
[ 3482.982796] [<ffffffff81501980>] ? gs_change+0x13/0x13
[ 3482.992428] CPU 2 [ 3482.992430] Pid: 5098, comm: z_wr_iss/2 Tainted: G O 3.6.6.cw1 #2 Supermicro X8DTL/X8DTL
...
[ 3482.992462] Call Trace:
[ 3482.992464] [<ffffffff81059f93>] ? abort_exclusive_wait+0x8f/0x8f
[ 3482.992468] [<ffffffffa026728d>] zio_buf_alloc+0x22/0x24 [zfs]
[ 3482.992500] [<ffffffffa0269dc2>] zio_vdev_io_start+0xda/0x223 [zfs]
[ 3482.992529] [<ffffffffa0268c8e>] zio_nowait+0xdc/0x103 [zfs]
[ 3482.992559] [<ffffffffa023a647>] vdev_mirror_io_start+0x30a/0x327 [zfs]
[ 3482.992583] [<ffffffffa0239fc4>] ? vdev_config_sync+0x134/0x134 [zfs]
[ 3482.992606] [<ffffffffa022ecc5>] ? spa_config_enter+0xc4/0xde [zfs]
[ 3482.992631] [<ffffffffa0269d38>] zio_vdev_io_start+0x50/0x223 [zfs]
[ 3482.992660] [<ffffffffa0266d7d>] zio_execute+0xb8/0xe0 [zfs]
[ 3482.992689] [<ffffffffa007e587>] taskq_thread+0x2aa/0x4d2 [spl]
[ 3482.992694] [<ffffffff8106394b>] ? finish_task_switch+0x48/0xad
[ 3482.992697] [<ffffffff81067b74>] ? try_to_wake_up+0x1d4/0x1d4
[ 3482.992700] [<ffffffffa007e2dd>] ? task_done+0x110/0x110 [spl]
[ 3482.992705] [<ffffffff81059601>] kthread+0x6f/0x77
[ 3482.992708] [<ffffffff81501984>] kernel_thread_helper+0x4/0x10
[ 3482.992712] [<ffffffff814fa29d>] ? retint_restore_args+0x13/0x13
[ 3482.992715] [<ffffffff81059592>] ? kthread_freezable_should_stop+0x43/0x43
[ 3482.992717] [<ffffffff81501980>] ? gs_change+0x13/0x13
[ 3483.012398] CPU 3 [ 3483.012400] Pid: 5099, comm: z_wr_iss/3 Tainted: G O 3.6.6.cw1 #2 Supermicro X8DTL/X8DTL
...
[ 3483.012433] Call Trace:
[ 3483.012434] [<ffffffffa0268c8e>] ? zio_nowait+0xdc/0x103 [zfs]
[ 3483.012465] [<ffffffff81059f93>] ? abort_exclusive_wait+0x8f/0x8f
[ 3483.012469] [<ffffffffa026728d>] zio_buf_alloc+0x22/0x24 [zfs]
[ 3483.012498] [<ffffffffa0267476>] zio_write_bp_init+0x19b/0x3d9 [zfs]
[ 3483.012527] [<ffffffffa0266d7d>] zio_execute+0xb8/0xe0 [zfs]
[ 3483.012557] [<ffffffffa007e587>] taskq_thread+0x2aa/0x4d2 [spl]
[ 3483.012563] [<ffffffff81063985>] ? finish_task_switch+0x82/0xad
[ 3483.012566] [<ffffffff81067b74>] ? try_to_wake_up+0x1d4/0x1d4
[ 3483.012568] [<ffffffffa007e2dd>] ? task_done+0x110/0x110 [spl]
[ 3483.012574] [<ffffffff81059601>] kthread+0x6f/0x77
[ 3483.012576] [<ffffffff81501984>] kernel_thread_helper+0x4/0x10
[ 3483.012579] [<ffffffff814fa29d>] ? retint_restore_args+0x13/0x13
[ 3483.012582] [<ffffffff81059592>] ? kthread_freezable_should_stop+0x43/0x43
[ 3483.012585] [<ffffffff81501980>] ? gs_change+0x13/0x13
...
...
Process CPU usage when it died:
5540 root 20 0 16988 1100 736 D 80 0.0 0:03.27 cp
660 root 20 0 0 0 0 S 47 0.0 3:29.34 kswapd1
659 root 20 0 0 0 0 S 33 0.0 4:50.87 kswapd0
5100 root 39 19 0 0 0 R 14 0.0 0:43.89 z_wr_iss/4
5099 root 39 19 0 0 0 R 13 0.0 0:40.80 z_wr_iss/3
5103 root 39 19 0 0 0 R 13 0.0 0:45.24 z_wr_iss/7
5296 root 20 0 0 0 0 D 13 0.0 0:20.85 flush-240:0
5101 root 39 19 0 0 0 R 12 0.0 0:46.35 z_wr_iss/5
5105 root 39 19 0 0 0 R 11 0.0 0:42.47 z_wr_iss/9
5097 root 39 19 0 0 0 R 11 0.0 0:40.20 z_wr_iss/1
5106 root 39 19 0 0 0 R 11 0.0 0:41.30 z_wr_iss/10
4510 root 0 -20 0 0 0 D 10 0.0 0:12.95 arc_adapt
5104 root 39 19 0 0 0 R 10 0.0 0:41.82 z_wr_iss/8
5096 root 39 19 0 0 0 R 9 0.0 0:34.80 z_wr_iss/0
5168 root 0 -20 0 0 0 R 9 0.0 1:48.36 txg_sync
5111 root 39 19 0 0 0 R 9 0.0 0:48.71 z_wr_iss/15
5102 root 39 19 0 0 0 R 8 0.0 0:44.47 z_wr_iss/6
5527 root 20 0 0 0 0 S 8 0.0 0:00.24 kworker/6:2
5109 root 39 19 0 0 0 R 8 0.0 0:48.54 z_wr_iss/13
4517 root 39 19 0 0 0 R 7 0.0 0:21.80 zvol/5
4523 root 39 19 0 0 0 D 7 0.0 0:22.76 zvol/11
5110 root 39 19 0 0 0 R 7 0.0 0:49.87 z_wr_iss/14
4521 root 39 19 0 0 0 D 7 0.0 0:23.62 zvol/9
4514 root 39 19 0 0 0 D 6 0.0 0:21.76 zvol/2
4519 root 39 19 0 0 0 R 6 0.0 0:22.07 zvol/7
5108 root 39 19 0 0 0 R 6 0.0 0:47.55 z_wr_iss/12
5119 root 39 19 0 0 0 R 6 0.0 0:23.89 z_wr_int/2
5121 root 39 19 0 0 0 R 6 0.0 0:24.50 z_wr_int/4
5124 root 39 19 0 0 0 R 6 0.0 0:25.13 z_wr_int/7
5128 root 39 19 0 0 0 S 6 0.0 0:26.45 z_wr_int/11
5132 root 39 19 0 0 0 S 6 0.0 0:28.03 z_wr_int/15
4512 root 39 19 0 0 0 D 6 0.0 0:19.92 zvol/0
4520 root 39 19 0 0 0 R 6 0.0 0:24.60 zvol/8
5107 root 39 19 0 0 0 R 6 0.0 0:42.40 z_wr_iss/11
5098 root 39 19 0 0 0 R 5 0.0 0:40.29 z_wr_iss/2
5130 root 39 19 0 0 0 R 5 0.0 0:28.25 z_wr_int/13
4515 root 39 19 0 0 0 R 5 0.0 0:21.89 zvol/3
4518 root 39 19 0 0 0 R 5 0.0 0:22.06 zvol/6
4522 root 39 19 0 0 0 D 5 0.0 0:23.19 zvol/10
5129 root 39 19 0 0 0 S 5 0.0 0:28.59 z_wr_int/12
5117 root 39 19 0 0 0 R 4 0.0 0:21.84 z_wr_int/0
5122 root 39 19 0 0 0 R 4 0.0 0:25.51 z_wr_int/5
5126 root 39 19 0 0 0 R 4 0.0 0:25.81 z_wr_int/9
5127 root 39 19 0 0 0 S 4 0.0 0:26.74 z_wr_int/10
5118 root 39 19 0 0 0 R 4 0.0 0:23.82 z_wr_int/1
5125 root 39 19 0 0 0 R 4 0.0 0:27.34 z_wr_int/8
4513 root 39 19 0 0 0 R 3 0.0 0:23.42 zvol/1
1100 root 20 0 0 0 0 S 3 0.0 0:12.34 kworker/11:1
4516 root 39 19 0 0 0 R 3 0.0 0:22.11 zvol/4
5120 root 39 19 0 0 0 R 3 0.0 0:23.90 z_wr_int/3
5123 root 39 19 0 0 0 R 3 0.0 0:25.13 z_wr_int/6
1063 root 20 0 0 0 0 R 3 0.0 0:15.97 kworker/2:1
5131 root 39 19 0 0 0 R 3 0.0 0:28.39 z_wr_int/14
5396 root 20 0 0 0 0 S 3 0.0 0:08.12 kworker/9:8
5415 root 20 0 0 0 0 S 3 0.0 0:10.79 kworker/3:11
5536 root 20 0 0 0 0 R 3 0.0 0:01.82 kworker/5:8
The kmem cache improvements which prevent the system from thrashing and hanging have been merged into master. For those impacted by this issue, testing the latest code would be appreciated.
Create a zvol, dd data into it, and fairly quickly we get:
[ 622.083345] INFO: task flush-230:0:4232 blocked for more than 20 seconds.
[ 622.090558] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 622.098746] flush-230:0 D ffff8803318503a0 4224 4232 2 0x00000000
[ 622.106315] ffff88060bda37d0 0000000000000046 ffff880331850000 ffff88060bda3fd8
[ 622.114453] ffff88060bda3fd8 0000000000012780 ffffffff817633f0 ffff880331850000
[ 622.122625] ffff88060bda37a0 ffff880333c12780 ffff880331850000 ffff8802c88aec00
[ 622.136105] Call Trace:
[ 622.138728] [] schedule+0x65/0x67
[ 622.144026] [] io_schedule+0x60/0x7a
[ 622.149601] [] get_request_wait+0xbd/0x166
[ 622.155723] [] ? cfq_merge+0x72/0xa1
[ 622.161295] [] ? abort_exclusive_wait+0x8f/0x8f
[ 622.167828] [] blk_queue_bio+0x193/0x2d6
[ 622.173722] [] generic_make_request+0x9c/0xdd
[ 622.179999] [] submit_bio+0xbb/0xd4
[ 622.185405] [] ? inc_zone_page_state+0x27/0x29
[ 622.191903] [] submit_bh+0xf6/0x116
[ 622.197283] [] block_write_full_page+0x200/0x2fd
[ 622.204185] [] ? blkdev_get_blocks+0x93/0x93
[ 622.210460] [] ? drop_buffers+0x96/0x96
[ 622.216355] [] ? blkdev_get_blocks+0x93/0x93
[ 622.222542] [] ? drop_buffers+0x96/0x96
[ 622.228452] [] block_write_full_page_endio+0x89/0x95
[ 622.235593] [] block_write_full_page+0x15/0x17
[ 622.242034] [] blkdev_writepage+0x18/0x1a
[ 622.248060] [] writepage+0x14/0x2d
[ 622.253732] [] ? page_index+0x1a/0x1a
[ 622.259396] [] write_cache_pages+0x22e/0x366
[ 622.265710] [] ? page_index+0x1a/0x1a
[ 622.271408] [] generic_writepages+0x3e/0x58
[ 622.277606] [] do_writepages+0x1e/0x2b
[ 622.283350] [] writeback_single_inode.isra.31+0x4c/0x123
[ 622.290891] [] writeback_sb_inodes+0x1d3/0x310
[ 622.297388] [] writeback_inodes_wb+0x74/0xb9
[ 622.303833] [] wb_writeback+0x136/0x26c
[ 622.309663] [] ? global_dirty_limits+0x2f/0x10e
[ 622.316218] [] wb_do_writeback+0x185/0x1bb
[ 622.322283] [] bdi_writeback_thread+0xa5/0x1ce
[ 622.328859] [] ? wb_do_writeback+0x1bb/0x1bb
[ 622.335219] [] kthread+0x8b/0x93
[ 622.340398] [] kernel_thread_helper+0x4/0x10
[ 622.346784] [] ? retint_restore_args+0x13/0x13
[ 622.353261] [] ? kthread_worker_fn+0x149/0x149
[ 622.359759] [] ? gs_change+0x13/0x13
[ 622.365390] INFO: task blkid:4280 blocked for more than 20 seconds.
[ 622.372076] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 622.380445] blkid D ffff8806313f03a0 5720 4280 4221 0x00000000
[ 622.387989] ffff8804dcd37ac8 0000000000000082 ffff8806313f0000 ffff8804dcd37fd8
[ 622.396128] ffff8804dcd37fd8 0000000000012780 ffff880630e38000 ffff8806313f0000
[ 622.404338] ffffffff810608f2 ffff88062f038918 ffff8806313f0000 ffff88062f03891c
[ 622.412436] Call Trace:
[ 622.415116] [] ? need_resched+0x11/0x1d
[ 622.420913] [] schedule+0x65/0x67
[ 622.426238] [] schedule_preempt_disabled+0xe/0x10
[ 622.433118] [] mutex_lock_common.isra.7+0x14a/0x166
[ 622.440210] [] __mutex_lock_slowpath+0x13/0x15
[ 622.446826] [] mutex_lock+0x18/0x29
[ 622.452350] [] __blkdev_get+0x9c/0x3da
[ 622.458013] [] ? blkdev_get+0x2ce/0x2ce
[ 622.463767] [] blkdev_get+0x189/0x2ce
[ 622.469289] [] ? find_get_page+0x4a/0x6a
[ 622.475095] [] ? __d_lookup_rcu+0xa2/0xc9
[ 622.480979] [] ? blkdev_get+0x2ce/0x2ce
[ 622.486742] [] blkdev_open+0x64/0x70
[ 622.492188] [] do_dentry_open.isra.17+0x16e/0x21d
[ 622.498880] [] nameidata_to_filp+0x42/0x84
[ 622.504952] [] do_last.isra.47+0x625/0x64b
[ 622.510996] [] path_openat+0xc5/0x2f4
[ 622.516569] [] do_filp_open+0x38/0x86
[ 622.522157] [] ? getname_flags+0x2a/0xa2
[ 622.528120] [] ? alloc_fd+0xe5/0xf7
[ 622.533606] [] do_sys_open+0x6e/0x102
[ 622.539253] [] sys_open+0x21/0x23
[ 622.544665] [] system_call_fastpath+0x16/0x1b
[ 642.511953] INFO: task flush-230:0:4232 blocked for more than 20 seconds.
[ 642.519226] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 642.527644] flush-230:0 D ffff8803318503a0 4224 4232 2 0x00000000
[ 642.535269] ffff88060bda37d0 0000000000000046 ffff880331850000 ffff88060bda3fd8
[ 642.543417] ffff88060bda3fd8 0000000000012780 ffffffff817633f0 ffff880331850000
[ 642.551527] ffff88060bda37a0 ffff880333c12780 ffff880331850000 ffff8802c88aec00
[ 642.559693] Call Trace:
[ 642.562439] [] schedule+0x65/0x67
[ 642.567711] [] io_schedule+0x60/0x7a
[ 642.573354] [] get_request_wait+0xbd/0x166
[ 642.579492] [] ? cfq_merge+0x72/0xa1
[ 642.585158] [] ? abort_exclusive_wait+0x8f/0x8f
[ 642.591849] [] blk_queue_bio+0x193/0x2d6
[ 642.597774] [] generic_make_request+0x9c/0xdd
[ 642.604279] [] submit_bio+0xbb/0xd4
[ 642.609739] [] ? inc_zone_page_state+0x27/0x29
[ 642.616262] [] submit_bh+0xf6/0x116
[ 642.621806] [] __block_write_full_page+0x200/0x2fd
[ 642.628676] [] ? blkdev_get_blocks+0x93/0x93
[ 642.635033] [] ? drop_buffers+0x96/0x96
[ 642.640850] [] ? blkdev_get_blocks+0x93/0x93
[ 642.647224] [] ? drop_buffers+0x96/0x96
[ 642.653218] [] block_write_full_page_endio+0x89/0x95
[ 642.660238] [] block_write_full_page+0x15/0x17
[ 642.666827] [] blkdev_writepage+0x18/0x1a
[ 642.672929] [] writepage+0x14/0x2d
[ 642.678471] [] ? page_index+0x1a/0x1a
[ 642.684215] [] write_cache_pages+0x22e/0x366
[ 642.690480] [] ? page_index+0x1a/0x1a
[ 642.696224] [] generic_writepages+0x3e/0x58
[ 642.702510] [] do_writepages+0x1e/0x2b
[ 642.708236] [] writeback_single_inode.isra.31+0x4c/0x123
[ 642.715938] [] writeback_sb_inodes+0x1d3/0x310
[ 642.722505] [] writeback_inodes_wb+0x74/0xb9
[ 642.728979] [] wb_writeback+0x136/0x26c
[ 642.734905] [] ? global_dirty_limits+0x2f/0x10e
[ 642.741572] [] wb_do_writeback+0x185/0x1bb
[ 642.747656] [] bdi_writeback_thread+0xa5/0x1ce
[ 642.754191] [] ? wb_do_writeback+0x1bb/0x1bb
[ 642.760487] [] kthread+0x8b/0x93
[ 642.765725] [] kernel_thread_helper+0x4/0x10
[ 642.772058] [] ? retint_restore_args+0x13/0x13
[ 642.778559] [] ? kthread_worker_fn+0x149/0x149
[ 642.784956] [] ? gs_change+0x13/0x13
[ 642.790429] INFO: task blkid:4280 blocked for more than 20 seconds.
[ 642.796923] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 642.805005] blkid D ffff8806313f03a0 5720 4280 4221 0x00000000
[ 642.812464] ffff8804dcd37ac8 0000000000000082 ffff8806313f0000 ffff8804dcd37fd8
[ 642.820356] ffff8804dcd37fd8 0000000000012780 ffff880630e38000 ffff8806313f0000
[ 642.828250] ffffffff810608f2 ffff88062f038918 ffff8806313f0000 ffff88062f03891c
[ 642.836148] Call Trace:
[ 642.838712] [] ? need_resched+0x11/0x1d
[ 642.844313] [] schedule+0x65/0x67
[ 642.849405] [] schedule_preempt_disabled+0xe/0x10
[ 642.855865] [] mutex_lock_common.isra.7+0x14a/0x166
[ 642.862884] [] __mutex_lock_slowpath+0x13/0x15
[ 642.869304] [] mutex_lock+0x18/0x29
[ 642.874805] [] __blkdev_get+0x9c/0x3da
[ 642.880574] [] ? blkdev_get+0x2ce/0x2ce
[ 642.886406] [] blkdev_get+0x189/0x2ce
[ 642.892103] [] ? find_get_page+0x4a/0x6a
[ 642.898052] [] ? __d_lookup_rcu+0xa2/0xc9
[ 642.904063] [] ? blkdev_get+0x2ce/0x2ce
[ 642.909908] [] blkdev_open+0x64/0x70
[ 642.915489] [] do_dentry_open.isra.17+0x16e/0x21d
[ 642.922308] [] nameidata_to_filp+0x42/0x84
[ 642.928422] [] do_last.isra.47+0x625/0x64b
[ 642.934490] [] path_openat+0xc5/0x2f4
[ 642.940187] [] do_filp_open+0x38/0x86
[ 642.945894] [] ? getname_flags+0x2a/0xa2
[ 642.951825] [] ? alloc_fd+0xe5/0xf7
[ 642.957334] [] do_sys_open+0x6e/0x102
[ 642.962969] [] sys_open+0x21/0x23
[ 642.968315] [] system_call_fastpath+0x16/0x1b
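For anyone trying to reproduce, the steps described in this thread (single-disk pool, one zvol, sequential writes via dd) can be sketched as below. The pool name `tank`, volume name `testvol`, volume size, and `/dev/sdX` are placeholders, not taken from the report; the script prints the commands by default instead of executing them.

```shell
#!/bin/sh
# Hypothetical reproduction sketch; names and the disk path are
# assumptions. Dry-run by default: RUN="echo" prints each command.
# Set RUN="" and run as root (with data loss on the chosen disk!)
# to actually execute.
RUN="echo"
$RUN zpool create tank /dev/sdX            # single 1TB SATA disk
$RUN zfs create -V 100G tank/testvol       # create a zvol
# write data into the zvol's block device node:
$RUN dd if=/dev/zero of=/dev/zvol/tank/testvol bs=1M count=16384
```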