openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

System hang when running zpool if using zvols and CONFIG_PREEMPT_NONE=y #1574

Closed chrekh closed 10 years ago

chrekh commented 11 years ago

I am running Gentoo Linux with kernel 3.9.9

At boot, when the first zpool command runs (from the init script, or from the command line when booting single-user), the system hangs completely.

Richard Yao suggested that I report this here. See also https://bugs.gentoo.org/show_bug.cgi?id=472516

I have bisected this to commit 526af78550eb5ccf80ce11e7a9c26f203ae671b0. I have also verified that reverting 526af78550eb5ccf80ce11e7a9c26f203ae671b0 on top of 168d056 helps.
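
For reference, the revert test was roughly the following (just a sketch; the exact build and install steps depend on how spl/zfs are packaged on your system):

git checkout 168d056
git revert 526af78550eb5ccf80ce11e7a9c26f203ae671b0   # back out the suspect commit
./autogen.sh && ./configure && make && make install   # rebuild and reinstall the modules
reboot                                                # then re-run the first zpool command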

ryao commented 11 years ago

Do you mean that reverting 526af78550eb5ccf80ce11e7a9c26f203ae671b0 makes the issue go away entirely, or does it just make it less frequent?

behlendorf commented 11 years ago

Are you sure it's hung? The first command will trigger the pool import, which may take some time depending on your pool. Prior to the change you referenced, that import could occur earlier in the boot, so the time spent may have been less noticeable.

chrekh commented 11 years ago

@ryao, Reverting 526af78550eb5ccf80ce11e7a9c26f203ae671b0 makes the issue go away. No hang at all.

@behlendorf, I am now ;) I let it sit for 15 minutes before giving up, and pressed [sysrq]-s, [sysrq]-u, [sysrq]-b

behlendorf commented 11 years ago

@chrekh Can you reproduce the issue and use sysrq-t to dump the kernel stacks on the system? We'll need them to be able to debug what causes the lockup.

chrekh commented 11 years ago

sysrq-t shows absolutely nothing.

But when I experimented with sysrq I accidentally pressed sysrq-i, and to my surprise the boot sequence continued as if I had exited the single-user shell with ^D.

I investigated further. If I list processes with ps (before running zpool), I find a suspicious udev process:

/lib/udev/zvol_id /dev/zd0

So I went ahead and removed udev from the sysinit runlevel and booted again. This time there is no udev process, and lsmod shows that zfs is not loaded. If I then load zfs (modprobe zfs), zpool works but shows pool0 as degraded. If I then start udev (/etc/init.d/udev start), zpool shows pool0 as online, and I can exit the single-user shell and boot completely.
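
On a Gentoo/OpenRC box the sequence above is roughly the following (a sketch only; the pool name matches my setup):

rc-update del udev sysinit     # keep udev out of early boot
reboot                         # come up in the single-user shell, zfs not loaded
modprobe zfs
zpool status pool0             # imports fine, but reported as degraded
/etc/init.d/udev start
zpool status pool0             # now online
exit                           # continue the normal boot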

behlendorf commented 11 years ago

@chrekh It would be very helpful to figure out where that zvol_id process is blocked. You should be able to get the stack for it with sysrq-t; the output will appear in dmesg.

chrekh commented 11 years ago

There is no output from sysrq-t (except the header "SysRq : Show State"), and at that point the system is completely hung, so I can't run any command, including dmesg.

I also noticed that I can get the hang by just booting without udev enabled in the sysinit runlevel, loading the zfs module, and running /etc/init.d/udev before zpool.

behlendorf commented 11 years ago

> I also noticed that I can get the hang by just booting without udev enabled in the sysinit runlevel, loading the zfs module, and running /etc/init.d/udev before zpool.

@chrekh That's helpful. Perhaps I'll be able to reproduce the same issue using that method. In the meantime, can you apply the patch in #1491 and see if that resolves the issue?
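
If it helps, something like this should apply the pull request as a patch on top of your tree (adjust the repository path to wherever your clone comes from; GitHub serves any pull request with a .patch suffix):

wget https://github.com/zfsonlinux/zfs/pull/1491.patch
git am 1491.patch              # apply on top of the current branch
# then rebuild the modules and reboot to test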

chrekh commented 11 years ago

I have tried #1491 applied on top of 50fe577d1f3bd06e15fe2006459debd9fdffd04a and that didn't help.

chrekh commented 11 years ago

I have now tested with the latest master of spl and zfs on kernel 3.10.0. It has the same problem.

chrekh commented 11 years ago

Good news: I figured out why sysrq-t didn't show anything by actually reading Documentation/sysrq.txt, and did echo 8 > /proc/sysrq-trigger (the digit sysrq commands set the console log level, which was too low to show the dump).

The next challenge was to save the output, so I did

(while true; do dmesg > /dmesg; sleep 1; done)&

This is the result. I think I have missed data from the top; I can try again with a higher LOG_BUF_SHIFT (I have 17 now). Let me know if you find anything useful here (this garbage doesn't mean anything to me) ;)
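
For reference, the whole capture procedure boils down to roughly this (a sketch; CONFIG_LOG_BUF_SHIFT=18 is only an example of a larger buffer):

# kernel .config: bigger ring buffer (2^18 bytes) so the task dump is not truncated
#   CONFIG_LOG_BUF_SHIFT=18
echo 1 > /proc/sys/kernel/sysrq                    # allow all SysRq functions
echo 8 > /proc/sysrq-trigger                       # sysrq '8': console loglevel 8
(while true; do dmesg > /dmesg; sleep 1; done)&    # keep saving the ring buffer so it survives the hang
echo t > /proc/sysrq-trigger                       # same as alt-sysrq-t: dump all task states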

Oh, and I couldn't post this comment with the text pasted inline; it was probably too large.

I have put it temporarily at http://www.chrekh.se/tmp/dmesg.txt (what is the correct way to do this?)

behlendorf commented 11 years ago

Here's the good bit from the stacks you dumped. I cleaned it up slightly, and I'm still a bit confused by what I see. What exactly is your pool configuration? Are there multiple pools? Are these all block devices? Also, can you try applying this patch and see if it helps: ryao/zfs@7df77878bb18bd8091c2630e530e2fe69fbc64a1.

vdev_open/0     D ffff88041f9f67c8     0   889      2 
Call Trace:
 [<ffffffffa000c2ed>] ? taskq_wait_all+0x6d/0x110 [spl]
 [<ffffffff81084720>] ? finish_wait+0x90/0x90
 [<ffffffffa000c49e>] ? taskq_destroy+0x2e/0x420 [spl]
 [<ffffffffa00f2734>] ? vdev_config_sync+0xae4/0xc00 [zfs]
 [<ffffffffa00ed1a8>] ? vdev_open+0xf8/0x480 [zfs]
 [<ffffffffa00eded9>] ? vdev_open_children+0x109/0x120 [zfs]
 [<ffffffff81084113>] ? kthread+0xb3/0xc0

vdev_open/0     D ffff88041f9f6e78     0   890      2
Call Trace:
 [<ffffffff814275b8>] ? io_schedule+0x88/0xd0
 [<ffffffffa000fc56>] ? __cv_timedwait+0x96/0x110 [spl]
 [<ffffffffa012ad2b>] ? zio_wait+0xeb/0x180 [zfs]
 [<ffffffffa00ed357>] ? vdev_open+0x2a7/0x480 [zfs]
 [<ffffffffa00eded9>] ? vdev_open_children+0x109/0x120 [zfs]
 [<ffffffff81084113>] ? kthread+0xb3/0xc0

vdev_open/1     D ffff88041f9f7528     0   891      2
Call Trace:
 [<ffffffff814275b8>] ? io_schedule+0x88/0xd0
 [<ffffffffa000fc56>] ? __cv_timedwait+0x96/0x110 [spl]
 [<ffffffffa012ad2b>] ? zio_wait+0xeb/0x180 [zfs]
 [<ffffffffa00ed357>] ? vdev_open+0x2a7/0x480 [zfs]
 [<ffffffffa00eded9>] ? vdev_open_children+0x109/0x120 [zfs]
 [<ffffffff81084113>] ? kthread+0xb3/0xc0

zvol_id         R  running task        0   892    446 
Call Trace:
 [<ffffffff811ba82c>] ? exact_lock+0xc/0x20
 [<ffffffff81262b34>] ? kobj_lookup+0xd4/0x150
 [<ffffffff811ba430>] ? disk_map_sector_rcu+0x70/0x70
 [<ffffffff811bac55>] ? get_gendisk+0x35/0x140
 [<ffffffff8113c01a>] ? __blkdev_get+0x11a/0x420
 [<ffffffff8113c5e0>] ? blkdev_get+0x2c0/0x2c0
 [<ffffffff8113c4a6>] ? blkdev_get+0x186/0x2c0
 [<ffffffff8113c5e0>] ? blkdev_get+0x2c0/0x2c0
 [<ffffffff81109396>] ? do_dentry_open+0x216/0x2a0
 [<ffffffff81109448>] ? finish_open+0x28/0x40
 [<ffffffff811186ca>] ? do_last.isra.61+0x30a/0xc90
 [<ffffffff81115938>] ? link_path_walk+0x68/0x850
 [<ffffffff81116685>] ? path_lookupat+0x65/0x700
 [<ffffffff8111910b>] ? path_openat.isra.62+0xbb/0x470
 [<ffffffff81119657>] ? user_path_at_empty+0x67/0xb0
 [<ffffffff811196f4>] ? do_filp_open+0x44/0xb0
 [<ffffffff81125322>] ? __alloc_fd+0x42/0x110
 [<ffffffff8110a693>] ? do_sys_open+0xf3/0x1e0
 [<ffffffff81428ad2>] ? system_call_fastpath+0x16/0x1b 
chrekh commented 11 years ago

Yes, I have two pools; all the devices are real block devices.

sudo zpool list -v
NAME                                          SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
pool0                                         208G  28,8G   179G  13%  1.00x  ONLINE  -
  ata-OCZ-VERTEX2_OCZ-040YQK1O751847OJ-part4  104G  16,1G  87,9G    -
  ata-OCZ-VERTEX2_OCZ-4IF8F7793P588C3G-part4  104G  12,7G  91,3G    -
pool1                                         696G   420G   276G  60%  1.10x  ONLINE  -
  mirror                                      696G   420G   276G    -
    sdd                                          -      -      -    -
    sdc                                          -      -      -    -

The zvol is in pool0

sudo zpool get all pool0
NAME   PROPERTY               VALUE                SOURCE
pool0  size                   208G                 -
pool0  capacity               13%                  -
pool0  altroot                -                    default
pool0  health                 ONLINE               -
pool0  guid                   3853985073687055218  default
pool0  version                -                    default
pool0  bootfs                 -                    default
pool0  delegation             on                   default
pool0  autoreplace            off                  default
pool0  cachefile              -                    default
pool0  failmode               wait                 default
pool0  listsnapshots          off                  default
pool0  autoexpand             off                  default
pool0  dedupditto             0                    default
pool0  dedupratio             1.00x                -
pool0  free                   179G                 -
pool0  allocated              28,8G                -
pool0  readonly               off                  -
pool0  ashift                 0                    default
pool0  comment                -                    default
pool0  expandsize             0                    -
pool0  freeing                0                    default
pool0  feature@async_destroy  enabled              local
pool0  feature@empty_bpobj    active               local
pool0  feature@lz4_compress   enabled              local

LANG=C sudo zfs get all pool0/ibdata
NAME          PROPERTY              VALUE                  SOURCE
pool0/ibdata  type                  volume                 -
pool0/ibdata  creation              Sat Jun  8 18:01 2013  -
pool0/ibdata  used                  851M                   -
pool0/ibdata  available             170G                   -
pool0/ibdata  referenced            193M                   -
pool0/ibdata  compressratio         1.00x                  -
pool0/ibdata  reservation           none                   default
pool0/ibdata  volsize               800M                   local
pool0/ibdata  volblocksize          4K                     -
pool0/ibdata  checksum              on                     default
pool0/ibdata  compression           off                    default
pool0/ibdata  readonly              off                    default
pool0/ibdata  copies                1                      default
pool0/ibdata  refreservation        851M                   local
pool0/ibdata  primarycache          all                    default
pool0/ibdata  secondarycache        all                    default
pool0/ibdata  usedbysnapshots       0                      -
pool0/ibdata  usedbydataset         193M                   -
pool0/ibdata  usedbychildren        0                      -
pool0/ibdata  usedbyrefreservation  658M                   -
pool0/ibdata  logbias               latency                default
pool0/ibdata  dedup                 off                    inherited from pool0
pool0/ibdata  mlslabel              none                   default
pool0/ibdata  sync                  standard               default
pool0/ibdata  refcompressratio      1.00x                  -
pool0/ibdata  written               193M                   -
pool0/ibdata  snapdev               hidden                 default

And no, ryao/zfs@7df7787 did not help.

xudonax commented 11 years ago

I seem to be getting the same error on Fedora 18. This is one of many stack traces I get; it appeared after booting without ZFS installed and then installing it with "sudo yum install zfs" (I still had the repo active).

The Fedora 18 kernel configuration (for kernel 3.9.9) says that:

This is one of them converted to text (I took photos). The image for this one is "2013-07-12-0391.jpg".

[ 4023.643608 ] ------------[ cut here ]------------
[ 4023.644117 ] kernel BUG at kernel/timer.c:1081!
[ 4023.644712 ] invalid opcode: 0000 [#1] SMP
[ 4023.645309 ] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables xt_CHECKSUM iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack bridge stp llc be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser bnep bluetooth rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi mptctl mptbase zfs(POF) zunicode(POF) zavl(POF) zcommon(POF) arc4 snd_hda_codec_hdmi snd_hda_codec_realtek spl(OF) zlib_deflate iTCO_wdt iTCO_vendor_support iwldvm mac80211 snd_hda_intel snd_hda_codec snd_hwdep acpi_cpufreq mperf snd_seq iwlwifi coretemp i915 snd_seq_device microcode cfg80211 serio_raw i2c_i801 snd_pcm rfkill snd_page_alloc snd_timer lpc_ich i2c_algo_bit mfd_core snd drm_kms_helper soundcore mei drm vhost_net tun macvtap macvlan kvm_intel nfsd kvm i2c_dev i2c_core auth_rpcgss nfs_acl lockd uinput crc32_pclmul crc32c_intel ghash_clmulni_intel firewire_ohci e1000e firewire_core mpt2sas crc_itu_t ptp raid_class pps_core scsi_transport_sas video sunrpc
[ 4023.776866 ] CPU 4
[ 4023.777138 ] Pid: 29, comm: ksoftirqd/4 Tainted: PF      W 0 3.9.9-201.fc18.x86_64 #1 /DH77DF
[ 4023.840442 ] RIP: 0010:[<ffffffff8106eb93>] [<ffffffff8106eb93>] cascade+0x93/0xa0
[ 4023.873765 ] RSP: 0018:ffff880408941d28 EFLAGS: 00010086
[ 4023.903637 ] RAX: 0000000000000000 RBX: ffff88004ff76910 RCX: ffff880078c9fa70
[ 4023.933720 ] RDX: 0000000000000023 RSI: ffff8802a3bedd00 RDI: ffff880408950000
[ 4023.962762 ] RBP: ffff880408941d58 R08: ffff880408950e28 R09: 0000000000000000
[ 4023.993460 ] R10: 0000000000000001 R11: 0000000000000001 R12: ffff880408950000
[ 4024.024750 ] R13: ffff880408941d28 R14: 0000000000000023 R15: 0000000000000000
[ 4024.087062 ] FS:  0000000000000000(0000) GS:ffff88041f300000(0000) knlGS:0000000000000000
[ 4024.118337 ] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4024.151045 ] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4024.182855 ] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 4024.214525 ] Process ksoftirqd/4 (pid: 29, threadinfo ffff880408940000, task ffff880408938000)
[ 4024.247573 ] Stack:
[ 4024.281051 ]  ffff8802a3bedd00 ffff88006f2a2910 ffff880408950000 0000000000000000
[ 4024.315430 ]  0000000000000001 0000000000000001 ffff880408941dd8 ffffffff8106f838
[ 4024.350296 ]  ffff880408951c28 ffff880408951828 ffff880408951428 ffff880408951028
[ 4024.384432 ] Call Trace:
[ 4024.417541 ]  [<ffffffff8106f838>] run_timer_softirq+0x238/0x2a0
[ 4024.452017 ]  [<ffffffff81067678>] __do_softirq+0xe8/0x230
[ 4024.486796 ]  [<ffffffff810677f0>] run_ksoftirqd+0x30/0x50
[ 4024.521714 ]  [<ffffffff8108a8ef>] smpboot_thread_fn+0x10f/0x1a0
[ 4024.555204 ]  [<ffffffff8108a7e0>] ? lg_global_unlock+0x60/0x60
[ 4024.589396 ]  [<ffffffff81082ba0>] kthread+0xc0/0xd0
[ 4024.623318 ]  [<ffffffff81010303>] ? perf_trace_xen_cpu_set_ldt+0x33/0xe0
[ 4024.657469 ]  [<ffffffff81082ae0>] ? kthread_create_on_node+0x120/0x120
[ 4024.692435 ]  [<ffffffff8166af2c>] ret_from_fork+0x7c/0xb0
[ 4024.727537 ]  [<ffffffff81082ae0>] ? kthread_create_on_node+0x120/0x120
[ 4024.761355 ] Code: 89 c3 48 83 e1 fc 49 39 cc 75 20 4c 89 e7 e8 35 f8 ff ff 4c 39 eb 48 8b 03 75 dd 48 83 c4 10 44 89 f0 5b 41 5c 41 5d 41 5e 5d c3 <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89
[ 4024.835600 ] RIP [<ffffffff8106eb93>] cascade+0x93/0xa0
[ 4024.869732 ]  RSP <ffff880408941d28>
[ 4024.908356 ] ---[ end trace d2ec33da37bb2b1e ]---
[ 4024.947212 ] Kernel panic - not syncing: Fatal exception in interrupt
[ 4026.057188 ] Shutting down cpus with NMI

This and the other readable photo are attached to this comment.

I have two pools, with this configuration:

Pool tank: 6x Hitachi Travelstar 5K1000 in RAIDZ2
Pool fcpx: 3x 500GB 3.5" disks (different brands and types) in RAIDZ

(Attached images: 2013-07-12-0391 and 2013-07-14-0404.)

ryao commented 11 years ago

@xudonax Are you running HEAD or 0.6.1?

xudonax commented 11 years ago

@ryao This was while running 0.6.1. Should this be fixed in HEAD? If so, I'm going to look into upgrading to that :)

chrekh commented 11 years ago

I have now tested booting with pool1 exported, and then there is no problem. I get the hang only with two pools active.

@xudonax I guess @ryao asks because my problem exists only after 0.6.1; 0.6.1 works for me.

xudonax commented 11 years ago

Please ignore my mumbling in this thread; I'll open a new bug. The issue seems to be that access to a ZVOL is crashing my machine. I was able to reproduce this a few times before figuring out the cause :)

ryao commented 10 years ago

@chrekh If possible, would you rebuild your kernel with frame pointers and get fresh stack traces? I recently learned that stack traces become more detailed on kernels built with frame pointers. That should give us more useful information for debugging.
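
Roughly something like this for a source-built kernel (a sketch; adjust for your setup):

cd /usr/src/linux
make menuconfig    # Kernel hacking ---> [*] Compile the kernel with frame pointers (CONFIG_FRAME_POINTER=y)
make && make modules_install && make install
# rebuild the spl/zfs modules against the new kernel, reboot, and capture sysrq-t again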

chrekh commented 10 years ago

I'm away from home for a week. I'll try that as soon as I'm home again.

chrekh commented 10 years ago

I'm home again.

This is the result from sysrq-t on kernel 3.9.11 with CONFIG_FRAME_POINTER=y, at commit cb79a4e

vdev_open/0     D ffff88042f211380     0   811      2 0x00000000
 ffff88041f1d9c58 0000000000000046 ffff88041f5bc990 ffff88041f5bd040
 ffff88041f1d9c78 ffff88041f5bc990 ffff88041f1d9fd8 ffff88041f1d9fd8
 ffff88041f1d9fd8 ffff88041f5bc990 ffff88041f1d9c68 ffff88042c657f00
Call Trace:
 [<ffffffff81435514>] schedule+0x24/0x70
 [<ffffffffa09a0425>] taskq_wait_all+0x75/0x110 [spl]
 [<ffffffff81085dc0>] ? finish_wait+0x80/0x80
 [<ffffffffa09a059a>] taskq_wait+0x3a/0x50 [spl]
 [<ffffffffa09a05e7>] taskq_destroy+0x37/0x440 [spl]
 [<ffffffffa0b0d50a>] vdev_open_children+0xca/0x130 [zfs]
 [<ffffffffa0b12051>] vdev_config_sync+0xc31/0xd70 [zfs]
 [<ffffffffa0b0c7e5>] vdev_open+0xe5/0x470 [zfs]
 [<ffffffffa0b0d551>] vdev_open_children+0x111/0x130 [zfs]
 [<ffffffffa09a084b>] taskq_destroy+0x29b/0x440 [spl]
 [<ffffffff81091810>] ? try_to_wake_up+0x2d0/0x2d0
 [<ffffffffa09a06a0>] ? taskq_destroy+0xf0/0x440 [spl]
 [<ffffffff8108579b>] kthread+0xbb/0xc0
 [<ffffffff810856e0>] ? kthread_create_on_node+0x120/0x120
 [<ffffffff81436b2c>] ret_from_fork+0x7c/0xb0
 [<ffffffff810856e0>] ? kthread_create_on_node+0x120/0x120
vdev_open/0     D ffff88042f211380     0   812      2 0x00000000
 ffff88041f1dbca8 0000000000000046 ffff8804249d3000 ffffffff81638440
 ffff88041f1dbc78 ffff88041f5bd040 ffff88041f1dbfd8 ffff88041f1dbfd8
 ffff88041f1dbfd8 ffff88041f5bd040 ffff88041f1dbca8 ffff88041f5bd040
Call Trace:
 [<ffffffff81435514>] schedule+0x24/0x70
 [<ffffffff814355ea>] io_schedule+0x8a/0xd0
 [<ffffffffa09a3d43>] __cv_timedwait+0x93/0x100 [spl]
 [<ffffffff81085dc0>] ? finish_wait+0x80/0x80
 [<ffffffffa09a3dc3>] __cv_wait_io+0x13/0x20 [spl]
 [<ffffffffa0b4a513>] zio_wait+0x103/0x1a0 [zfs]
 [<ffffffffa0b0c99a>] vdev_open+0x29a/0x470 [zfs]
 [<ffffffffa0b0d551>] vdev_open_children+0x111/0x130 [zfs]
 [<ffffffffa09a084b>] taskq_destroy+0x29b/0x440 [spl]
 [<ffffffff81091810>] ? try_to_wake_up+0x2d0/0x2d0
 [<ffffffffa09a06a0>] ? taskq_destroy+0xf0/0x440 [spl]
 [<ffffffff8108579b>] kthread+0xbb/0xc0
 [<ffffffff810856e0>] ? kthread_create_on_node+0x120/0x120
 [<ffffffff81436b2c>] ret_from_fork+0x7c/0xb0
 [<ffffffff810856e0>] ? kthread_create_on_node+0x120/0x120
vdev_open/1     D ffff88042f231380     0   813      2 0x00000000
 ffff88041f1ddca8 0000000000000046 ffff8804249d3000 ffff88042d4942e0
 ffff88041f1ddc78 ffff88041f5bd6f0 ffff88041f1ddfd8 ffff88041f1ddfd8
 ffff88041f1ddfd8 ffff88041f5bd6f0 ffff88041f1ddca8 ffff88041f5bd6f0
Call Trace:
 [<ffffffff81435514>] schedule+0x24/0x70
 [<ffffffff814355ea>] io_schedule+0x8a/0xd0
 [<ffffffffa09a3d43>] __cv_timedwait+0x93/0x100 [spl]
 [<ffffffff81085dc0>] ? finish_wait+0x80/0x80
 [<ffffffffa09a3dc3>] __cv_wait_io+0x13/0x20 [spl]
 [<ffffffffa0b4a513>] zio_wait+0x103/0x1a0 [zfs]
 [<ffffffffa0b0c99a>] vdev_open+0x29a/0x470 [zfs]
 [<ffffffffa0b0d551>] vdev_open_children+0x111/0x130 [zfs]
 [<ffffffffa09a084b>] taskq_destroy+0x29b/0x440 [spl]
 [<ffffffff81091810>] ? try_to_wake_up+0x2d0/0x2d0
 [<ffffffffa09a06a0>] ? taskq_destroy+0xf0/0x440 [spl]
 [<ffffffff8108579b>] kthread+0xbb/0xc0
 [<ffffffff810856e0>] ? kthread_create_on_node+0x120/0x120
 [<ffffffff81436b2c>] ret_from_fork+0x7c/0xb0
 [<ffffffff810856e0>] ? kthread_create_on_node+0x120/0x120
zvol_id         R  running task        0   814      1 0x0000000c
 0000000000000000 0000000000000000 ffff88042c6c3880 ffffffff811c02b0
 0000000000000007 0000000000000004 ffff88042c6c3880 ffffffffffffff6e
 ffffffff811cfb8b 0000000000000010 0000000000000202 ffff88041f28bb18
Call Trace:
 [<ffffffff811c02b0>] ? disk_map_sector_rcu+0x80/0x80
 [<ffffffff811cfb8b>] ? kobject_get+0x1b/0x40
 [<ffffffff811c0b66>] ? get_gendisk+0x36/0x140
 [<ffffffff812655b2>] ? get_device+0x12/0x30
 [<ffffffff811c0aa1>] ? disk_get_part+0x31/0x50
 [<ffffffff8113fdbf>] ? __blkdev_get+0x7f/0x420
 [<ffffffff811402e5>] ? blkdev_get+0x185/0x2d0
 [<ffffffff8114048a>] ? blkdev_open+0x5a/0x80
 [<ffffffff8110c603>] ? do_dentry_open+0x203/0x290
 [<ffffffff81140430>] ? blkdev_get+0x2d0/0x2d0
 [<ffffffff8110c6c0>] ? finish_open+0x30/0x40
 [<ffffffff8111baae>] ? do_last.isra.61+0x2be/0xc50
 [<ffffffff81118ca3>] ? inode_permission+0x13/0x50
 [<ffffffff81118d48>] ? link_path_walk+0x68/0x870
 [<ffffffff8111c4ee>] ? path_openat.isra.62+0xae/0x480
 [<ffffffff810e94a0>] ? handle_mm_fault+0x220/0x320
 [<ffffffff8111caec>] ? do_filp_open+0x3c/0x90
 [<ffffffff81128b22>] ? __alloc_fd+0x42/0x110
 [<ffffffff8110d93f>] ? do_sys_open+0xef/0x1d0
 [<ffffffff8110da3c>] ? sys_open+0x1c/0x20
 [<ffffffff81436bd2>] ? system_call_fastpath+0x16/0x1b
chrekh commented 10 years ago

This problem is now gone from master HEAD.

I did a reverse bisect and found that commit ba6a24026 is the one that fixed it.
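
The "reverse" part just means swapping the usual good/bad labels so git converges on the commit that introduced the fix instead of one that introduced a bug (a sketch; the test at each step is booting and running zpool):

git bisect start
git bisect bad HEAD                    # hang is gone here, so label it "bad"
git bisect good <old-hanging-commit>   # hang still happens here, so label it "good"
# at each bisect step: build, install, reboot, test, then
git bisect good                        # if this build still hangs
git bisect bad                         # if the hang is gone
# the reported "first bad commit" is the one that fixed it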

behlendorf commented 10 years ago

Ahh yes, I can see how this might help. However, let's leave this open a little while. If I don't hear back from you in a few weeks saying it can still happen, we'll close it out. Thanks for posting back in this issue.

chrekh commented 10 years ago

I think it's safe to close this now. I have continuously updated to master HEAD and updated the kernel. Now running 3.14.1 without problems.