openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

With primarycache=all set on a FS, SPL throws a PANIC at line 558 of zfs_znode.c #111

Closed devsk closed 13 years ago

devsk commented 13 years ago

I can't seem to find a way to attach a file here. I have a snap of the oops.

Note that setting primarycache=none works fine (system boots and runs fine) but performance is extremely (I really mean that) slow!
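(For reference, a minimal sketch of how the property in question is inspected and changed; the pool/dataset name tank/fs is an assumption, not taken from this report:)

zfs get primarycache tank/fs            # show the current ARC caching policy
zfs set primarycache=none tank/fs       # cache nothing in ARC (the slow workaround above)
zfs set primarycache=metadata tank/fs   # cache only metadata
zfs set primarycache=all tank/fs        # the default: cache both data and metadata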

devsk commented 13 years ago

couldn't find any other means, so sent it as an attachment in email.

behlendorf commented 13 years ago

Yeah, it's a bare-bones bug tracking system which lacks features like attachments. I'll just transcribe the useful bits into the bug. The stack looks strange, but we at least have the VERIFY(), which is the critical bit.

VERIFY(0 == sa_buf_hold(zsb->z_os, obj, NULL, &db))
zfs_mknode
dmu_tx_assign
dsl_dataset_block_freeable
zfs_create
dbuf_rele_and_unlock
zfs_onexit_cb_data
arc_getbuf_func
arc_read_nolock
spl_kmem_cache_alloc
zfs_obj_to_stats

devsk commented 13 years ago

I figured out how to attach a serial console in the guest to a file on the host and capture the console output. So, here is the complete stack:

[ 6.744018] SPL: Loaded Solaris Porting Layer v0.6.0
[ 6.770479] SPLAT: Loaded Solaris Porting LAyer Tests v0.6.0
[ 6.790620] zavl: module license 'CDDL' taints kernel.
[ 6.795119] Disabling lock debugging due to kernel taint
[ 6.910557] ZFS: Loaded ZFS Filesystem v0.6.0
[ 11.417109] VERIFY(0 == sa_buf_hold(zsb->z_os, obj, NULL, &db)) failed
[ 11.423853] SPLError: 3266:0:(zfs_znode.c:558:zfs_mknode()) SPL PANIC
[ 11.427340] SPL: Showing stack for process 3266
[ 11.428551] Pid: 3266, comm: mount Tainted: P 2.6.37.1 #5
[ 11.430116] Call Trace:
[ 11.430973] [] ? spl_debug_dumpstack+0x26/0x2f [spl]
[ 11.432539] [] ? spl_debug_bug+0x7a/0xf0 [spl]
[ 11.438587] [] ? zfs_mknode+0x1e5/0xaf7 [zfs]
[ 11.441164] [] ? dmu_tx_assign+0x327/0x462 [zfs]
[ 11.442854] [] ? dsl_dataset_block_freeable+0x34/0x40 [zfs]
[ 11.444838] [] ? arc_getbuf_func+0x0/0x5a [zfs]
[ 11.446636] [] ? zfs_create+0x41e/0x68b [zfs]
[ 11.447866] [] ? dbuf_rele_and_unlock+0x135/0x495 [zfs]
[ 11.452599] [] ? zfs_onexit_cb_data+0xd2e/0x100b [zfs]
[ 11.466643] [] ? 0xffffffffa02eea9f
[ 11.468655] [] ? arc_getbuf_func+0x0/0x5a [zfs]
[ 11.470149] [] ? 0xffffffffa02eebad
[ 11.471439] [] ? arc_getbuf_func+0x0/0x5a [zfs]
[ 11.472934] [] ? arc_read_nolock+0x1ed/0x680 [zfs]
[ 11.474323] [] ? spl_kmem_cache_alloc+0xa0/0x4c3 [spl]
[ 11.475892] [] ? __mod_zone_page_state+0x41/0x4b
[ 11.477448] [] ? zfs_obj_to_stats+0x3d2/0xb15 [zfs]
[ 11.482561] [] ? zil_parse+0x3c5/0x4aa [zfs]
[ 11.484082] [] ? zfs_obj_to_stats+0x123/0xb15 [zfs]
[ 11.487800] [] ? zfs_obj_to_stats+0x2e4/0xb15 [zfs]
[ 11.491541] [] ? zil_replay+0xab/0xe3 [zfs]
[ 11.492981] [] ? zfs_register_callbacks+0x456/0x46e [zfs]
[ 11.510964] [] ? zfs_domount+0x188/0x1ef [zfs]
[ 11.512494] [] ? zpl_write_common+0x954/0x10f2 [zfs]
[ 11.514172] [] ? zpl_write_common+0x95d/0x10f2 [zfs]
[ 11.515714] [] ? mount_nodev+0x45/0x7e
[ 11.517096] [] ? get_sb_nodev+0xe/0x31
[ 11.518455] [] ? zpl_write_common+0x94f/0x10f2 [zfs]
[ 11.520036] [] ? vfs_kern_mount+0x8a/0x131
[ 11.521446] [] ? do_kern_mount+0x49/0xe9
[ 11.527200] [] ? do_mount+0x6b4/0x6ed
[ 11.528609] [] ? memdup_user+0x31/0x56
[ 11.529967] [] ? sys_mount+0x88/0xca
[ 11.531148] [] ? system_call_fastpath+0x16/0x1b
[ 11.532682] SPL: Dumping log to /tmp/spl-log.1298170238.3266
[ 11.534105] SPL: Can't open /tmp/spl-log.1298170238.3266 for dump: -2

devsk commented 13 years ago

And github garbles it....:-)

devsk commented 13 years ago

I am hell bent on getting the full stack to you...:-D

Try this URL: http://paste.pocoo.org/show/341555/

devsk commented 13 years ago

Basically, my pool is toast now. I can't even import it because SPL PANICs as soon as I try to import it. So, I can't even disable the primarycache property.

behlendorf commented 13 years ago

I just started looking at this issue. Thanks for posting the full stack in spite of GitHub; it sheds a lot of light on what's going on. So the problem occurs during mount, when it's attempting to replay the zil from the previous crash. I'll see if I can't run down exactly what's going wrong with the replay.

behlendorf commented 13 years ago

Just as a passing thought, you could set 'zil_replay_disable=1', which should bypass the offending code and allow you to mount the dataset. This isn't a fix, and I'd still like to fix the underlying problem, but if you want to get the dataset back online I suspect it will work. Unfortunately, it looks like I haven't exported that as a module option just yet. You would need to add the following to module/zfs/zil.c, around line 70:

#if defined(_KERNEL) && defined(HAVE_SPL)
module_param(zil_replay_disable, int, 0644);
MODULE_PARM_DESC(zil_replay_disable, "Disable ZFS Intent Logging (ZIL) Replay");
#endif
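(Assuming the rebuilt module is installed, a parameter exported this way could then be set at load time or toggled at runtime through sysfs; a minimal sketch:)

modprobe zfs zil_replay_disable=1                        # set when loading the module
echo 1 > /sys/module/zfs/parameters/zil_replay_disable   # or toggle it on a loaded module
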
behlendorf commented 13 years ago

I see what's happening; this was a loose end I didn't quite get tied up when I was restructuring the ZPL to use all Linux types. The ZIL replay code is not correctly passing the object id to be recreated to the zfs_mknode function; it's getting lost because of some xvattr code I had commented out. I can fix this, but not tonight. If you'd like to get your pool back online, your best bet in the short term is disabling ZIL replay.

devsk commented 13 years ago

No hurries Brian. At least we now know what the issue is. I will wait for the proper fix.

devsk commented 13 years ago

Brian, I have some time this weekend. Do you have any patch that I can try?

behlendorf commented 13 years ago

Sorry, no patch for this issue just yet. It's on the top of my list of things to address next week. I did get several of the other issues you reported fixed in the spl/zfs master branches. If you've got a few minutes you could try that.

devsk commented 13 years ago

Actually, I get very little time over the weekend to devote to things I absolutely care about. Other than that, I end up working weekends as well, so no scope for much else...:(

IMO, the only way I can get more testing done is if we can get ZFS going on my laptop so that I use it regularly. For that, this fix is a must.

behlendorf commented 13 years ago

I appreciate any time you can devote to this, I'll make sure I get this fixed next.

behlendorf commented 13 years ago

devsk, I'm working to fix this issue in the code (which is understood), but I'd really like to have a test case for this as well to verify the fix. I'm having trouble recreating a pool which is damaged like this. Do you happen to have a simple workload which causes this?

devsk commented 13 years ago

Just having the primarycache=all on the dataset was enough to trigger the crash during import. So, enable primarycache=all, export the pool and import the pool. Is that not producing a crash for you?
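(A minimal sketch of that reproduction sequence, assuming a pool/dataset named tank/fs:)

zfs set primarycache=all tank/fs
zpool export tank
zpool import tank        # the panic above was hit while mounting, during zil replay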

behlendorf commented 13 years ago

No, I'm unable to reproduce the crash this way. I'm quite sure the issue is that my zil doesn't contain any dirty data to be replayed so I never run the offending code. Your zil does have dirty data probably because your system likely crashed due to some different bug. I've been having a hard time crashing my VM in such a way that the zil still has dirty data to be replayed. I was hoping you might have stumbled on a simple way to do this. :)

devsk commented 13 years ago

Can you enable the Host Cache if you are using VirtualBox, write a lot of files (like cp -a /usr/lib/ /usr/lib.copy) and then reset the VM while it's in the middle of it?

behlendorf commented 13 years ago

I've been using kvm for the virtualization, but my naive attempts to reproduce the issue by crashing the VM haven't worked. The trick is that the crash needs to occur while a synchronous create/write is in the zil, but before the current transaction group has been committed to disk. I guess I'll have to resort to instrumenting the code for now to crash at just the right time. While using it as your root filesystem I'm sure you had quite a few synchronous IOs outstanding, which made it more likely for you.

I was just thinking that this is code which would be shared with a zfs-fuse implementation. That would be a nice way to test this sort of thing; it could be like ztest but include the POSIX layer. :) Ah, well.
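(For reference, a hedged sketch of a workload intended to leave dirty records in the ZIL before a hard reset; the paths, counts, and timing are assumptions, not a confirmed reproducer:)

# issue many small synchronous creates/writes so records land in the ZIL
for i in $(seq 1 10000); do
    dd if=/dev/zero of=/tank/fs/sync.$i bs=4k count=1 oflag=sync 2>/dev/null
done &
# hard-reset the machine mid-stream, before the open transaction group commits
sleep 5
echo b > /proc/sysrq-trigger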

behlendorf commented 13 years ago

OK, I was hoping to have this work mostly done by today, but I didn't make it. Hopefully, I'll be able to push a development branch with the proposed fix on Wednesday.

behlendorf commented 13 years ago

Drop HAVE_XVATTR macros

When I began work on the Posix layer it immediately became clear to me that to integrate cleanly with the Linux VFS certain Solaris specific things would have to go. One of these things was to eliminate as many Solaris specific types from the ZPL layer as possible. They would be replaced with their Linux equivalents. This would not only be good for performance, but for the general readability and health of the code. The Solaris and Linux VFS are different beasts and should be treated as such. Most of the code remains common for constructing transactions and such, but there are subtle and important differences which need to be respected.

This policy worked well for certain types such as the vnode_t, and it initially seemed to be working out well for the vattr_t. There was a relatively small amount of related xvattr_t code I was forced to comment out with HAVE_XVATTR, but it didn't look that hard to come back soon and replace it all with a native Linux type.

However, after going down this path with xvattr for some distance, it became clear that this code was woven into the ZPL more deeply than I thought. In particular, its hooks went very deep into the ZPL replay code, and replacing it would not be as easy as I originally thought.

Rather than continue pursuing the replacement and removal of this code, I've taken a step back and reevaluated things. This commit reverts many of my previous commits which removed xvattr related code. It restores much of the code to its original upstream state and now relies on improved xvattr_t support in the zfs package itself.

The result of this is that much of the code which I had commented out, which accidentally broke things like replay, is now back in place and working. However, there may be a small performance impact for getattr/setattr operations because they now require a translation from native Linux to Solaris types. For now that's a price I'm willing to pay. Once everything is completely functional we can revisit the issue of removing the vattr_t/xvattr_t types.

Closed by 5484965ab650a56871a62a3373859a7a36ddafcb

behlendorf commented 13 years ago

Devsk, I have pushed a fix for the zil replay issue described in this bug to the spl/zfs xvattr branches. If you get a few minutes, could you try it before I merge it into master? It passes my test cases, but I'd like to make sure it works for you as well.

https://github.com/behlendorf/spl/tree/xvattr
https://github.com/behlendorf/zfs/tree/xvattr

devsk commented 13 years ago

Brian, I am up against an office deadline tonight. The earliest I can get to it is tomorrow. I will be glad to give it a run! Thanks for providing the fix.

behlendorf commented 13 years ago

No problem, just whenever you get a chance. I just want to make sure it really does fix your issue.

devsk commented 13 years ago

While importing the old pool with latest from git:

[  162.865879] VERIFY(0 == sa_buf_hold(zsb->z_os, obj, NULL, &db)) failed
[  162.865892] SPLError: 7964:0:(zfs_znode.c:524:zfs_mknode()) SPL PANIC
[  162.865895] SPL: Showing stack for process 7964
[  162.865900] Pid: 7964, comm: mount Tainted: P            2.6.38-rc7 #2
[  162.865904] Call Trace:
[  162.865927]  [] ? spl_debug_dumpstack+0x26/0x2f [spl]
[  162.865935]  [] ? spl_debug_bug+0x7a/0xf0 [spl]
[  162.865957]  [] ? zfs_mknode+0x1e5/0xaf7 [zfs]
[  162.865969]  [] ? 0xffffffffa029eb03
[  162.865991]  [] ? dmu_tx_assign+0x36b/0x515 [zfs]
[  162.866014]  [] ? dsl_dataset_block_freeable+0x34/0x40 [zfs]
[  162.866033]  [] ? zfs_create+0x41b/0x688 [zfs]
[  162.866049]  [] ? dbuf_rele_and_unlock+0x131/0x491 [zfs]
[  162.866069]  [] ? zfs_onexit_cb_data+0xd2d/0x100a [zfs]
[  162.866081]  [] ? 0xffffffffa029e244
[  162.866093]  [] ? arc_getbuf_func+0x0/0x5a [zfs]
[  162.866104]  [] ? 0xffffffffa029eb03
[  162.866117]  [] ? arc_read_nolock+0x1e7/0x64a [zfs]
[  162.866124]  [] ? update_curr+0x5a/0xb0
[  162.866141]  [] ? zfs_obj_to_stats+0x3b6/0xb1f [zfs]
[  162.866158]  [] ? zil_parse+0x3b9/0x49e [zfs]
[  162.866174]  [] ? zfs_obj_to_stats+0x107/0xb1f [zfs]
[  162.866190]  [] ? zfs_obj_to_stats+0x2c8/0xb1f [zfs]
[  162.866208]  [] ? zil_replay+0xab/0xe3 [zfs]
[  162.866228]  [] ? zfs_register_callbacks+0x456/0x46e [zfs]
[  162.866246]  [] ? zfs_domount+0x188/0x1ef [zfs]
[  162.866259]  [] ? zpl_write_common+0x93c/0x10c3 [zfs]
[  162.866272]  [] ? zpl_write_common+0x945/0x10c3 [zfs]
[  162.866279]  [] ? mount_nodev+0x44/0x87
[  162.866285]  [] ? get_sb_nodev+0xe/0x2e
[  162.866299]  [] ? zpl_write_common+0x937/0x10c3 [zfs]
[  162.866305]  [] ? vfs_kern_mount+0x87/0x12e
[  162.866311]  [] ? do_kern_mount+0x49/0xd8
[  162.866317]  [] ? do_mount+0x6c3/0x731
[  162.866323]  [] ? sys_mount+0x88/0xca
[  162.866328]  [] ? system_call_fastpath+0x16/0x1b
[  162.866379] SPL: Dumping log to /tmp/spl-log.1299165554.7964

behlendorf commented 13 years ago

Interesting, you hit the same VERIFY(). That makes me very suspicious that the on-disk intent logs are in fact damaged/incomplete. Before I reverted to the original xvattr code that was possible. Can you run zdb to dump the intent log, and some additional pool info while you're at it:

zdb -ivv <pool>/<dataset>
zdb -vv <pool>

I have a hunch you might actually hit this same failure if you tried to import the pool using even OpenSolaris. Are you in any position to try that? It would be nice to try with zfs-fuse, but I don't think they support pool version 28 yet.

If the above is true, and you see the issue using a different implementation, I would suggest setting the 'zil_replay_disable' module option. Then you'll be able to import and mount the pool, and if this was a bug in how the log was written you shouldn't be able to cause it again after updating to the new code. Alternatively, if you want to keep the damaged pool around intact (it's a nice test case), you could just create a new pool with the new code and see how it goes.
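(A hedged sketch of those two options; the pool name tank and device /dev/vdb are assumptions:)

# option 1: skip zil replay, then import the damaged pool
modprobe zfs zil_replay_disable=1
zpool import -f tank

# option 2: set the damaged pool aside as a test case and start fresh
zpool create testpool /dev/vdb
zfs create testpool/fs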

devsk commented 13 years ago

Yeah, zfs-fuse is out; it can't import this pool. I can try to bring up my OpenSolaris install or give you the output of zdb to see if we can figure out something.

I will try disabling the zil replay last.

devsk commented 13 years ago

zil_replay_disable is still not an option.

devsk commented 13 years ago

it looks like I pulled the wrong changes: git clone https://github.com/behlendorf/zfs.git zfs

This doesn't seem to have the above commits. So, did I get confused by GitHub again? When I was browsing the commits, whatever path was shown is what I used for pulling.

But it looks like I am looking at a completely different code.

Is there a way to confirm the internal source code version?
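(For what it's worth, a minimal sketch of fetching the xvattr branch mentioned above and confirming which commits are actually checked out:)

git clone https://github.com/behlendorf/zfs.git zfs
cd zfs
git checkout xvattr                # the development branch referenced above
git log --oneline -5               # verify the expected commits are present
git describe --always --dirty      # short identifier for the checked-out tree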

devsk commented 13 years ago

ok, grabbed the xvattr branch now and will retest. Sorry for the confusion.

devsk commented 13 years ago

Ok, no SPL panic anymore. I could import the pool and access my FS. I am syncing it up and I will give the ZFS rootfs boot another go soon.

behlendorf commented 13 years ago

Yay! I'll finish my testing of these changes and merge them in. I've got a pile of small issues people have reported that I want to wrap up, then I'll make a new tag. I'll probably miss my self-imposed deadline of Friday for this though, so expect a new tag early next week.

devsk commented 13 years ago

I have started using it as my rootfs. Let's see how far we go!