snajpa opened this issue 9 years ago
Actually the machine seems to be reading the SLOG, judging by iostat and zpool import's stack. So I'll forcefully remove even the SLOG. I'm starting to dislike this.
[root@vz2.prg.relbit.com]
~ # ps aux | grep zpool
root 1593 1.6 0.0 133288 1776 pts/0 D+ 05:55 0:02 /sbin/zpool import -c /etc/zfs/zpool.cache -aN
root 4293 0.0 0.0 103252 892 pts/1 S+ 05:58 0:00 grep zpool
[root@vz2.prg.relbit.com]
~ # cat /proc/1593/stack
[<ffffffffa01e926b>] cv_wait_common+0xbb/0x190 [spl]
[<ffffffffa01e9358>] __cv_wait_io+0x18/0x20 [spl]
[<ffffffffa02f974b>] zio_wait+0x11b/0x220 [zfs]
[<ffffffffa02461d9>] arc_read+0x909/0xb00 [zfs]
[<ffffffffa02f3fa3>] zil_parse+0x2d3/0x850 [zfs]
[<ffffffffa02f45fd>] zil_check_log_chain+0xdd/0x1c0 [zfs]
[<ffffffffa025846b>] dmu_objset_find_impl+0xfb/0x400 [zfs]
[<ffffffffa025852a>] dmu_objset_find_impl+0x1ba/0x400 [zfs]
[<ffffffffa025852a>] dmu_objset_find_impl+0x1ba/0x400 [zfs]
[<ffffffffa02587c2>] dmu_objset_find+0x52/0x80 [zfs]
[<ffffffffa029cec5>] spa_load+0x1295/0x1a00 [zfs]
[<ffffffffa029d68e>] spa_load_best+0x5e/0x270 [zfs]
[<ffffffffa029e4cb>] spa_import+0x25b/0x800 [zfs]
[<ffffffffa02d3fc4>] zfs_ioc_pool_import+0xe4/0x120 [zfs]
[<ffffffffa02d683e>] zfsdev_ioctl+0x44e/0x480 [zfs]
[<ffffffff811ca6a2>] vfs_ioctl+0x22/0xa0
[<ffffffff811ca844>] do_vfs_ioctl+0x84/0x5b0
[<ffffffff811cadbf>] sys_ioctl+0x4f/0x80
[<ffffffff8100b102>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
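For context on the next step: zpool import -m allows a pool to import when its separate log device is missing or unreadable; any recent transactions still sitting on the lost log are discarded. A minimal sketch, with the pool and device names assumed for illustration:

# Import despite a missing/failed separate log (SLOG) device;
# uncommitted ZIL records on the lost device are thrown away.
zpool import -m tank
# Once imported, the stale log vdev can be dropped from the pool
# (device name is a placeholder).
zpool remove tank sdX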
Woohoo /o/ This is getting better and better \o\ /o/ During zpool import -m after removing the SLOG:
[ 137.080345] VERIFY(rs == NULL) failed
[ 137.080454] PANIC at range_tree.c:186:range_tree_add()
[ 137.080559] Showing stack for process 8696
[ 137.080561] Pid: 8696, comm: z_wr_iss/0 veid: 0 Tainted: P --------------- 2.6.32-042stab104.1 #1
[ 137.080563] Call Trace:
[ 137.080579] [<ffffffffa01e706d>] ? spl_dumpstack+0x3d/0x40 [spl]
[ 137.080585] [<ffffffffa01e7262>] ? spl_panic+0xc2/0xe0 [spl]
[ 137.080590] [<ffffffffa01e255f>] ? spl_kmem_cache_alloc+0x7f/0x9c0 [spl]
[ 137.080613] [<ffffffffa024a4f6>] ? dbuf_rele_and_unlock+0x286/0x4c0 [zfs]
[ 137.080639] [<ffffffffa02f619b>] ? zio_destroy+0x7b/0x90 [zfs]
[ 137.080643] [<ffffffffa00e7190>] ? avl_find+0x60/0xb0 [zavl]
[ 137.080646] [<ffffffffa00e7190>] ? avl_find+0x60/0xb0 [zavl]
[ 137.080669] [<ffffffffa028d07f>] ? range_tree_add+0x8f/0x340 [zfs]
[ 137.080687] [<ffffffffa025735b>] ? dmu_read+0x12b/0x180 [zfs]
[ 137.080712] [<ffffffffa02a945c>] ? space_map_load+0x3fc/0x620 [zfs]
[ 137.080734] [<ffffffffa028a9a6>] ? metaslab_load+0x36/0xe0 [zfs]
[ 137.080757] [<ffffffffa028aae7>] ? metaslab_activate+0x57/0xa0 [zfs]
[ 137.080760] [<ffffffff81534a7e>] ? mutex_lock+0x1e/0x50
[ 137.080783] [<ffffffffa028b634>] ? metaslab_alloc+0x664/0xde0 [zfs]
[ 137.080788] [<ffffffffa01e255f>] ? spl_kmem_cache_alloc+0x7f/0x9c0 [spl]
[ 137.080813] [<ffffffffa02f9e6a>] ? zio_dva_allocate+0xaa/0x390 [zfs]
[ 137.080837] [<ffffffffa02f6668>] ? zio_push_transform+0x48/0xa0 [zfs]
[ 137.080861] [<ffffffffa02f7a41>] ? zio_write_bp_init+0x301/0x790 [zfs]
[ 137.080886] [<ffffffffa02f6f5c>] ? zio_taskq_member+0x7c/0xc0 [zfs]
[ 137.080910] [<ffffffffa02fc388>] ? zio_execute+0xd8/0x180 [zfs]
[ 137.080916] [<ffffffffa01e4f07>] ? taskq_thread+0x1e7/0x3f0 [spl]
[ 137.080919] [<ffffffff81065ba0>] ? default_wake_function+0x0/0x20
[ 137.080925] [<ffffffffa01e4d20>] ? taskq_thread+0x0/0x3f0 [spl]
[ 137.080928] [<ffffffff810a7cce>] ? kthread+0x9e/0xc0
[ 137.080931] [<ffffffff8100c34a>] ? child_rip+0xa/0x20
[ 137.080934] [<ffffffff810a7c30>] ? kthread+0x0/0xc0
[ 137.080936] [<ffffffff8100c340>] ? child_rip+0x0/0x20
Btw, have you ever SSH'd into your onboard IPMI so you could reboot it, because the web interface got stuck after you entered a wrong password? Yay!
The machine is now powered off, backups were restored on another server. If anyone's interested, I'll gather more info/try some things, no problem.
I also had exactly the same problem... it appears to have started with a bad stick of RAM. I replaced the RAM, but the import/mount of the pool stays stuck with the same error as above. zdb, run with the following command, reports this additional information:
zdb -c pool
Traversing all blocks to verify metadata checksums and verify nothing leaked ...
loading space map for vdev 0 of 4, metaslab 129 of 174 ...zdb: ../../module/zfs/range_tree.c:261: Assertion `rs->rs_start <= start (0x102338427000 <= 0x102338422000)' failed.
Aborted
@myrond it definitely sounds like a damaged space map. Unfortunately, it's currently not possible to repair the pool, but you should be able to safely import the pool read-only.
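A minimal sketch of such a read-only import (pool name assumed); since nothing is ever written, the damaged space maps are never consulted for allocation:

# Import read-only; -N skips mounting the filesystems, -f overrides
# a stale hostid claim left behind by the crashed machine.
zpool import -o readonly=on -N -f tank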
I solved the problem by importing read-only and using zfs send across the network to a remote pool.
Not the preferred solution, as I wish there were an option that said "take the pool and zero the free-space map", instead of me effectively doing this the long way.
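A sketch of that long way, with hostnames, pool names, and snapshot names assumed. Note that a read-only pool cannot take new snapshots, so an existing snapshot has to serve as the send source:

# On the damaged host: import read-only, then stream each dataset
# from its latest existing snapshot to a healthy remote pool.
zpool import -o readonly=on -N tank
zfs send tank/data@lastsnap | ssh backuphost zfs receive backup/data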
@myrond I completely agree. I think we should provide a tool to reconstruct the space maps to recover from this kind of scenario. However, this kind of damage hasn't occurred frequently enough for us to make it a priority.
I have a broken space map, too. Is there some way to get the data off the filesystem?
root@ubuntu:~# zdb -b -e rpool
Traversing all blocks to verify nothing leaked ...
loading space map for vdev 0 of 1, metaslab 0 of 159 ...zdb: ../../module/zfs/range_tree.c:261: Assertion `rs->rs_start <= start (0xda000 <= 0x0)' failed.
Aborted
root@ubuntu:~#
Importing the pool always fails:
root@ubuntu:~# zpool import -N -f -R /mnt rpool
[ 1277.024109] VERIFY(rs == NULL) failed
[ 1277.024273] PANIC at range_tree.c:186:range_tree_add()
Importing it read-only works. I am glad I found this issue.
In the end I found zfs send unreliable.
I brought up a BSD box and mounted the filesystem read-only.
I then remote-mounted a filesystem.
I proceeded to copy all files off of the box... I found I/O errors in some files; the errors showed up as nulls. I copied all of the errored files across, inserted padding where the errors actually were, and marked them.
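For reference, dd can do that copy-with-padding step in one pass (paths are assumed here): conv=noerror keeps reading past I/O errors, and conv=sync pads each failed read out to the block size with zeros, so offsets stay aligned and damaged regions show up as runs of NULs.

# Copy a damaged file, zero-padding unreadable blocks.
dd if=/mnt/broken/file of=/backup/file bs=128k conv=noerror,sync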
For me zfs send worked reliably. It is just a lot of work to do for every filesystem and volume, and they lose all their configuration. It works much more nicely with a recursive snapshot of everything and a recursive send, which keeps the configuration.
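A sketch of that nicer recursive variant (pool, host, and snapshot names assumed): zfs send -R builds a replication stream that carries snapshots and dataset properties along, so the configuration survives the move. The recursive snapshot has to exist before the pool is imported read-only, since a read-only import cannot create new ones.

# One recursive snapshot, one replication stream with properties.
zfs snapshot -r rpool@rescue
zfs send -R rpool@rescue | ssh backuphost zfs receive -dF backup/rpool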
The pool was running on 0.6.3-stable, where it deadlocked and refused to import afterwards; every attempt to import it (even booting with init=/bin/bash, without udev, doing everything manually) has failed since. RHEL6 kernel, OpenVZ 042stab104.1. Now I'm trying something close to master (https://github.com/vpsfreecz/zfs/commits/master) with no luck either.
I guess this might be something ARC-related, since it shows some symptoms in arcstats; I wouldn't think that importing a pool is supposed to eat up the whole arc_meta_limit. It might be L2ARC-related, since l2arc_feed and zpool import appear to both be locked. Unfortunately, I have no time to debug this as I need the machine to be running, so I will just remove the L2ARC partition and try to boot it without it.
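A minimal sketch of dropping the L2ARC device (pool and partition names assumed). Cache vdevs hold no pool data, only evicted ARC buffers, so removing one is always safe; and if the pool won't import at all, detaching the partition out from under it achieves the same, since a missing cache device does not block import.

# Remove the cache (L2ARC) partition from the pool.
zpool remove tank sdb3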
Here's some more telling output (from my linked version of ZFS):