openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Add 'spacemap rebuild' option to zhack #3210

Open snajpa opened 9 years ago

snajpa commented 9 years ago

The pool was running on 0.6.3-stable, where it deadlocked and has refused to import ever since; every attempt to import it (even booting with init=/bin/bash, without udev, and doing everything manually) has failed. RHEL6 kernel, OpenVZ 042stab104.1. Now I'm trying a build close to master (https://github.com/vpsfreecz/zfs/commits/master), with no luck either.

I guess this might be ARC-related, since arcstats shows some symptoms; I wouldn't think that importing a pool is supposed to eat up the whole arc_meta_limit. It might also be L2ARC-related, since l2arc_feed and zpool import both appear to be locked. Unfortunately, I have no time to debug this, as I need the machine to be running, so I will just remove the L2ARC partition and try to boot without it.
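
For reference, the metadata pressure can be compared against the limit straight from the arcstats kstat quoted below; just a sketch, assuming the field names used by this build:

# Compare arc_meta_used with arc_meta_limit (both in bytes) and print the ratio.
awk '$1 == "arc_meta_used"  { used  = $3 }
     $1 == "arc_meta_limit" { limit = $3 }
     END { printf "meta_used=%d meta_limit=%d (%.1f%% of limit)\n", used, limit, 100 * used / limit }' \
    /proc/spl/kstat/zfs/arcstats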

Here's some more telling output (from my linked build of ZFS); the same pair of hung-task traces repeats every 120 seconds:

[  360.794329] INFO: task l2arc_feed:574 blocked for more than 120 seconds.
[  360.794452]       Tainted: P           ---------------    2.6.32-042stab104.1 #1
[  360.794619] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  360.794787] l2arc_feed    D ffff88060f850b80     0   574      2    0 0x00000000
[  360.794791]  ffff88060f853cd0 0000000000000046 0000000000000000 ffff88060f853c40
[  360.794794]  ffff88060f853cd0 ffffffff8153446a 000000298028e1dd 0000000000000286
[  360.794797]  0000000000000000 0000000000000000 00000000fffe230a 0000000300000001
[  360.794799] Call Trace:
[  360.794805]  [<ffffffff8153446a>] ? schedule_timeout+0x19a/0x2e0
[  360.794808]  [<ffffffff81534f66>] __mutex_lock_slowpath+0x96/0x210
[  360.794833]  [<ffffffffa02576e0>] ? l2arc_feed_thread+0x0/0x1090 [zfs]
[  360.794836]  [<ffffffff81534a8b>] mutex_lock+0x2b/0x50
[  360.794851]  [<ffffffffa02576e0>] ? l2arc_feed_thread+0x0/0x1090 [zfs]
[  360.794866]  [<ffffffffa0257853>] l2arc_feed_thread+0x173/0x1090 [zfs]
[  360.794870]  [<ffffffff81072e82>] ? enqueue_entity+0x52/0x270
[  360.794873]  [<ffffffff810607c6>] ? enqueue_task+0x66/0x80
[  360.794888]  [<ffffffffa02576e0>] ? l2arc_feed_thread+0x0/0x1090 [zfs]
[  360.794903]  [<ffffffffa02576e0>] ? l2arc_feed_thread+0x0/0x1090 [zfs]
[  360.794909]  [<ffffffffa01fc798>] thread_generic_wrapper+0x68/0x80 [spl]
[  360.794914]  [<ffffffffa01fc730>] ? thread_generic_wrapper+0x0/0x80 [spl]
[  360.794917]  [<ffffffff810a7cce>] kthread+0x9e/0xc0
[  360.794920]  [<ffffffff8100c34a>] child_rip+0xa/0x20
[  360.794923]  [<ffffffff810a7c30>] ? kthread+0x0/0xc0
[  360.794925]  [<ffffffff8100c340>] ? child_rip+0x0/0x20
[  360.794936] INFO: task zpool:1006 blocked for more than 120 seconds.
[  360.795038]       Tainted: P           ---------------    2.6.32-042stab104.1 #1
[  360.795193] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  360.795353] zpool         D ffff88060f1fe180     0  1006    991    0 0x00000004
[  360.795356]  ffff8805e4701cd8 0000000000000086 ffff88000003a600 ffff88061fc035c0
[  360.795359]  ffff8805e4701c88 0000000000000246 0000000000000286 ffff88061fc035c0
[  360.795362]  ffff8805e4701ca8 0000000000000018 0000000000000000 00000000000042d0
[  360.795364] Call Trace:
[  360.795369]  [<ffffffffa01f9cfc>] ? spl_kmem_alloc_debug+0x9c/0x1e0 [spl]
[  360.795371]  [<ffffffff81534f66>] __mutex_lock_slowpath+0x96/0x210
[  360.795378]  [<ffffffffa021d6d8>] ? nv_mem_zalloc+0x38/0x50 [znvpair]
[  360.795380]  [<ffffffff81534a8b>] mutex_lock+0x2b/0x50
[  360.795405]  [<ffffffffa02b86df>] spa_all_configs+0x4f/0x230 [zfs]
[  360.795430]  [<ffffffffa02ec6be>] zfs_ioc_pool_configs+0x2e/0x60 [zfs]
[  360.795454]  [<ffffffffa02ee83e>] zfsdev_ioctl+0x44e/0x480 [zfs]
[  360.795458]  [<ffffffff811ca6a2>] vfs_ioctl+0x22/0xa0
[  360.795460]  [<ffffffff812a6f4a>] ? strncpy_from_user+0x4a/0x90
[  360.795463]  [<ffffffff811ca844>] do_vfs_ioctl+0x84/0x5b0
[  360.795465]  [<ffffffff811c2dd5>] ? getname_flags+0x155/0x260
[  360.795468]  [<ffffffff811cadbf>] sys_ioctl+0x4f/0x80
[  360.795470]  [<ffffffff8100b102>] system_call_fastpath+0x16/0x1b
 ~ # cat /proc/spl/kstat/zfs/arcstats 
5 1 0x01 86 4128 150895819847 838491146303
name                            type data
hits                            4    4227
misses                          4    207320
demand_data_hits                4    0
demand_data_misses              4    0
demand_metadata_hits            4    4218
demand_metadata_misses          4    206761
prefetch_data_hits              4    0
prefetch_data_misses            4    0
prefetch_metadata_hits          4    9
prefetch_metadata_misses        4    559
mru_hits                        4    2280
mru_ghost_hits                  4    0
mfu_hits                        4    1938
mfu_ghost_hits                  4    0
deleted                         4    58585
recycle_miss                    4    6742
mutex_miss                      4    0
evict_skip                      4    29
evict_l2_cached                 4    0
evict_l2_eligible               4    508416
evict_l2_ineligible             4    8981895168
hash_elements                   4    148719
hash_elements_max               4    148890
hash_collisions                 4    3380
hash_chains                     4    2206
hash_chain_max                  4    2
p                               4    6283282432
c                               4    12566564864
c_min                           4    4194304
c_max                           4    12566564864
size                            4    9425016440
hdr_size                        4    57568776
data_size                       4    0
meta_size                       4    9366901248
other_size                      4    546416
anon_size                       4    131072
anon_evict_data                 4    0
anon_evict_metadata             4    0
mru_size                        4    9365983744
mru_evict_data                  4    0
mru_evict_metadata              4    9365934592
mru_ghost_size                  4    3203399680
mru_ghost_evict_data            4    0
mru_ghost_evict_metadata        4    3203399680
mfu_size                        4    786432
mfu_evict_data                  4    0
mfu_evict_metadata              4    592384
mfu_ghost_size                  4    0
mfu_ghost_evict_data            4    0
mfu_ghost_evict_metadata        4    0
l2_hits                         4    0
l2_misses                       4    207290
l2_feeds                        4    0
l2_rw_clash                     4    0
l2_read_bytes                   4    0
l2_write_bytes                  4    0
l2_writes_sent                  4    0
l2_writes_done                  4    0
l2_writes_error                 4    0
l2_writes_hdr_miss              4    0
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_free_on_write                4    0
l2_cdata_free_on_write          4    0
l2_abort_lowmem                 4    0
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    0
l2_asize                        4    0
l2_hdr_size                     4    0
l2_compress_successes           4    0
l2_compress_zeros               4    0
l2_compress_failures            4    0
memory_throttle_count           4    0
duplicate_buffers               4    0
duplicate_buffers_size          4    0
duplicate_reads                 4    0
memory_direct_count             4    0
memory_indirect_count           4    0
arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_prune                       4    0
arc_meta_used                   4    9425016440
arc_meta_limit                  4    9424923648
arc_meta_max                    4    9425070072
snajpa commented 9 years ago

Actually, judging by iostat and zpool import's stack, the machine seems to be reading through the SLOG. So I'll forcefully remove the SLOG as well. I'm starting to dislike this.
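
For anyone following along, the usual route for discarding a pool's log device is an import that tolerates the missing log, followed by removing the log vdev; a sketch only, with the pool name and device as placeholders:

# -m lets the import proceed even though the log device is missing or being discarded.
zpool import -m tank
# Afterwards, drop the (now unavailable) log vdev from the configuration;
# replace sdX1 with the device shown by 'zpool status'.
zpool remove tank sdX1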

[root@vz2.prg.relbit.com]
 ~ # ps aux | grep zpool
root        1593  1.6  0.0 133288  1776 pts/0    D+   05:55   0:02 /sbin/zpool import -c /etc/zfs/zpool.cache -aN
root        4293  0.0  0.0 103252   892 pts/1    S+   05:58   0:00 grep zpool
[root@vz2.prg.relbit.com]
 ~ # cat /proc/1593/stack 
[<ffffffffa01e926b>] cv_wait_common+0xbb/0x190 [spl]
[<ffffffffa01e9358>] __cv_wait_io+0x18/0x20 [spl]
[<ffffffffa02f974b>] zio_wait+0x11b/0x220 [zfs]
[<ffffffffa02461d9>] arc_read+0x909/0xb00 [zfs]
[<ffffffffa02f3fa3>] zil_parse+0x2d3/0x850 [zfs]
[<ffffffffa02f45fd>] zil_check_log_chain+0xdd/0x1c0 [zfs]
[<ffffffffa025846b>] dmu_objset_find_impl+0xfb/0x400 [zfs]
[<ffffffffa025852a>] dmu_objset_find_impl+0x1ba/0x400 [zfs]
[<ffffffffa025852a>] dmu_objset_find_impl+0x1ba/0x400 [zfs]
[<ffffffffa02587c2>] dmu_objset_find+0x52/0x80 [zfs]
[<ffffffffa029cec5>] spa_load+0x1295/0x1a00 [zfs]
[<ffffffffa029d68e>] spa_load_best+0x5e/0x270 [zfs]
[<ffffffffa029e4cb>] spa_import+0x25b/0x800 [zfs]
[<ffffffffa02d3fc4>] zfs_ioc_pool_import+0xe4/0x120 [zfs]
[<ffffffffa02d683e>] zfsdev_ioctl+0x44e/0x480 [zfs]
[<ffffffff811ca6a2>] vfs_ioctl+0x22/0xa0
[<ffffffff811ca844>] do_vfs_ioctl+0x84/0x5b0
[<ffffffff811cadbf>] sys_ioctl+0x4f/0x80
[<ffffffff8100b102>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
snajpa commented 9 years ago

Woohoo /o/ This is getting better and better \o\ /o/ During zpool import -m after removing SLOG:

[  137.080345] VERIFY(rs == NULL) failed
[  137.080454] PANIC at range_tree.c:186:range_tree_add()
[  137.080559] Showing stack for process 8696
[  137.080561] Pid: 8696, comm: z_wr_iss/0 veid: 0 Tainted: P           ---------------    2.6.32-042stab104.1 #1
[  137.080563] Call Trace:
[  137.080579]  [<ffffffffa01e706d>] ? spl_dumpstack+0x3d/0x40 [spl]
[  137.080585]  [<ffffffffa01e7262>] ? spl_panic+0xc2/0xe0 [spl]
[  137.080590]  [<ffffffffa01e255f>] ? spl_kmem_cache_alloc+0x7f/0x9c0 [spl]
[  137.080613]  [<ffffffffa024a4f6>] ? dbuf_rele_and_unlock+0x286/0x4c0 [zfs]
[  137.080639]  [<ffffffffa02f619b>] ? zio_destroy+0x7b/0x90 [zfs]
[  137.080643]  [<ffffffffa00e7190>] ? avl_find+0x60/0xb0 [zavl]
[  137.080646]  [<ffffffffa00e7190>] ? avl_find+0x60/0xb0 [zavl]
[  137.080669]  [<ffffffffa028d07f>] ? range_tree_add+0x8f/0x340 [zfs]
[  137.080687]  [<ffffffffa025735b>] ? dmu_read+0x12b/0x180 [zfs]
[  137.080712]  [<ffffffffa02a945c>] ? space_map_load+0x3fc/0x620 [zfs]
[  137.080734]  [<ffffffffa028a9a6>] ? metaslab_load+0x36/0xe0 [zfs]
[  137.080757]  [<ffffffffa028aae7>] ? metaslab_activate+0x57/0xa0 [zfs]
[  137.080760]  [<ffffffff81534a7e>] ? mutex_lock+0x1e/0x50
[  137.080783]  [<ffffffffa028b634>] ? metaslab_alloc+0x664/0xde0 [zfs]
[  137.080788]  [<ffffffffa01e255f>] ? spl_kmem_cache_alloc+0x7f/0x9c0 [spl]
[  137.080813]  [<ffffffffa02f9e6a>] ? zio_dva_allocate+0xaa/0x390 [zfs]
[  137.080837]  [<ffffffffa02f6668>] ? zio_push_transform+0x48/0xa0 [zfs]
[  137.080861]  [<ffffffffa02f7a41>] ? zio_write_bp_init+0x301/0x790 [zfs]
[  137.080886]  [<ffffffffa02f6f5c>] ? zio_taskq_member+0x7c/0xc0 [zfs]
[  137.080910]  [<ffffffffa02fc388>] ? zio_execute+0xd8/0x180 [zfs]
[  137.080916]  [<ffffffffa01e4f07>] ? taskq_thread+0x1e7/0x3f0 [spl]
[  137.080919]  [<ffffffff81065ba0>] ? default_wake_function+0x0/0x20
[  137.080925]  [<ffffffffa01e4d20>] ? taskq_thread+0x0/0x3f0 [spl]
[  137.080928]  [<ffffffff810a7cce>] ? kthread+0x9e/0xc0
[  137.080931]  [<ffffffff8100c34a>] ? child_rip+0xa/0x20
[  137.080934]  [<ffffffff810a7c30>] ? kthread+0x0/0xc0
[  137.080936]  [<ffffffff8100c340>] ? child_rip+0x0/0x20
snajpa commented 9 years ago

Btw, have you ever SSH'd into your onboard IPMI just to reboot it, because the web interface got stuck after you entered a wrong password? Yay!

snajpa commented 9 years ago

The machine is now powered off and the backups were restored on another server. If anyone's interested, I'll gather more info or try some things, no problem.

myrond commented 9 years ago

I also had exactly the same problem; it appears to have started with a bad stick of RAM. I replaced the RAM, but the pool is still stuck importing/mounting with the same error as above. zdb, run with the following command, reports this additional information:

zdb -c pool

Traversing all blocks to verify metadata checksums and verify nothing leaked ...

loading space map for vdev 0 of 4, metaslab 129 of 174 ...zdb: ../../module/zfs/range_tree.c:261: Assertion `rs->rs_start <= start (0x102338427000 <= 0x102338422000)' failed.
Aborted
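
Since the abort happens while zdb loads the space maps, a read-only survey of the rest of the pool can sometimes still be made by disabling leak detection, which skips loading them; a sketch, assuming the pool is exported and this zdb build supports the flags:

# -e: operate on an exported pool; -L: disable leak detection so the space maps are not loaded.
zdb -e -L pool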
behlendorf commented 9 years ago

@myrond It definitely sounds like a damaged space map. Unfortunately, it's currently not possible to repair the pool, but you should be able to safely import it read-only.
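
As a hedged example, that read-only import might look like the following (the pool name is a placeholder):

# Import read-only: nothing is written, so no allocations are attempted against the
# damaged space map. -N skips mounting, -f forces the import of the unclean pool.
zpool import -o readonly=on -f -N tank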

myrond commented 9 years ago

I solved the problem by importing read-only and using zfs send across the network to a remote pool.

Not the preferred solution, as I wish there were an option to just take the pool and zero the free space map, instead of me effectively doing this the long way.

behlendorf commented 9 years ago

@myrond I completely agree. I think we should provide a tool to reconstruct the space maps to recover from this kind of scenario. However, this kind of damage hasn't occurred frequently enough for us to make it a priority.

kleini commented 6 years ago

I have a broken space map, too. Is there some way to get the data off the filesystem?

root@ubuntu:~# zdb -b -e rpool

Traversing all blocks to verify nothing leaked ...

loading space map for vdev 0 of 1, metaslab 0 of 159 ...zdb: ../../module/zfs/range_tree.c:261: Assertion `rs->rs_start <= start (0xda000 <= 0x0)' failed.
Aborted
root@ubuntu:~#

Importing the pool always fails:

root@ubuntu:~# zpool import -N -f -R /mnt rpool
[ 1277.024109] VERIFY(rs == NULL) failed
[ 1277.024273] PANIC at range_tree.c:186:range_tree_add()

Importing it read-only works. I am glad I found this issue.
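
In case it helps anyone, once the pool is imported read-only and its datasets are mounted under an altroot such as /mnt (as above), the data can be copied off at the file level; a sketch with placeholder paths:

# Copy everything from the read-only pool mounted under /mnt to a rescue location,
# preserving permissions, hard links, ACLs and xattrs.
rsync -aHAX --progress /mnt/ /backup/rpool-rescue/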

myrond commented 6 years ago

In the end I found zfs send unreliable.

I brought up a BSD box and mounted the filesystem read-only.

I then remote-mounted a filesystem.

I proceeded to copy all files off of the box and found I/O errors in some files; the errors showed up as nulls. I copied all of the errored files across, inserted padding where the errors actually were, and marked them.
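
The padding approach described above can be approximated with dd, which zero-fills the blocks it fails to read; this is just a sketch with placeholder paths, assuming GNU dd:

# conv=noerror keeps reading past errors; conv=sync pads each short or failed block with
# zeros, so at this block size a single read error blanks the whole 1M block.
dd if=/mnt/data/damaged.file of=/backup/damaged.file bs=1M conv=noerror,sync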

kleini commented 6 years ago

For me, zfs send worked reliably. It is just a lot of work to do for every filesystem and volume individually, and they lose all their configuration that way. It works much more nicely with a recursive snapshot of everything and a recursive send, which keeps the configuration.
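
A sketch of that recursive variant, with placeholder pool, snapshot, and host names; note that a pool imported read-only cannot take new snapshots, so for the broken-space-map case this relies on snapshots that already exist:

# Take one recursive snapshot of everything (only possible while the pool is writable).
zfs snapshot -r rpool@rescue
# -R replicates the whole dataset tree together with its properties, so the
# configuration survives the move to the destination pool.
zfs send -R rpool@rescue | ssh backuphost zfs receive -duF tank/rescue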