openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

zfs: adding existent segment to range tree (offset=180fefe000 size=20000) // VERIFY3(rs_get_end(rs, rt) >= end) failed (103347642368 >= 103347757056) #13963

Open samueldr opened 2 years ago

samueldr commented 2 years ago

System information

Type Version/Name
Distribution Name NixOS
Distribution Version "unstable" (rolling)
Kernel Version 5.18.19
Architecture x86_64
OpenZFS Version 2.1.5-1

Describe the problem you're observing

(Excuse the lack of precision in the terminology used)

Corrupted single-disk pool cannot be imported anymore. Panics the kernel module, making the commands hang. Also crashes zdb under some invocations.

The disk was in a system that developed a fault; that system is no longer in use. The observations below were made on a healthy system. Assume the disk itself is still healthy: everything points to it being perfectly fine, while its previous host system obviously had a fault unrelated to the disk.

Importing read-only works.

Using zdb -bcsvL -e big-storage (a command suggested elsewhere), it spent time verifying everything, and everything checked out.
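For concreteness, a minimal sketch of what does still work here (a read-only import, e.g. via the readonly property, plus the zdb check mentioned above):

```
# Read-only import succeeds:
zpool import -o readonly=on -f big-storage

# Block traversal / checksum verification of the (exported) pool;
# -L disables leak detection and space-map loading, which is where the
# panic happens on a normal import:
zdb -bcsvL -e big-storage
```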

Feel free to re-title the issue!

Describe how to reproduce the problem

Be unlucky and have broken hardware (not the disk) :(.

Include any warning/errors/backtraces from the system logs

Same information in a gist:

zdb crash

```
# zdb -vvvvvv -b -e big-storage

Traversing all blocks to verify nothing leaked ...

loading concrete vdev 0, metaslab 6 of 232 ...error: zfs: removing nonexistent segment from range tree (offset=180fefe000 size=20000)
Aborted (core dumped)
```
zpool import (nothing special)

NOTE: command hangs, kernel panic shared after

```
~ # zpool import -f big-storage

PANIC: zfs: adding existent segment to range tree (offset=180fefe000 size=20000)
Showing stack for process 149045
CPU: 2 PID: 149045 Comm: z_wr_iss Tainted: P O 5.18.19 #1-NixOS
Hardware name: ZOTAC XXXXXX/XXXXXX, BIOS 4.6.5 09/15/2015
Call Trace:
 dump_stack_lvl+0x45/0x5e
 vcmn_err.cold+0x50/0x68 [spl]
 ? zfs_btree_insert_core_impl.isra.0+0x76/0x90 [zfs]
 ? zfs_btree_insert_into_leaf+0x232/0x2a0 [zfs]
 ? zfs_btree_insert_into_leaf+0x232/0x2a0 [zfs]
 ? pn_free+0x30/0x30 [zfs]
 ? zfs_btree_find_parent_idx+0x72/0xd0 [zfs]
 zfs_panic_recover+0x6d/0x90 [zfs]
 range_tree_add_impl+0x303/0xe40 [zfs]
 ? zio_wait+0x260/0x290 [zfs]
 space_map_load_callback+0x55/0x90 [zfs]
 space_map_iterate+0x193/0x3d0 [zfs]
 ? spa_stats_destroy+0x190/0x190 [zfs]
 space_map_load_length+0x5e/0xe0 [zfs]
 metaslab_load+0x14d/0x8a0 [zfs]
 ? range_tree_add_impl+0x759/0xe40 [zfs]
 metaslab_activate+0x50/0x2b0 [zfs]
 ? preempt_count_add+0x70/0xa0
 metaslab_alloc_dva+0x351/0x1490 [zfs]
 metaslab_alloc+0xd3/0x280 [zfs]
 zio_dva_allocate+0xd3/0x900 [zfs]
 ? __kmalloc_node+0x17b/0x370
 ? preempt_count_add+0x70/0xa0
 ? _raw_spin_lock+0x13/0x40
 zio_execute+0x83/0x120 [zfs]
 taskq_thread+0x2cf/0x500 [spl]
 ? wake_up_q+0x90/0x90
 ? zio_gang_tree_free+0x70/0x70 [zfs]
 ? taskq_thread_spawn+0x60/0x60 [spl]
 kthread+0xe8/0x110
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x22/0x30
```
zpool import with zfs_recover/zil_replay_disable

NOTE: command hangs here too

```
~ # echo 1 > /sys/module/zfs/parameters/zfs_recover
~ # echo 1 > /sys/module/zfs/parameters/zil_replay_disable
~ # zpool import -f big-storage

WARNING: zfs: adding existent segment to range tree (offset=180fefe000 size=20000)
VERIFY3(rs_get_end(rs, rt) >= end) failed (103347642368 >= 103347757056)
PANIC at range_tree.c:485:range_tree_remove_impl()
Showing stack for process 3145
CPU: 2 PID: 3145 Comm: z_wr_iss Tainted: P O 5.18.19 #1-NixOS
Hardware name: ZOTAC XXXXXX/XXXXXX, BIOS 4.6.5 09/15/2015
Call Trace:
 dump_stack_lvl+0x45/0x5e
 spl_panic+0xd1/0xe9 [spl]
 ? zfs_btree_insert_into_leaf+0x232/0x2a0 [zfs]
 ? __kmalloc_node+0x17b/0x370
 ? zfs_btree_insert_into_leaf+0x232/0x2a0 [zfs]
 ? zfs_btree_find_parent_idx+0x72/0xd0 [zfs]
 ? pn_free+0x30/0x30 [zfs]
 ? zfs_btree_find_parent_idx+0x72/0xd0 [zfs]
 ? zfs_btree_find+0x175/0x300 [zfs]
 range_tree_remove_impl+0xc6c/0xef0 [zfs]
 ? zio_wait+0x260/0x290 [zfs]
 space_map_load_callback+0x22/0x90 [zfs]
 space_map_iterate+0x193/0x3d0 [zfs]
 ? spa_stats_destroy+0x190/0x190 [zfs]
 space_map_load_length+0x5e/0xe0 [zfs]
 metaslab_load+0x14d/0x8a0 [zfs]
 ? range_tree_add_impl+0x759/0xe40 [zfs]
 metaslab_activate+0x50/0x2b0 [zfs]
 ? preempt_count_add+0x70/0xa0
 metaslab_alloc_dva+0x351/0x1490 [zfs]
 metaslab_alloc+0xd3/0x280 [zfs]
 zio_dva_allocate+0xd3/0x900 [zfs]
 ? __kmalloc_node+0x17b/0x370
 ? preempt_count_add+0x70/0xa0
 ? _raw_spin_lock+0x13/0x40
 zio_execute+0x83/0x120 [zfs]
 taskq_thread+0x2cf/0x500 [spl]
 ? wake_up_q+0x90/0x90
 ? zio_gang_tree_free+0x70/0x70 [zfs]
 ? taskq_thread_spawn+0x60/0x60 [spl]
 kthread+0xe8/0x110
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x22/0x30
```
ryao commented 2 years ago

There is a chance this issue is related to 13f2b8fb92c23090b9f6e701c8471aef6b8e917b. That patch will be in the upcoming 2.1.6 release, but it is too late for it to prevent this from happening. :/
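If you want to check whether a build you are running already contains that commit, something like this should work (a rough sketch, assuming a local clone of openzfs/zfs in ./zfs; the tag name is the upstream release tag):

```
# Print the running userland and kernel module versions.
zfs version

# Test whether the commit is an ancestor of a given release tag
# (zfs-2.1.6 shown here):
git -C zfs merge-base --is-ancestor \
    13f2b8fb92c23090b9f6e701c8471aef6b8e917b zfs-2.1.6 \
  && echo "commit is included" \
  || echo "commit is missing"
```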

CaCTuCaTu4ECKuu commented 2 years ago

I faced a similar problem after a power loss on TrueNAS SCALE 22.02.3. The system basically hangs on zpool import: the WebUI works, but tab contents don't load and zpool commands hang. Importing in read-only mode works.

```
# zdb -vvvvvv -b -e pool0
loading concrete vdev 0, metaslab 696 of 697 ...rs_get_end(rs, rt) >= end (0x9180903c000 >= 0x91809060000)
ASSERT at ../../module/zfs/range_tree.c:485:range_tree_remove_impl()
zsh: abort (core dumped)  zdb -vvvvvv -b -e pool0
```

I tried the tunables suggested by the author here, but it doesn't work, either on SCALE or at all. I'll try this with CORE as well soon, I hope. If I manage to import the pool in RW mode and leave it running, the metaslab will hopefully fix itself, as suggested.

CaCTuCaTu4ECKuu commented 2 years ago

@ryao is there any insight into whether there will be a tool to recover from metaslab problems? If I understand correctly, this stuff can be recreated as long as the data on the disks can be read, can't it? When I found the root of the problem, the first thing I looked for was how to reset the metaslab.

samueldr commented 2 years ago

In the IRC channel, the answer basically amounted to "copy the data to some other storage and re-do the pool".

CaCTuCaTu4ECKuu commented 2 years ago

copy the data to some other storage and re-do the pool

That's my plan A. I'm not in a rush, so I'll still try to recover, but I can imagine that for someone with many TBs of data it would be troublesome if their backup solution isn't very good.

ryao commented 2 years ago

@ryao is there any insight into whether there will be a tool to recover from metaslab problems? If I understand correctly, this stuff can be recreated as long as the data on the disks can be read, can't it? When I found the root of the problem, the first thing I looked for was how to reset the metaslab.

It does not appear to be on the horizon, but it is an interesting idea that should probably be given its own issue for a feature request.

Also, 2.1.6 was released today. If my conjecture about 13f2b8fb92c23090b9f6e701c8471aef6b8e917b fixing the cause is correct, then it would have prevented the problem from happening. Not that I expect that to be much consolation. The easiest solution available right now is to back up the data to another pool, recreate the pool, and restore from backup.
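A rough sketch of that workflow (the device path, mountpoints, and backup target below are placeholders; a file-level copy such as rsync is just one option, and zfs send of existing snapshots from the read-only import is another):

```
# 1. Import the damaged pool read-only (reported to work above).
zpool import -o readonly=on -f big-storage

# 2. Copy the data at the file level onto healthy storage.
rsync -aHAX /big-storage/ /mnt/backup/big-storage/

# 3. Export, recreate the pool from scratch, and copy the data back.
zpool export big-storage
zpool create big-storage /dev/disk/by-id/<disk>
rsync -aHAX /mnt/backup/big-storage/ /big-storage/
```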

stale[bot] commented 1 year ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

ProximaNova commented 6 months ago

You wrote "Corrupted single-disk pool cannot be imported anymore." Later you wrote "Importing read-only works." You also wrote "Using 'zdb -bcsvL -e big-storage' (command suggested elsewhere) it spent time verifying everything and everything checked out."

So did you mean: "Corrupted single-disk pool cannot be imported anymore. Trying to import read-only and read-write: both don't work"? Or did running 'zdb -bcsvL -e big-storage' make you able to import it read-only?

I have the same or a similar problem, and 'sudo zdb -bscvL -e name' says that it will take an estimated 2 months to complete: "1.46G completed ( 1MB/s) estimated time remaining: 1546hr 02min 38sec".

samueldr commented 6 months ago

@ProximaNova

So, to answer

Or did running 'zdb -bcsvL -e big-storage' make you able to import it read-only?

The command is unrelated to being able to import read-only.

ProximaNova commented 6 months ago

@samueldr Thanks for the reply. Seeing that you could access it at the file level (as opposed to only accessing the HDD as a block device), it makes sense that you wrote

In the IRC channel, the answer basically amounted to "copy the data to some other storage and re-do the pool".

One of the things I didn't get until recently is: how can you recreate it if the metaslab(s) are broken? In issue https://github.com/openzfs/zfs/issues/13995 "Metaslabs recovery tool", @GregorKopka wrote "Metaslabs do the 'free space' accounting for the pool.", which sort of answers that. I read that issue up to October 31, 2023 (the most recent comment): no mention of any working metaslab recovery tool.

@CaCTuCaTu4ECKuu wrote

copy the data to some other storage and re-do the pool

That's my plan A. I'm not in a rush, so I'll still try to recover, but I can imagine that for someone with many TBs of data it would be troublesome if their backup solution isn't very good.

Yes, this is a problem for me. I have only one 18-TB HDD. It was about 9 TB full when I ended up with basically OP's problem. I also don't have another HDD which can hold 9 TB. (I don't recommend using external HDDs which require outlet power rather than getting their power solely from USB; if you are writing to the disk from a battery-powered computer and have a power outage, the loss of outlet power can screw things up.)

CaCTuCaTu4ECKuu commented 6 months ago

I'm not sure if I mentioned it, but for others' awareness: my problem had to do with RAM; it was either the CPU's memory controller or the slot itself. After I checked the PC components, the system failed to run altogether with the RAM stick in one of the slots, so it was some weird stuff. If you start to get this problem, make sure the system itself is stable. It's good that you can at least access the data in RO mode with this kind of problem.

samueldr commented 6 months ago

I didn't state it in this issue. RAM, too, was the issue. It's what I implied with a “system that developed a fault”. It's also why I did the copy operation outside of that system.

ProximaNova commented 6 months ago

@ProximaNova

I have the same or a similar problem, and 'sudo zdb -bscvL -e name' says that it will take an estimated 2 months to complete: "1.46G completed ( 1MB/s) estimated time remaining: 1546hr 02min 38sec".

Misleading. I ran "sudo zdb -bscvL -e name" from about 2024-04-28 05:42:00 UTC to 2024-05-01 22:32:00 UTC (~4 days). It would take ~months if the speed was always 1 MB/s, but I saw that that process got up to a speed of 28 MB/s. Last I checked it said something like "3 hours remaining, checked 7.70 TB". I left for an hour or two then came back to my computer. I unfortunately didn't see the end result of that command because my computer crashed (but it never said anything like "bad checksum" when it was running).

Why it crashed is basically irrelevant, but I'll explain it anyway because it is annoying to run a command for days and not see the conclusion of it: (In QTerminal, I catted out a text file which contained an ipfs:// or ipns:// link, and then I accidentally right-clicked on it and clicked "Open Link". I wish there were a QTerminal setting to remove that option from the context menu. It then opened that link in Brave Browser, which turned on Brave's IPFS daemon. That was apparently too much for my computer to handle with everything else that was also running, so it froze. I then force-shutdown the computer by holding the power button. I also couldn't ssh in from another computer to try to kill the Brave and lynx processes that were running, because vlock or whatever was acting weird after running apt update and apt upgrade on that other computer.)

@CaCTuCaTu4ECKuu

That's good you at least can access data in RO mode with this kind of problem

Agreed. I could also import it RO (I used "sudo zpool import -o readonly=on -f -FX name"). After that, "zpool list" showed this: NAME=name SIZE=16.4T ALLOC=0 ... CAP=0%. That would mean it is empty, but it actually contains multiple terabytes (it seemingly needs RW to show that info). Another helpful thing is as follows. On the day when I first saw this issue, after the aforementioned power loss, I was able to re-mount it, which then showed lots of errors like "degraded" and "suspended" (it isn't a failing HDD, from everything I saw and from my experience). It also showed this error: "$ ls foldername" -> "ls: reading directory 'foldername': Input/output error". I then got errors like the title of this issue. However, when I imported it as RO just now, I was able to read that directory with that same command.
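As an aside, even when zpool list shows ALLOC=0 on a read-only import like that, the dataset-level accounting is usually still readable; a quick sketch (pool name as in the command above):

```
sudo zpool import -o readonly=on -f name

# Per-dataset usage should still be visible even though the pool-level
# ALLOC column reads 0:
sudo zfs list -r name
```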