Open antst opened 8 months ago
If I set
(initramfs) echo 1 > /sys/module/zfs/parameters/zil_replay_disable
(initramfs) echo 1 > /sys/module/zfs/parameters/zfs_recover
(initramfs) echo 0 > /sys/module/zfs/parameters/spa_load_verify_data
(initramfs) echo 0 > /sys/module/zfs/parameters/spa_load_verify_metadata
then the log is:
(initramfs) zpool import -N -o failmode=continue rpool
(initramfs) [ 538.725503] WARNING: zfs: removing nonexistent segment from range tree (offset=f001540e812000 size=2000)
[ 538.739337] WARNING: zfs: removing nonexistent segment from range tree (offset=1083812200000 size=8f3000)
[ 538.753221] WARNING: zfs: removing nonexistent segment from range tree (offset=4d0015800008000 size=7014000)
[ 538.767354] WARNING: zfs: adding existent segment to range tree (offset=85f15513100000 size=284c000)
[ 538.780809] WARNING: zfs: removing nonexistent segment from range tree (offset=31001de2000a000 size=3101000)
[ 538.794986] WARNING: zfs: removing nonexistent segment from range tree (offset=1464008d0000 size=1000)
[ 538.808593] VERIFY3(rs_get_end(rs, rt) >= end) failed (1460340334592 >= 40146828902903808)
[ 538.821181] PANIC at range_tree.c:499:range_tree_remove_impl()
[ 538.831301] Showing stack for process 2023
[ 538.839641] CPU: 51 PID: 2023 Comm: z_wr_iss Tainted: P OE 6.5.0-18-generic #18~22.04.1-Ubuntu
[ 538.853912] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./TRX40D8-2N2T, BIOS P1.10 08/13/2020
[ 538.868141] Call Trace:
[ 538.874953] <TASK>
[ 538.881361] dump_stack_lvl+0x48/0x70
[ 538.889253] dump_stack+0x10/0x20
[ 538.896666] spl_dumpstack+0x29/0x40 [spl]
[ 538.904909] spl_panic+0xfc/0x120 [spl]
[ 538.912855] ? zfs_btree_find+0x17b/0x270 [zfs]
[ 538.921798] range_tree_remove_impl+0x4b0/0x4f0 [zfs]
[ 538.931187] range_tree_remove+0x10/0x20 [zfs]
[ 538.939640] space_map_load_callback+0x27/0xb0 [zfs]
[ 538.948506] space_map_iterate+0x1bc/0x480 [zfs]
[ 538.956907] ? __pfx_space_map_load_callback+0x10/0x10 [zfs]
[ 538.966267] space_map_load_length+0x7c/0x100 [zfs]
[ 538.974726] metaslab_load_impl+0xcd/0x510 [zfs]
[ 538.982822] ? srso_return_thunk+0x5/0x10
[ 538.990019] ? ktime_get_raw_ts64+0x41/0xd0
[ 538.997367] ? srso_return_thunk+0x5/0x10
[ 539.004556] ? srso_return_thunk+0x5/0x10
[ 539.011709] ? gethrtime+0x30/0x60 [zfs]
[ 539.018962] ? srso_return_thunk+0x5/0x10
[ 539.026130] ? arc_all_memory+0xe/0x20 [zfs]
[ 539.033720] ? srso_return_thunk+0x5/0x10
[ 539.040864] metaslab_load+0x72/0xe0 [zfs]
[ 539.048249] metaslab_activate+0x50/0x110 [zfs]
[ 539.056051] ? srso_return_thunk+0x5/0x10
[ 539.063107] metaslab_group_alloc_normal+0x318/0x4f0 [zfs]
[ 539.071888] metaslab_group_alloc+0x25/0xb0 [zfs]
[ 539.079823] metaslab_alloc_dva+0x28f/0x590 [zfs]
[ 539.087768] metaslab_alloc+0xc8/0x200 [zfs]
[ 539.095298] zio_dva_allocate+0xb2/0x390 [zfs]
[ 539.102985] ? tsd_get+0x30/0x60 [spl]
[ 539.109802] ? srso_return_thunk+0x5/0x10
[ 539.116872] zio_execute+0x92/0xf0 [zfs]
[ 539.123972] taskq_thread+0x1f6/0x3c0 [spl]
[ 539.131154] ? __pfx_default_wake_function+0x10/0x10
[ 539.139120] ? __pfx_zio_execute+0x10/0x10 [zfs]
[ 539.146815] ? __pfx_taskq_thread+0x10/0x10 [spl]
[ 539.154366] kthread+0xf2/0x120
[ 539.160332] ? __pfx_kthread+0x10/0x10
[ 539.166911] ret_from_fork+0x47/0x70
[ 539.173249] ? __pfx_kthread+0x10/0x10
[ 539.179691] ret_from_fork_asm+0x1b/0x30
[ 539.186547] </TASK>
I also tried
zdb -AAA -b rpool
it results in an endless stream of log output :)
But
(initramfs) zdb -A -b rpool
Traversing all blocks to verify nothing leaked ...
loading concrete vdev 0, metaslab 85 of 116 ...entry_offset < sm->sm_start + sm->sm_size (0xf001540e812000 < 0x15800000000)
ASSERT at ../../module/zfs/space_map.c:173:space_map_iterate()entry_offset + entry_run <= sm->sm_start + sm->sm_size (0xf001540e814000 <= 0x15800000000)
ASSERT at ../../module/zfs/space_map.c:175:space_map_iterate()start <= UINT32_MAX (0xf000000e812 <= 0xffffffff)
ASSERT at ../../include/sys/range_tree.h:199:rs_set_start_raw()end <= UINT32_MAX (0xf000000e814 <= 0xffffffff)
ASSERT at ../../include/sys/range_tree.h:219:rs_set_end_raw()error: zfs: adding existent segment to range tree (offset=f001540e812000 size=2000)
Aborted
In general, I would suggest upgrading to 2.2.2 - not for this specifically, since I don't recall anything fixed since 2.2.0 that would be immediately germane, but a number of worthwhile fixes have gone in since then.
failmode=continue doesn't do anything useful here; that's more for disks that vanish underneath, and only in a very specific case - it's pretty unsafe otherwise, IIRC.
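For reference, a minimal sketch of checking and resetting that property (assuming the pool name rpool; failmode accepts wait, continue, or panic, with wait being the default):
# show the current failmode setting
zpool get failmode rpool
# restore the default, which blocks I/O until a missing device comes back
zpool set failmode=wait rpool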
The error message basically means it's trying to remove an element that's not there, which shouldn't happen, ever, really. Conceivably you could patch out the panic on that happening and make it just throw out the invalid remove request, but what you might get if you did that is pretty undefined, since by definition this should never happen, so it happening means something already went wrong and we don't know what it was.
You could also try importing at older txgs readonly and seeing if they behave less badly - specifying specific txgs with zdb would probably be faster to iterate on than trying the import outside of zdb and seeing if it panics, but if your userland is 2.1.x, zdb might get upset if you used any 2.2 features on the pool.
failmode=continue doesn't do anything useful here; that's more for disks that vanish underneath, and only in a very specific case - it's pretty unsafe otherwise, IIRC.
I was just experimenting; I made copies of the logs from the last attempts, and those happened to be with failmode set. And yes, I know it is meant for a different case. Unfortunately it takes ages to reboot after every attempt, so I left it as is :)
#13963 and #13483 seem germane. Most of the cases of this I see seem to be on systems where people aren't necessarily running ECC RAM, so it might have been something exciting getting bitflipped, and it's hard to know for certain. That said, I believe your motherboard can take ECC or non-ECC RAM, so it's probably worth asking whether you have ECC, to eliminate one possible cause if you do.
My memory is ECC, granted.
The error message basically means it's trying to remove an element that's not there, which shouldn't happen, ever, really. Conceivably you could patch out the panic on that happening and make it just throw out the invalid remove request, but what you might get if you did that is pretty undefined, since by definition this should never happen, so it happening means something already went wrong and we don't know what it was.
Patching would be sufficient, if it would let me take a snapshot and copy to another pool :)
You could also try importing at older txgs readonly and seeing if they behave less badly - specifying specific txgs with zdb would probably be faster to iterate on than trying the import outside of zdb and seeing if it panics, but if your userland is 2.1.x, zdb might get upset if you used any 2.2 features on the pool.
I can push userland to 2.2. Read-only won't tell us much here, as it already imports without any issues in RO.
Should I just use the -t option with zdb, or is there more to it?
If it imports readonly with 0 issues, that's useful enough to let you get data out, since you can send from read-only datasets without a snapshot (though you can't, like, resume or use a number of flags like -R to do it).
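A minimal sketch of that approach, assuming the damaged pool imports read-only and that the backup/... datasets on a healthy pool are the target (all names are illustrative):
# import the damaged pool read-only and without mounting anything
zpool import -o readonly=on -N rpool
# send the head of a dataset directly (allowed without a snapshot when the pool is read-only)
# and receive it unmounted on the healthy pool; -R and resumable sends are not available this way
zfs send rpool/data/example | zfs recv -u backup/data/example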
zdb -t [some txg] with an older txg from zdb -lu would be an experiment, once you're sure that it fails right now without a txg specified.
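As a sketch of that experiment (device path and txg are placeholders; -e lets zdb examine the pool without importing it):
# dump the labels and uberblocks from a vdev to find older txgs to try
zdb -lu /dev/disk/by-partuuid/<vdev-partition>
# walk the block tree as of one of those txgs, without importing the pool
zdb -e -b -t <txg> rpool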
zdb also grew -B, I think in 2.2, which lets you emit a send stream from zdb by just specifying a dataset, for a pool where it won't import enough to do the dance I just suggested. (At least, I think that made it into 2.2; it's in git...)
If it imports readonly with 0 issues, that's useful enough to let you get data out, since you can send from read-only datasets without a snapshot (though you can't, like, resume or use a number of flags like -R to do it).
...but that would be a hell of a lot of work, as there are a lot of datasets (snap, lxd, docker) and some of them are clones of others.
zdb -t [some txg] with an older txg from zdb -lu would be an experiment, once you're sure that it fails right now without a txg specified.
zdb also grew -B, I think in 2.2, which lets you emit a send stream from zdb by just specifying a dataset, for a pool where it won't import enough to do the dance I just suggested. (At least, I think that made it into 2.2; it's in git...)
Thanks for the tips! :)
Hm. zdb with every txg belonging to the present uberblocks ends up with the same issue:
(initramfs) zdb -e -b -t 91793915 rpool
Traversing all blocks to verify nothing leaked ...
loading concrete vdev 0, metaslab 85 of 116 ...entry_offset < sm->sm_start + sm->sm_size (0xf001540e812000 < 0x15800000000)
ASSERT at ../../module/zfs/space_map.c:173:space_map_iterate()Aborted
Assuming I am going to send datasets from the pool imported read-only, or via zdb -B, what should be my strategy to properly back up datasets with snapshots?
for example:
rpool/lxd/images/97b9236df59497b28eebeb91eee7a2bd815e428613e49e478837ffa401d39da0
rpool/lxd/images/97b9236df59497b28eebeb91eee7a2bd815e428613e49e478837ffa401d39da0@readonly
rpool/lxd/images/dc0665d2cbf69531370268c87fc707bce37cbab22298d4399a8029f65751f8aa
rpool/lxd/images/dc0665d2cbf69531370268c87fc707bce37cbab22298d4399a8029f65751f8aa@readonly
And, an even more difficult case: datasets which are clones.
Or maybe, after all, there is a way to manually alter the metaslab(s) and make the pool good enough to mount RW, take recursive snapshots, and generate a replication stream?
I'm not aware of anyone having written one.
You could do receive with -o checksum=[something stronger than fletcher4] and then use -o origin= to re-establish clone relationships afterward, and it'll apply nopwrite liberally to save space.
Note that the first part is not optional, or it will just take up the full amount of space and be marked as a clone.
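A rough sketch of that receive-side arrangement, with illustrative dataset names; the essential parts are a nopwrite-capable checksum (e.g. sha256) on the received datasets and -o origin= pointing at the already-received origin snapshot:
# receive the origin dataset first, forcing a nopwrite-capable checksum
zfs send rpool/lxd/images/base@readonly | zfs recv -o checksum=sha256 backup/lxd/images/base
# receive the former clone as a clone of that copy; with matching checksums,
# nopwrite avoids storing the blocks it shares with the origin a second time
zfs send rpool/lxd/containers/c1@snap | zfs recv -o checksum=sha256 -o origin=backup/lxd/images/base@readonly backup/lxd/containers/c1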
Hm. I am working on a script that manages backing up such a read-only pool, but I have a problem. Assume I have multiple snapshots on a dataset, for example:
NAME CREATION USED REFER
rpool/ROOT/ubuntu_6ishm7/var/snap Thu Mar 11 22:57 2021 3.56G 1.78G
rpool/ROOT/ubuntu_6ishm7/var/snap@autozsys_67t44x Mon Jun 5 16:18 2023 96K 1.79G
rpool/ROOT/ubuntu_6ishm7/var/snap@autozsys_ltgnni Mon Jun 5 16:25 2023 96K 1.79G
rpool/ROOT/ubuntu_6ishm7/var/snap@autozsys_cys0gu Wed Jun 7 4:30 2023 442M 1.79G
rpool/ROOT/ubuntu_6ishm7/var/snap@autozsys_mjohdz Thu Jun 8 4:29 2023 5.44M 1.79G
rpool/ROOT/ubuntu_6ishm7/var/snap@autozsys_vliktr Tue Jun 13 4:58 2023 5.54M 1.79G
rpool/ROOT/ubuntu_6ishm7/var/snap@autozsys_zm0e2o Fri Jun 16 4:02 2023 5.19M 1.79G
rpool/ROOT/ubuntu_6ishm7/var/snap@autozsys_dnnhut Sat Jun 17 4:19 2023 5.19M 1.79G
rpool/ROOT/ubuntu_6ishm7/var/snap@autozsys_7s62vg Tue Jun 20 4:32 2023 5.35M 1.79G
rpool/ROOT/ubuntu_6ishm7/var/snap@autozsys_0wu8sf Fri Jun 23 4:19 2023 5.36M 1.79G
rpool/ROOT/ubuntu_6ishm7/var/snap@autozsys_0ce0q2 Fri Jun 30 4:17 2023 6.13M 1.79G
rpool/ROOT/ubuntu_6ishm7/var/snap@autozsys_cvwdyu Sat Feb 17 23:11 2024 7.40M 1.78G
I can either send all existing snapshots up to the last one, or the head of the dataset without any snapshots, but I can't find a way to combine them. Sending the snapshots and then sending the head with -o origin=xxx does not help. Am I missing anything here?
What do you mean "does not help"?
I think setting the checksum to a nopwrite-capable one, setting origin=, then promoting the clone should do what you'd like here, if cumbersomely.
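A hedged sketch of that sequence, using the dataset listed above (the backup/... names on the receiving side are illustrative):
# 1. full send of the oldest snapshot, then the remaining snapshots incrementally
zfs send rpool/ROOT/ubuntu_6ishm7/var/snap@autozsys_67t44x | zfs recv -o checksum=sha256 backup/var/snap
zfs send -I @autozsys_67t44x rpool/ROOT/ubuntu_6ishm7/var/snap@autozsys_cvwdyu | zfs recv backup/var/snap
# 2. send the current head (no snapshot needed on a read-only pool) and receive it
#    as a clone of the newest received snapshot, again with a nopwrite-capable checksum
zfs send rpool/ROOT/ubuntu_6ishm7/var/snap | zfs recv -o checksum=sha256 -o origin=backup/var/snap@autozsys_cvwdyu backup/var/snap-head
# 3. promote the head so it takes over the snapshot history from backup/var/snap
zfs promote backup/var/snap-head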
In the end, it looks like I managed to back it all up in a more or less consistent way. Wasted a whole day though :)
Hi! I have a similar error, but my system doesn't freeze. The pools are available for reading and writing. Everything works well, except for the errors in the logs.
Distribution Name: Ubuntu
Distribution Version: 22.04.4
Kernel Version: 6.5.0-25-generic
Architecture: x86_64
OpenZFS Version: 2.2.0 (kernel), 2.1.5 (utils)
I see the error at startup, but the OS does boot:
Begin: Importing ZFS root pool 'rpool' ... Begin: Importing pool 'rpool' using defaults ... [ 4.006236] VERIFY3(rs_get_end(rs, rt) >= end) failed (8053108736 >= 8071139328)
PANIC at range_tree.c:499:range_tree_remove_impl()
Once the OS is running:
root@Bastik:~# zpool status -v
pool: bpool
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
bpool ONLINE 0 0 0
74ae56db-aa4b-5f46-8fe4-a176a5d756dd ONLINE 0 0 0
errors: No known data errors
pool: rpool
state: ONLINE
scan: scrub repaired 0B in 00:03:23 with 0 errors on Fri Mar 15 13:05:31 2024
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
b2ac8a6e-2f4d-e240-85ce-3712884b3019 ONLINE 0 0 0
errors: No known data errors
root@Bastik:~# zdb
bpool:
version: 5000
name: 'bpool'
state: 0
txg: 342657
pool_guid: 14285551797236171980
errata: 0
hostid: 1372433909
hostname: 'Bastik-UTech'
com.delphix:has_per_vdev_zaps
vdev_children: 1
vdev_tree:
type: 'root'
id: 0
guid: 14285551797236171980
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 10628172824458441428
path: '/dev/disk/by-partuuid/74ae56db-aa4b-5f46-8fe4-a176a5d756dd'
whole_disk: 0
metaslab_array: 256
metaslab_shift: 27
ashift: 12
asize: 2142765056
is_log: 0
create_txg: 4
com.delphix:vdev_zap_leaf: 129
com.delphix:vdev_zap_top: 130
features_for_read:
com.delphix:embedded_data
com.delphix:hole_birth
rpool:
version: 5000
name: 'rpool'
state: 0
txg: 5752086
pool_guid: 8373325186333015073
errata: 0
hostid: 1372433909
hostname: 'Bastik-UTech'
com.delphix:has_per_vdev_zaps
vdev_children: 1
vdev_tree:
type: 'root'
id: 0
guid: 8373325186333015073
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 2971497501110945703
path: '/dev/disk/by-partuuid/b2ac8a6e-2f4d-e240-85ce-3712884b3019'
whole_disk: 0
metaslab_array: 69
metaslab_shift: 29
ashift: 12
asize: 156223406080
is_log: 0
DTL: 841
create_txg: 4
com.delphix:vdev_zap_leaf: 67
com.delphix:vdev_zap_top: 68
features_for_read:
com.delphix:hole_birth
com.delphix:embedded_data
journalctl
kernel: INFO: task z_metaslab:597 blocked for more than 362 seconds.
kernel: Tainted: P OE 6.5.0-25-generic #25~22.04.1-Ubuntu
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: task:z_metaslab state:D stack:0 pid:597 ppid:2 flags:0x00004000
kernel: Call Trace:
kernel: <TASK>
kernel: __schedule+0x2cb/0x750
kernel: schedule+0x63/0x110
kernel: cv_wait_common+0x102/0x140 [spl]
kernel: ? __pfx_autoremove_wake_function+0x10/0x10
kernel: __cv_wait+0x15/0x30 [spl]
kernel: metaslab_load_wait+0x28/0x50 [zfs]
kernel: metaslab_load+0x17/0xe0 [zfs]
kernel: metaslab_preload+0x48/0xa0 [zfs]
kernel: taskq_thread+0x1f6/0x3c0 [spl]
kernel: ? __pfx_default_wake_function+0x10/0x10
kernel: ? __pfx_taskq_thread+0x10/0x10 [spl]
kernel: kthread+0xf2/0x120
kernel: ? __pfx_kthread+0x10/0x10
kernel: ret_from_fork+0x47/0x70
kernel: ? __pfx_kthread+0x10/0x10
kernel: ret_from_fork_asm+0x1b/0x30
kernel: </TASK>
You should probably file a bug with Ubuntu, since that's something that I believe is long-since fixed here.
Could be wrong, but since they shipped a version with a lot of known issues, I'd suggest you try to reproduce with 2.2.3 first.
(Of course, I suspect that at the point where your pool is trashed, it'll fail the same way on import with 2.2.3 and not tell us anything about the underlying bug that made it write something broken in the first place, but I could be wrong.)
I restored my VM from a snapshot taken at a time when there was no error, and I noticed that the error appears after an automatic update of Snap apps (canonical-livepatch, snapstore, snapd, etc.).
Then I restored my virtual machine again to a point before this error appeared, and immediately turned off automatic Snap updates:
snap refresh --hold
And the error doesn't appear anymore, so I assume the problem lies with the Snap apps, since all updates via the APT package manager (zfs-initramfs, zfs-zed, linux-hwe, linux-headers, etc.) install and work without problems.
The problem is with these Snap apps: I restored my VM again from a snapshot taken when there was no error, deleted these apps, and the problem was gone. However, I had to completely abandon Snap, and I also lost Canonical Livepatch :(
System information
Describe the problem you're observing
On pool import I get a panic, then the system hangs. After the initial panic message the system keeps hanging, but produces a regular message about a blocked task. As it is the root pool, I am rather limited in options. But I tried: 1) importing as read-only, which works; 2) setting zil_replay_disable=1 and zfs_recover=1, which does not help.
HW details: the pool is on a single NVMe (Samsung 970 Pro). smartctl shows good health for the NVMe. The system has ECC memory. My guess is that a couple of hard resets (due to some other issue) caused some corruption. But I would love to find a way to fix the pool.
Describe how to reproduce the problem
Include any warning/errors/backtraces from the system logs
console log: