skull-squadron opened this issue 2 years ago
Appears related to #8234, but no repair procedure is indicated there.
(mounted RO)

  pool: tank
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 10:32:30 with 0 errors on Wed Aug 25 03:10:56 2021
config:

        NAME                      STATE     READ WRITE CKSUM
        tank                      ONLINE       0     0     0
          raidz1-0                ONLINE       0     0     0
            zfs-e7334676685ebfb1  ONLINE       0     0     0
            zfs-2cdde638274363a0  ONLINE       0     0     0
            zfs-0bcbaa5f63217fe4  ONLINE       0     0     0
            zfs-d4c0e7fd00f938a4  ONLINE       0     0     0

errors: No known data errors
The hostname: and each child's path: should be overwritten on -f, yes?
root@zfs:~# zpool import
   pool: tank
     id: 13413710004025204341
  state: ONLINE
 status: Some supported features are not enabled on the pool.
         (Note that they may be intentionally disabled if the
         'compatibility' property is set.)
 action: The pool can be imported using its name or numeric identifier, though
         some features will not be available without an explicit 'zpool upgrade'.
 config:

        tank                                  ONLINE
          raidz1-0                            ONLINE
            ata-WDC_WUH721414ALE6L4_9JGBRDWT  ONLINE
            ata-WDC_WUH721414ALE6L4_9JGBW7YT  ONLINE
            ata-WDC_WUH721414ALE6L4_9JGBUB7T  ONLINE
            ata-WDC_WUH721414ALE6L4_9JGB387T  ONLINE
root@zfs:~# zdb -l /dev/disk/by-label/tank
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'tank'
    state: 0
    txg: 5124276
    pool_guid: 13413710004025204341
    errata: 0
    hostname: 'fedora'
    top_guid: 17742213968895681915
    guid: 7683140127223500668
    hole_array[0]: 1
    vdev_children: 2
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 17742213968895681915
        nparity: 1
        metaslab_array: 134
        metaslab_shift: 34
        ashift: 12
        asize: 56002017755136
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 4943082563016739590
            path: '/dev/xvdi1'
            devid: 'ata-WDC_WUH721414ALE6L4_9JGBRDWT-part1'
            phys_path: 'pci-0000:47:00.0-ata-5'
            whole_disk: 1
            DTL: 17787
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 15336595513193204854
            path: '/dev/xvdj1'
            devid: 'ata-WDC_WUH721414ALE6L4_9JGBW7YT-part1'
            phys_path: 'pci-0000:47:00.0-ata-6'
            whole_disk: 1
            DTL: 17786
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 6040381479544144918
            path: '/dev/xvdk1'
            devid: 'ata-WDC_WUH721414ALE6L4_9JGBUB7T-part1'
            phys_path: 'pci-0000:47:00.0-ata-7'
            whole_disk: 1
            DTL: 17785
            create_txg: 4
        children[3]:
            type: 'disk'
            id: 3
            guid: 7683140127223500668
            path: '/dev/xvdl1'
            devid: 'ata-WDC_WUH721414ALE6L4_9JGB387T-part1'
            phys_path: 'pci-0000:47:00.0-ata-8'
            whole_disk: 1
            DTL: 17784
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 0 1 2 3
zpool status hangs after import -f. Seems like a ZFS kernel task is deadlocked. ):
For grins, tried importing on current stable TrueNAS (which shares much of the ZoL codebase now in FreeBSD, IIRC), with a similar error message:
Solaris: WARNING: Pool 'tank' has encountered an uncorrectable I/O failure and has been suspended.
zpool status also hangs on TrueNAS/FreeBSD.
Methinks the intended behavior is that it should never hang, regardless of the condition of any individual disk(s) or pool. As it is, this leads to hard-reboot time. ):
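When an import or zpool status wedges like this, the kernel stacks of the stuck threads are usually the most useful evidence to attach, and far more informative than a blind hard reboot. A minimal sketch, assuming Linux procfs and pgrep (the process names are the usual ZFS suspects; adjust as needed, and reading /proc/<pid>/stack requires root):

```shell
# Sketch: dump kernel stacks of likely-stuck ZFS-related processes so a
# hang can be reported instead of hard-rebooting blind.
dump_zfs_stacks() {
    for pid in $(pgrep -x 'zpool|zfs|txg_sync|spa_sync' 2>/dev/null); do
        echo "=== pid $pid ($(cat "/proc/$pid/comm" 2>/dev/null)) ==="
        cat "/proc/$pid/stack" 2>/dev/null
    done
    # Hung-task warnings, if any, land in the kernel log:
    dmesg 2>/dev/null | grep -i 'blocked for more than' || true
}
```

On a system with no stuck ZFS processes it simply prints nothing.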
There are so many moving parts here it's hard to guess what might be going awry.
If you run zpool events -vf before trying the import r/w, does it print anything interesting?
Does /proc/spl/kstat/zfs/dbgmsg have anything interesting to say?
By default, the failure mode on a pool being suspended is "wait forever until it comes back", rather than returning errors when things try to access it. I suspect you're encountering a consequence of that.
Without knowing anything about why it's complaining, it's hard to speculate what might be a good or terrible idea to help it.
And no, it's perfectly plausible for it to have an uncorrectable error in importing RW and not RO - as the term implies, importing read/write involves writing to the pool, and if it's having problems doing that for some reason, [...].
How, exactly, are the disks connected to the host that can't import it r/w? It sounds like raw device passthrough on a VMware host to a guest?
Could you share the entirety of dmesg from either the Ubuntu VM or the FreeBSD VM on one of these times you tried the import?
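The "wait forever" behavior described above is governed by the pool's failmode property (default "wait"). A hedged sketch of switching it so a suspended pool returns errors instead of blocking; the helper only previews the command via echo so it can be vetted first, and "tank" is the pool name from this thread:

```shell
# set_failmode POOL MODE: previews the zpool command for changing the
# pool's suspension behavior. MODE is one of: wait (default, block I/O
# until the pool recovers), continue (return EIO to new sync I/O), panic.
set_failmode() {
    case $2 in
        wait|continue|panic) ;;
        *) echo "set_failmode: bad mode '$2'" >&2; return 1 ;;
    esac
    echo "zpool set failmode=$2 $1"   # drop the echo to actually run it
}

set_failmode tank continue
# -> zpool set failmode=continue tank
```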
I've encountered the same issue: import tank hangs while import -o readonly=on tank works.
Before this happened, I was trying to benchmark using fio. It hung while creating the test file (iostat showed lots of writes to the hard drives, and fio couldn't be killed by kill -SIGKILL). I hard-reset the system, and after that it hung at ZFS import.
I can share my logs. If I run zpool events -vf and then import tank in another terminal, nothing shows up in zpool events -vf. iostat shows that ZFS keeps reading from the hard drives.
/proc/spl/kstat/zfs/dbgmsg:
timestamp message
1644170557 spa.c:6232:spa_tryimport(): spa_tryimport: importing tank
1644170557 spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADING
1644170557 vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/zfs_tank_special_0_1': best uberblock found for spa $import. txg 1412295
1644170557 spa_misc.c:418:spa_load_note(): spa_load($import, config untrusted): using uberblock with txg=1412295
1644170557 spa.c:8337:spa_async_request(): spa=$import async request task=4096
1644170557 spa.c:8337:spa_async_request(): spa=$import async request task=4096
1644170557 spa.c:8337:spa_async_request(): spa=$import async request task=2048
1644170557 spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADED
1644170557 spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): UNLOADING
1644170569 spa.c:6232:spa_tryimport(): spa_tryimport: importing tank
1644170569 spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADING
1644170569 vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/zfs_tank_special_0_1': best uberblock found for spa $import. txg 1412295
1644170569 spa_misc.c:418:spa_load_note(): spa_load($import, config untrusted): using uberblock with txg=1412295
1644170569 spa.c:8337:spa_async_request(): spa=$import async request task=4096
1644170569 spa.c:8337:spa_async_request(): spa=$import async request task=4096
1644170569 spa.c:8337:spa_async_request(): spa=$import async request task=2048
1644170569 spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADED
1644170569 spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): UNLOADING
1644170573 spa.c:6232:spa_tryimport(): spa_tryimport: importing tank
1644170573 spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADING
1644170573 vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/zfs_tank_special_0_1': best uberblock found for spa $import. txg 1412295
1644170573 spa_misc.c:418:spa_load_note(): spa_load($import, config untrusted): using uberblock with txg=1412295
1644170573 spa.c:8337:spa_async_request(): spa=$import async request task=4096
1644170573 spa.c:8337:spa_async_request(): spa=$import async request task=4096
1644170574 spa.c:8337:spa_async_request(): spa=$import async request task=2048
1644170574 spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADED
1644170574 spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): UNLOADING
1644170574 spa.c:6084:spa_import(): spa_import: importing tank
1644170574 spa_misc.c:418:spa_load_note(): spa_load(tank, config trusted): LOADING
1644170574 vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/zfs_tank_special_0_1': best uberblock found for spa tank. txg 1412295
1644170574 spa_misc.c:418:spa_load_note(): spa_load(tank, config untrusted): using uberblock with txg=1412295
1644170574 spa.c:8337:spa_async_request(): spa=tank async request task=4096
1644170574 spa.c:8337:spa_async_request(): spa=tank async request task=4096
1644170574 spa_misc.c:418:spa_load_note(): spa_load(tank, config trusted): read 534 log space maps (542 total blocks - blksz = 131072 bytes) in 109 ms
1644170574 metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 0, spa tank, vdev_id 1, ms_id 23, smp_length 644536, unflushed_allocs 455041024, unflushed_frees 33660928, freed 0, defer 0 + 0, unloaded time 202589 ms, loading_time 24 ms, ms_max_size 2147483648, max size error 2145386496, old_weight 7c0000000000001, new_weight 7c0000000000001
1644170574 metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 0, spa tank, vdev_id 0, ms_id 64, smp_length 24088, unflushed_allocs 508084224, unflushed_frees 24576, freed 0, defer 0 + 0, unloaded time 202656 ms, loading_time 0 ms, ms_max_size 7415701504, max size error 7415685120, old_weight 800000000000001, new_weight 800000000000001
1644170579 metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 0, spa tank, vdev_id 5, ms_id 0, smp_length 11576, unflushed_allocs 172032, unflushed_frees 147456, freed 0, defer 0 + 0, unloaded time 207332 ms, loading_time 6 ms, ms_max_size 497016832, max size error 496885760, old_weight 700000000000001, new_weight 700000000000001
1644170586 metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 0, spa tank, vdev_id 1, ms_id 64, smp_length 41376, unflushed_allocs 117465088, unflushed_frees 432889856, freed 0, defer 0 + 0, unloaded time 214437 ms, loading_time 8 ms, ms_max_size 4557053952, max size error 4555571200, old_weight 800000000000001, new_weight 800000000000001
1644170634 metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 0, spa tank, vdev_id 0, ms_id 31, smp_length 113640, unflushed_allocs 113827840, unflushed_frees 55058432, freed 0, defer 0 + 0, unloaded time 262514 ms, loading_time 9 ms, ms_max_size 7593730048, max size error 7592673280, old_weight 800000000000001, new_weight 800000000000001
1644170658 metaslab.c:2438:metaslab_load_impl(): metaslab_load: txg 0, spa tank, vdev_id 1, ms_id 24, smp_length 514872, unflushed_allocs 0, unflushed_frees 73728, freed 0, defer 0 + 0, unloaded time 286432 ms, loading_time 25 ms, ms_max_size 2147672064, max size error 2147598336, old_weight 7c0000000000001, new_weight 7c0000000000001
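The metaslab_load lines above suggest the r/w import is grinding through metaslab loads, which is consistent with the sustained read traffic in the iostat output below. A small sketch to condense a dbgmsg capture into a one-line summary; the field layout matches the log above, but treat it as a convenience, not a diagnostic tool:

```shell
# summarize_metaslab_loads: reads dbgmsg-style text on stdin, counts
# metaslab_load entries and reports the slowest loading_time seen.
summarize_metaslab_loads() {
    awk '/metaslab_load:/ {
        n++
        for (i = 1; i <= NF; i++)
            if ($i == "loading_time") { t = $(i + 1) + 0; if (t > max) max = t }
    }
    END { printf "metaslab loads: %d, max loading_time: %d ms\n", n, max }'
}

# On a live system (standard Linux path for the ZFS debug log):
#   summarize_metaslab_loads < /proc/spl/kstat/zfs/dbgmsg
```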
iostat -h 3 while zpool import tank hangs:
avg-cpu: %user %nice %system %iowait %steal %idle
0.0% 0.0% 0.5% 4.8% 0.0% 94.7%
tps kB_read/s kB_wrtn/s kB_dscd/s kB_read kB_wrtn kB_dscd Device
0.00 0.0k 0.0k 0.0k 0.0k 0.0k 0.0k nvme0n1
0.00 0.0k 0.0k 0.0k 0.0k 0.0k 0.0k nvme1n1
74.67 74.7M 0.0k 0.0k 224.0M 0.0k 0.0k sda
77.67 77.7M 0.0k 0.0k 233.0M 0.0k 0.0k sdb
74.67 74.7M 0.0k 0.0k 224.0M 0.0k 0.0k sdc
77.67 77.7M 0.0k 0.0k 233.0M 0.0k 0.0k sdd
I can't see any errors related to ZFS in dmesg.
System information:
Linux version 5.10.0-11-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.92-1 (2022-01-18)
ZFS: Loaded module v2.1.2-1~bpo11+1, ZFS pool version 5000, ZFS filesystem version 5
I think I encountered the same issue (able to import readonly, r/w import hangs, no I/O after a few seconds, no evidence of errors found anywhere) after I had to reset my computer because a docker image build froze it. I was also rather low on disk space (10G free on 1.8TB) at the time.
After running
echo 1 > /sys/module/zfs/parameters/zfs_recover
zdb -e -bcsvL tank
I was able to import the pool r/w again. zdb also didn't report anything that seemed indicative of any problems.
I got the idea from https://www.reddit.com/r/zfs/comments/fcacws/using_zdb_to_repair_errors/
As someone said in that reddit thread, zdb should not be writing anything, and if it does, that's a bug; so I would expect it was just the zfs_recover setting that made the difference.
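For anyone wanting to retry this sequence, it can be wrapped so it is previewed before anything runs. This is a sketch of the steps reported above, not an endorsed repair procedure: zfs_recover is a diagnostic module tunable, and zdb -e -bcsvL only traverses and checksums the pool without writing. With DRY_RUN=1 the helper just prints the commands for review:

```shell
# attempt_recover_import POOL: the zfs_recover + zdb sequence reported
# in this thread. DRY_RUN=1 prints the commands instead of running them.
attempt_recover_import() {
    pool=$1
    run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }
    run sh -c 'echo 1 > /sys/module/zfs/parameters/zfs_recover'
    run zdb -e -bcsvL "$pool"      # read-only traversal of all block pointers
    run zpool import "$pool"       # then retry the r/w import
    run sh -c 'echo 0 > /sys/module/zfs/parameters/zfs_recover'
}

DRY_RUN=1 attempt_recover_import tank   # preview first
```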
Ok. I didn't try importing after only enabling zfs_recover. In the hopefully unlikely event that I encounter the problem again, I will try that.
Hello! I'm experiencing the same thing as @ybh1998. After importing my pool as readonly, I see in zpool status -v that a stuck scrub is in progress. How can I cancel this? I thought this wasn't possible / that scrubs are supposed to be cancelled automatically. zpool events also shows that there's an I/O error.
Another thing to note: the pool is healthy/online, but there is a corrupted metadata file <0x40>. Is there a way to resolve this as well? It doesn't seem related to the I/O error, but for context, I ran a scrub earlier to see if it could auto-correct the issue, and then a second scrub during which it hit the I/O suspension.
I'm kind of new to this, so any tips on troubleshooting?
@zehro You cannot cancel or pause a started scrub while the pool is imported read-only, since that requires a metadata write, which is impossible. The scrub cannot progress in that state, but you can't do anything about it either. I've tried to look for some magic module parameter that would cancel the scrub during import without restarting it, but haven't found anything quickly; maybe I looked in the wrong places.
If you already had a pool suspension before starting the scrub, and it is not caused by some hardware problem you can fix by external means, like a power cycle or reseating a dropped disk, then the scrub could never have done anything about it. A scrub is not magic: it just verifies all checksums and tries to repair data from copies that are still intact. Exactly the same recovery is done during the normal read path, and if that can't do it, and the error is not transient, then there is nothing to be done. You should probably import the pool in read-only mode and evacuate the data.
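The read-only evacuation suggested above can be sketched as a plan printer. Note that a pool imported read-only cannot take new snapshots, so unless a recent snapshot already exists, a file-level copy is the fallback; the pool name, mountpoint, destination, and snapshot name below are placeholders:

```shell
# evacuation_plan POOL DEST: prints the read-only evacuation steps for
# review. rsync copies the mounted file trees; zfs send (commented out)
# works on a read-only pool only against snapshots that already exist.
evacuation_plan() {
    pool=$1; dest=$2
    echo "zpool import -o readonly=on $pool"
    echo "zfs list -H -o name,mountpoint -r $pool"
    echo "rsync -aHAX /$pool/ $dest/"
    echo "# if a snapshot already exists: zfs send -R $pool@snap | zfs receive <backup_pool>/$pool"
}

evacuation_plan tank /backup/tank
```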
@amotin I appreciate the response. I suspected as much and am prepping to evac the data and recreate the pool. I'm trying out a few other potential solutions before committing the time (evacuating 30+ TB of data takes a while), and I am still open to suggestions.
System information
Describe the problem you're observing
Imported using -d /dev/disk/by-partlabel/zfs-* (all 4 data drives). The 4-drive pool refuses to import RW; RO works fine.
It had a ZIL, which is an Intel SSD (available as /dev/sdb; partitions sdb1, sdb9).
It had an L2ARC which was another drive, but I removed it because benchmarks proved there was no real performance gain w/ 512 GiB of RAM.
Describe how to reproduce the problem
Come over and slot-in all 4 drives and the SSD. :)
Include any warning/errors/backtraces from the system logs
(No error messages observed from utils.)
Note: the first line doesn't make sense because user data is readable RO. All the drives should be fine mechanically and electrically, since they were mounted under a Qubes (Xen) VM before I switched to VMware ESXi and attached them to an Ubuntu guest using RDMs.