openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

zfs destroy can't destroy zvols detected with permanent errors #12563

Closed gitercn closed 1 year ago

gitercn commented 3 years ago

System information

Type Version/Name
Distribution Name ProxmoxVE
Distribution Version 6.4-13
Kernel Version 5.4.128-1-pve
Architecture amd64
OpenZFS Version zfs-kmod-2.0.5-pve1~bpo10+1

Describe the problem you're observing

A few days ago, bad cables prevented my zpool from importing. After replacing the cables I used zpool import -f -F -T <txg> poolname to roll the pool back to an earlier state (following the steps in https://github.com/openzfs/zfs/issues/6497#issuecomment-917718107). zpool status -v still reports permanent errors on several zvols. I want to destroy those zvols to clear the errors, but every attempt (except one) fails with cannot open 'xxx': I/O error. How can I get rid of them?
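
For reference, the rollback import from #6497 went roughly like the sketch below (the txg placeholder and the device path are illustrative, not the exact values I used):

zdb -ul /dev/disk/by-id/ata-WDC_WD120EMFZ-11A6JA0_QGG3AB2T-part1   # list uberblocks and their txgs on one member disk (exact path may differ)
zpool import -o readonly=on -f -T <txg> zfs52                      # read-only import at an older txg first, to check the pool state
zpool export zfs52                                                 # if it looks sane, export and re-import read-write
zpool import -f -F -T <txg> zfs52                                  # discards the newer txgs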

root@pve52:~# zfs mount -a
cannot iterate filesystems: I/O error
root@pve52:~# zpool status -v
  pool: zfs52
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 8K in 1 days 00:48:01 with 8 errors on Mon Sep 13 02:48:13 2021
config:

        NAME                                    STATE     READ WRITE CKSUM
        zfs52                                   DEGRADED     0     0     0
          raidz1-0                              DEGRADED     0     0     0
            ata-WDC_WD120EMFZ-11A6JA0_QGG3AB2T  DEGRADED     0     0    32  too many errors
            ata-WDC_WD120EMFZ-11A6JA0_QGGDS91T  DEGRADED     0     0    22  too many errors
            ata-WDC_WD120EMFZ-11A6JA0_QGGE07HT  DEGRADED     0     0    27  too many errors
            ata-WDC_WD120EMFZ-11A6JA0_QGGL1ZVT  DEGRADED     0     0    28  too many errors
            ata-WDC_WD120EMFZ-11A6JA0_QGH5V04T  DEGRADED     0     0    22  too many errors
            ata-WDC_WD120EMFZ-11A6JA0_X1G502KL  DEGRADED     0     0    22  too many errors
            ata-WDC_WD120EMFZ-11A6JA0_X1G6H9LL  DEGRADED     0     0    16  too many errors
            ata-WDC_WD120EMFZ-11A6JA0_X1G9TYHL  DEGRADED     0     0    19  too many errors
            ata-WDC_WD120EMFZ-11A6JA0_XHG0J1MD  DEGRADED     0     0    28  too many errors

errors: Permanent errors have been detected in the following files:

        zfs52/enc/dir:<0xf0583>
        zfs52/enc/vol/vm-115-disk-0:<0x0>
        zfs52/enc/vol/vm-112-disk-0:<0x0>
        zfs52/enc/vol/subvol-103-disk-0:<0x0>
        zfs52/enc/vol/vm-151-disk-0:<0x0>
        zfs52/enc/vol/vm-151-disk-2:<0x0>
        zfs52/enc/vol/vm-113-disk-0:<0x0>
        zfs52/enc/vol/vm-116-disk-0:<0x0>
root@pve52:~# zfs destroy zfs52/enc/vol/vm-115-disk-0
cannot open 'zfs52/enc/vol/vm-115-disk-0': I/O error
root@pve52:~# zfs destroy zfs52/enc/vol/vm-112-disk-0
cannot open 'zfs52/enc/vol/vm-112-disk-0': I/O error
root@pve52:~# zfs destroy zfs52/enc/vol/subvol-103-disk-0    # <---- note: this is the only one that can be destroyed
root@pve52:~# zfs destroy zfs52/enc/vol/vm-151-disk-0
cannot open 'zfs52/enc/vol/vm-151-disk-0': I/O error
root@pve52:~# zfs destroy zfs52/enc/vol/vm-151-disk-2
cannot open 'zfs52/enc/vol/vm-151-disk-2': I/O error
root@pve52:~# zfs destroy zfs52/enc/vol/vm-113-disk-0
cannot open 'zfs52/enc/vol/vm-113-disk-0': I/O error
root@pve52:~# zfs destroy zfs52/enc/vol/vm-116-disk-0
cannot open 'zfs52/enc/vol/vm-116-disk-0': I/O error

This is what it looks like after destroying zfs52/enc/vol/subvol-103-disk-0: all the others are still there, and the destroyed one now shows up only as a raw object ID (see the note after the output below).

root@pve52:~# zpool status -v
  pool: zfs52
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 8K in 1 days 00:48:01 with 8 errors on Mon Sep 13 02:48:13 2021
config:

        NAME                                    STATE     READ WRITE CKSUM
        zfs52                                   DEGRADED     0     0     0
          raidz1-0                              DEGRADED     0     0     0
            ata-WDC_WD120EMFZ-11A6JA0_QGG3AB2T  DEGRADED     0     0    36  too many errors
            ata-WDC_WD120EMFZ-11A6JA0_QGGDS91T  DEGRADED     0     0    24  too many errors
            ata-WDC_WD120EMFZ-11A6JA0_QGGE07HT  DEGRADED     0     0    32  too many errors
            ata-WDC_WD120EMFZ-11A6JA0_QGGL1ZVT  DEGRADED     0     0    34  too many errors
            ata-WDC_WD120EMFZ-11A6JA0_QGH5V04T  DEGRADED     0     0    26  too many errors
            ata-WDC_WD120EMFZ-11A6JA0_X1G502KL  DEGRADED     0     0    26  too many errors
            ata-WDC_WD120EMFZ-11A6JA0_X1G6H9LL  DEGRADED     0     0    18  too many errors
            ata-WDC_WD120EMFZ-11A6JA0_X1G9TYHL  DEGRADED     0     0    22  too many errors
            ata-WDC_WD120EMFZ-11A6JA0_XHG0J1MD  DEGRADED     0     0    32  too many errors

errors: Permanent errors have been detected in the following files:

        zfs52/enc/dir:<0xf0583>
        zfs52/enc/vol/vm-115-disk-0:<0x0>
        zfs52/enc/vol/vm-112-disk-0:<0x0>
        <0x4d>:<0x0>
        zfs52/enc/vol/vm-151-disk-0:<0x0>
        zfs52/enc/vol/vm-151-disk-2:<0x0>
        zfs52/enc/vol/vm-113-disk-0:<0x0>
        zfs52/enc/vol/vm-116-disk-0:<0x0>
        <0xffffffffffffffff>:<0x0>
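
If I understand the behavior correctly, the <0x4d>:<0x0> entry is the objset ID of the zvol I just destroyed: once a dataset is gone, zpool status can no longer resolve its name, so it prints the raw ID. My plan, once (if) the remaining zvols can be destroyed, is roughly the following, assuming the persistent error list is rewritten when a scrub completes (some reports suggest it can take two consecutive scrubs):

zpool clear zfs52        # reset the per-vdev error counters
zpool scrub zfs52        # re-scrub so stale entries for destroyed datasets drop out of the error list
zpool status -v zfs52    # verify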

Describe how to reproduce the problem

Not sure how to reproduce, since the corruption was originally caused by bad cables.

Include any warning/errors/backtraces from the system logs

The dmesg log appears mostly normal. The only warnings are the ones below (I don't know whether they are related to ZFS or not); a rough way to map the reported objset numbers back to dataset names is sketched after the log.

[   31.518469] L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.
[   35.987493] WARNING: can't open objset 2068, error 5
[   36.002540] WARNING: can't open objset 1720, error 5
[   36.003232] WARNING: can't open objset 269, error 5
[   36.032697] WARNING: can't open objset 2949, error 5
[   36.037329] WARNING: can't open objset 612, error 5
[   36.046358] WARNING: can't open objset 626, error 5
[   36.203316] WARNING: can't open objset for 269, error 5
[   36.203424] WARNING: can't open objset for 626, error 5
[   36.203732] WARNING: can't open objset for 2068, error 5
[   36.203780] WARNING: can't open objset for 2949, error 5
[   36.203905] WARNING: can't open objset for 612, error 5
[   36.203943] WARNING: can't open objset for 1720, error 5
[   44.920979] mei_me 0000:00:16.0: timer: init clients timeout hbm_state = 2.
[   44.921010] mei_me 0000:00:16.0: unexpected reset: dev_state = INIT_CLIENTS fw status = 001F0252 348A0E26 00000000 00084000 00000000 00000000
[   74.809359] mei_me 0000:00:16.0: timer: init clients timeout hbm_state = 2.
[   74.809395] mei_me 0000:00:16.0: unexpected reset: dev_state = INIT_CLIENTS fw status = 001F0252 348A0E26 00000000 00084000 00000000 00000000
[  105.028711] mei_me 0000:00:16.0: timer: init clients timeout hbm_state = 2.
[  105.028789] mei_me 0000:00:16.0: unexpected reset: dev_state = INIT_CLIENTS fw status = 001F0252 348A0E26 00000000 00084000 00000000 00000000
[  105.028797] mei_me 0000:00:16.0: reset: reached maximal consecutive resets: disabling the device
[ 3939.672762]  zd0: p1 p2
[ 3940.133015]  zd16: p1 p2
[ 3940.700128]  zd32: p1 p2 p3
[ 3941.185524]  zd48: p1 p2 p3
[ 3941.650778]  zd64: p1
[ 3942.101795]  zd80: p1
[14547.446785] device tap118i0 entered promiscuous mode
[14547.476417] fwbr118i0: port 1(fwln118i0) entered blocking state
[14547.476418] fwbr118i0: port 1(fwln118i0) entered disabled state
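
Error 5 is EIO, so those objset warnings presumably correspond to the same damaged datasets. A rough way to map the objset numbers from the log back to dataset names, assuming zdb can still read the pool metadata (output format may differ between versions):

zdb -d zfs52 | grep -w "ID 2068"       # find which dataset has objset ID 2068
zdb -d zfs52/enc/vol/vm-115-disk-0     # or inspect one suspect zvol directly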
AxeyGabriel commented 3 years ago

I have the same problem! I cannot delete a corrupted dataset; it shows "cannot iterate filesystems: I/O error". The weird part is that zfs mount -a aborts, leaving me with partially mounted pools.

stale[bot] commented 2 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.