openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

ZFS-8000-8A description missing detail #9396

Open hadmut opened 5 years ago

hadmut commented 5 years ago

Hi, this is about a missing detail in the docs rather than a software issue.

I have corruption on a ZFS SSD device, and zpool status -xv says

    status: One or more devices has experienced an error resulting in data
            corruption.  Applications may be affected.
    action: Restore the file in question if possible.  Otherwise restore the
            entire pool from backup.
       see: http://zfsonlinux.org/msg/ZFS-8000-8A
    ...

    errors: Permanent errors have been detected in the following files:

        SOMECONFIDENTIALPATH:<0x1c855>

but I cannot find any corrupted or unreadable file, or any file with this name. My guess was that this number is the inode number (although printing the path makes little sense if the file can only be identified by inode), but a find MOUNTPOINT -inum 116821 did not find a file either.

Even a diff against a backup device does not reveal any problem. The webpage http://zfsonlinux.org/msg/ZFS-8000-8A does not mention what this <0x1c855> syntax means or how to find and remove the affected file.
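For reference, the lookup I attempted can be sketched like this (the mountpoint path is a placeholder; the hex value is the one from my zpool status output, which I am assuming is a dataset object number, i.e. roughly an inode number):

```shell
#!/bin/sh
# Convert the hex object number shown in angle brackets to decimal,
# then look for a file with that inode number on the dataset's mount.
obj=0x1c855
inum=$(printf '%d' "$obj")
echo "$inum"    # 116821
# Placeholder path; run against the real dataset mountpoint:
# find /path/to/mountpoint -xdev -inum "$inum"
```

If the find reports nothing, the object presumably no longer exists in the filesystem, which is exactly the case the documentation does not explain.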

It would be nice if the webpage or the output of zpool status -xv were a bit more explicit about how to deal with this, e.g. whether the whole device has to be erased and restored from backup, or whether a zfs send and zfs receive could successfully dump and restore the data into a clean state.
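To be concrete, the dump-and-restore I have in mind would look roughly like this (pool and dataset names are made up; I don't know whether send would abort on the corrupt object or skip it, which is part of what the docs should clarify):

```shell
# Hypothetical names throughout. Snapshot the affected dataset, then
# replicate it; zfs send re-reads every block, so it should either
# fail on genuinely unreadable data or produce a clean copy.
zfs snapshot tank/data@rescue
zfs send tank/data@rescue | zfs receive backuppool/data
```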

Distribution Name | Ubuntu
Distribution Version | 18.04
Linux Kernel | 4.15.0-64-generic
Architecture | x86_64
ZFS Version | 0.7.5-1ubuntu16.6
SPL Version | 0.7.5-1ubuntu2

stale[bot] commented 4 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

devZer0 commented 3 years ago

I'm not OK with stalebot closing unanswered/unfixed issues.

behlendorf commented 3 years ago

For cases like this I'm always happy to reopen the issue and tag it appropriately.

devZer0 commented 3 years ago

But stalebot makes problem reports silently go away and rudely obsoletes other people's work.

If the author of this report did not take action to reopen it, the problem would be lost unnoticed. And if people who report bugs see them closed without further action, they stop reporting bugs.

behlendorf commented 3 years ago

That's definitely not my intent. In fact, I think at the moment it's having the opposite effect. When an issue gets automatically tagged as stale or closed by the bot, I'll normally take a fresh look at it. It provides a useful way to resurface issues so they can be periodically reassessed for relevance, and it helps prevent them from getting lost among the other open issues. The issue is never lost and can always be reopened if it's incorrectly closed.

devZer0 commented 3 years ago

> When an issue gets automatically tagged as stale or closed by the bot I'll normally take a fresh look at it.

OK, if every stale-closed ticket gets another review by a person to decide whether it's important, then I'm fine with it. I didn't know that, so thanks for making it clear.

stale[bot] commented 2 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

Temtaime commented 2 years ago

Same here. zpool status -v

  pool: main-storage
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 05:01:44 with 0 errors on Sun Mar 13 05:25:47 2022
config:

        NAME                                        STATE     READ WRITE CKSUM
        main-storage                                ONLINE       0     0     0
          ata-WDC_WD40EZAZ-00SF3B0_WD-WX92D4188HCF  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:
        main-storage:<0x80>

What does this offset represent? Also, the error appeared after an intensive write load on an SMR drive; the drive itself is OK.

zfs --version

zfs-2.1.2-pve1
zfs-kmod-2.1.2-pve1

Am I using an outdated ZFS, or is this a ZFS problem? I can attach smartctl --xall output if needed. I don't want to recreate the pool; how can I fix this error?
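A hedged sketch of what I have tried to clear the entry so far, in case it helps (pool name as in my status output above; I am assuming, from what I have read elsewhere, that the error log retains errors from the last two scrubs, so a single scrub may not be enough to drop a stale entry):

```shell
# Clear the error counters, then scrub to re-verify every block.
zpool clear main-storage
zpool scrub main-storage
# Assumption: status keeps errors from the previous scrub as well,
# so a second scrub may be needed before the <0x80> entry disappears.
zpool scrub main-storage
zpool status -v main-storage
```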