Open slrslr opened 2 years ago
Your disks appear to be dying.
There's not really much ZFS can do if a non-redundant disk backing a pool goes away, other than hope it comes back and doesn't immediately go away again the next time it's asked.
kernel: blk_update_request: I/O error, dev sde, sector 19537934408 op 0x1:(WRITE) flags 0x4700 phys_seg 24 prio class 0
This is a hardware I/O error. Hardware errors exist at a level well below ZFS or even the kernel itself. If you need assistance figuring out how to deal with a direct hardware error, you're looking for either a general Linux support community, or a trustworthy local PC repair shop, depending on how much hand holding you need in dealing with it.
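For reference, the failing device and sector can be pulled straight out of such journal lines; a minimal sketch using the exact line quoted above:

```shell
# Parse a blk_update_request error line for the failing device and sector.
# The sample line is copied verbatim from the journal output above.
line='kernel: blk_update_request: I/O error, dev sde, sector 19537934408 op 0x1:(WRITE) flags 0x4700 phys_seg 24 prio class 0'
echo "$line" | awk -F'[ ,]+' '{
  for (i = 1; i <= NF; i++) {
    if ($i == "dev")    dev = $(i + 1)   # device name follows "dev"
    if ($i == "sector") sec = $(i + 1)   # sector number follows "sector"
  }
  print dev, sec                         # prints: sde 19537934408
}'
```

In practice you'd feed `journalctl -k` through the same awk instead of a fixed string.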
In short, you're experiencing hardware failure in one of the following, ranked in order of decreasing likelihood:
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
System information
Distribution Name: Debian 11 (up to date)
zfs version: zfs-2.1.5-1~bpo11+1; zfs-kmod-2.1.5-1~bpo11+1
uname -r: 5.10.0-18-amd64
Hello, the sdc and sdd drives form one ZFS pool; sde is another ZFS pool. I am syncing to the sde pool ("poolname") using the syncoid tool (sudo zfs send 'srcpool/data'@'syncoid_pc_2021-06-07' | mbuffer -q -s 128k -m 16M 2>/dev/null | pv -s 6451598964144 | sudo zfs receive -s -F 'poolname/data').
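For what it's worth, the pipeline above can be built as a string first and inspected before anything touches the pools; a sketch using the snapshot and dataset names from the command above (the pv progress meter and stderr redirection are omitted for brevity):

```shell
# Build the replication pipeline as a string so the quoting can be
# reviewed before running. Names are the ones from this report.
snap="srcpool/data@syncoid_pc_2021-06-07"
dest="poolname/data"
cmd="zfs send '$snap' | mbuffer -q -s 128k -m 16M | zfs receive -s -F '$dest'"
echo "$cmd"   # review first; run each zfs side under root, as in the original
```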
During the last hours, while "syncoid -r sourcepool destpool" was running, I saw a few journal lines of this kind:
kernel: blk_update_request: I/O error, dev sdd, sector 22920562456 op 0x1:(WRITE) flags 0x4700 phys_seg 30 prio class 0
Then about 6 such lines appeared within one minute, and the drive started producing a repeated deep clicking noise every few seconds, like the heads stopping at full speed.
ksysguardd[4181945]: Disk device disappeared
(this may be about a different drive) Repeated messages:
(similar messages also appear below...)
tons of these messages:
...
...
I tried to mount the pool and got "pool I/O is currently suspended". After turning the drive off and on again, a SMART short self-test found no error: PASSED.
The sde pool "poolname" was SUSPENDED ("One or more devices are faulted in response to IO failures."), STATE: DEGRADED, too many errors. After running "sudo zpool clear poolname", poolname is online ("One or more devices has experienced an error resulting in data corruption. Applications may be affected.").
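The status/clear sequence above can be sketched as a small dry-run script (pool name from this report; with DRY_RUN=1 it only prints the commands, so nothing here touches the pool):

```shell
# Dry-run sketch of the recovery steps: inspect, clear, then scrub.
# Set DRY_RUN=0 to actually execute (requires root and a real ZFS system).
DRY_RUN=1
pool='poolname'

run() {
  if [ "$DRY_RUN" -eq 1 ]; then
    echo "+ $*"    # just show the command
  else
    "$@"
  fi
}

run zpool status -v "$pool"   # which vdev faulted, and why
run zpool clear "$pool"       # reset error counters, resume I/O if possible
run zpool scrub "$pool"       # afterwards, re-verify checksums pool-wide
```

A scrub after `zpool clear` is worth it here because the pool already reported data corruption.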
After trying to run syncoid again, it seems that it skipped to the next dataset (the previous one was most likely not completed, according to the progress I was monitoring).
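Since the receive side used `-s`, an interrupted stream should leave a resume token on the partially received dataset; syncoid normally picks this up on re-run, but doing it by hand shows what happens under the hood. A hedged sketch (dataset name from this report; the commands are printed rather than executed here):

```shell
# When 'zfs receive -s' is interrupted, the destination dataset keeps a
# receive_resume_token; 'zfs send -t <token>' continues from that point.
# Printed instead of executed, since this needs root and the real pool.
dataset='poolname/data'
echo "token=\$(zfs get -H -o value receive_resume_token $dataset)"
echo "zfs send -t \"\$token\" | zfs receive -s $dataset"
```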
After some minutes I heard an unusual sound from the drive and saw in the journal:
drive status now:
Thank you ♥️