Open jgottula opened 7 years ago
I don't have a solution yet, but I also see the same bug. This is on a 36-bay system populated with 8 TB disks.
# zpool status
  pool: rzpool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Dec 8 08:45:18 2017
        1.78G scanned out of 39.7T at 2.29M/s, (scan is slow, no estimated time)
        116M resilvered, 0.00% done
config:
NAME                          STATE     READ WRITE CKSUM
rzpool                        DEGRADED     0     0   450
  raidz3-0                    ONLINE       0     0 1.20K
    scsi-35000c50092bcebfc    ONLINE       0     0     0
    scsi-35000c50092bd7315    ONLINE       0     0     0
    scsi-35000c50092bd382e    ONLINE       0     0     0
    scsi-35000c50092bcb8d2    ONLINE       0     0     0
    scsi-35000c50092bca1e7    ONLINE       0     0     1
    scsi-35000c50092bd28d7    ONLINE       0     0     0
    scsi-35000c50092bd45b1    ONLINE       0     0     0
    scsi-35000c50092bdc4db    ONLINE       0     0     0
    scsi-35000c50092bd4ba1    ONLINE       0     0     1
    scsi-35000c50092bd42ea    ONLINE       0     0     0
    scsi-35000c50092bd7396    ONLINE       0     0     0
    scsi-35000c50092bcff8a    ONLINE       0     0     0  (resilvering)
  raidz3-1                    ONLINE       0     0    73
    scsi-35000c50092bdefbb    ONLINE       0     0     1
    scsi-35000c50092bc6763    ONLINE       0     0     0  (resilvering)
    scsi-35000c50092bd86a9    ONLINE       0     0     1
    scsi-35000c50092bd83fe    ONLINE       0     0     0
    scsi-35000c50092bd0fd2    ONLINE       0     0     1
    scsi-35000c50092bc37f3    ONLINE       0     0     0
    scsi-35000c50092bd9ec5    ONLINE       0     0     0
    scsi-35000c50092bd12bc    ONLINE       0     0     0  (resilvering)
    scsi-35000c50092be8904    ONLINE       0     0     2
    scsi-35000c50092bdd9ba    ONLINE       0     0     0
    scsi-35000c50092bccac5    ONLINE       0     0     0
    scsi-35000c50092bed185    ONLINE       0     0     0
  raidz3-2                    DEGRADED     0     0   504
    scsi-35000c50092bdf8ba    ONLINE       0     0     0
    scsi-35000c50092bcd23b    ONLINE       0     0     0
    scsi-35000c50092bc583c    ONLINE       0     0     0
    replacing-3               DEGRADED     0     0     0
      1335565945269893853     UNAVAIL      0     0     0  was /dev/sdae1/old
      scsi-35000c50092bebb1a  ONLINE       0     0     0  (resilvering)
      scsi-35000c50092bc6c6c  ONLINE       0     0     0  (resilvering)
    scsi-35000c50092bcfbbe    ONLINE       0     0     0
    scsi-35000c50092bcd906    ONLINE       0     0     0
    scsi-35000c50092bd88ba    ONLINE       0     0     0  (resilvering)
    scsi-35000c50092bddd7d    ONLINE       0     0     0  (resilvering)
    scsi-35000c50092be1910    ONLINE       0     0     0
    scsi-35000c50092bdfd56    ONLINE       0     0     0
    scsi-35000c50092bd53a7    ONLINE       0     0     0
errors: 2927 data errors, use '-v' for a list
In the dmesg:
[71413.662463] VERIFY3(0 == remove_reference(hdr, ((void *)0), tag)) failed (0 == 1)
[71413.662470] PANIC at arc.c:3076:arc_buf_destroy()
[71413.662472] Showing stack for process 19023
[71413.662476] CPU: 3 PID: 19023 Comm: z_rd_int_7 Tainted: P O 4.14.3 #1
[71413.662477] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0 12/17/2015
[71413.662478] Call Trace:
[71413.662491]  dump_stack+0x63/0x89
[71413.662505]  spl_dumpstack+0x44/0x50 [spl]
[71413.662512]  spl_panic+0xc9/0x110 [spl]
[71413.662591]  ? vdev_disk_io_start+0x4d9/0x6c0 [zfs]
[71413.662596]  ? __slab_free+0x9b/0x2c0
[71413.662599]  ? __slab_free+0x9b/0x2c0
[71413.662637]  ? arc_available_memory+0x2e/0xd0 [zfs]
[71413.662677]  arc_buf_destroy+0x11c/0x130 [zfs]
[71413.662717]  dbuf_read_done+0x91/0xf0 [zfs]
[71413.662753]  arc_read_done+0x17e/0x2d0 [zfs]
[71413.662815]  zio_done+0x311/0xcb0 [zfs]
[71413.662819]  ? kfree+0x133/0x180
[71413.662825]  ? spl_kmem_free+0x35/0x40 [spl]
[71413.662884]  zio_execute+0x8f/0xf0 [zfs]
[71413.662892]  taskq_thread+0x2af/0x530 [spl]
[71413.662897]  ? wake_up_q+0x80/0x80
[71413.662900]  kthread+0x109/0x140
[71413.662906]  ? taskq_thread_spawn+0x50/0x50 [spl]
[71413.662908]  ? kthread_park+0x60/0x60
[71413.662912]  ret_from_fork+0x25/0x30
Then followed by many more of the same messages.
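If it helps with reporting, the full series of repeated panics can be pulled out of the kernel log with something like the following. This is just a sketch assuming `dmesg` and a systemd journal are available; the grep pattern simply matches the PANIC line shown above.

```sh
# Grab the assertion failure plus the stack that follows each occurrence.
sudo dmesg -T | grep -A 30 'PANIC at arc.c'

# Or, with a persistent systemd journal, from the previous boot:
sudo journalctl -k -b -1 | grep -A 30 'PANIC at arc.c'
```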
I started out with an Areca SAS RAID card and initially thought the problem was with the card, so I replaced it with an LSI, but I still see the same issue.
I was initially using the latest CentOS 7 kernel with the CentOS 7 zfsonlinux packages, but yesterday I installed kernel 4.14.3 with spl-0.7.3 (patched to change vfs_read and vfs_write to kernel_read and kernel_write) and zfs-0.7.3. I still see the same error.
If anyone on the zfsonlinux project has an idea what is going on here, I would very much appreciate it!
Any movement on this issue?
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
This issue was raised at the September 2021 OpenZFS Leadership meeting as a potentially serious bug that has fallen through the cracks. It may relate to https://www.illumos.org/issues/14003#note-1
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
This bug is still out there, but it's apparently hard to reproduce. We've seen it "in the wild". See also https://www.illumos.org/issues/15790, which also looks like a close match. There is now analysis and a reproducer on https://www.illumos.org/issues/14003; a fix is also posted under that issue.
I can confirm this issue is still relevant; I just got it. I might suggest a cause, though. First, my details:

5.15.0-25-generic #25-Ubuntu SMP Wed Mar 30 15:54:22 UTC 2022
ii  libzfs4linux    2.1.2-1ubuntu3  amd64  OpenZFS filesystem library for Linux - general support
ii  libzpool5linux  2.1.2-1ubuntu3  amd64  OpenZFS pool library for Linux
ii  zfs-dkms        2.1.2-1ubuntu3  all    OpenZFS filesystem kernel modules for Linux
ii  zfs-zed         2.1.2-1ubuntu3  amd64  OpenZFS Event Daemon
ii  zfsutils-linux  2.1.2-1ubuntu3  amd64  command-line tools to manage OpenZFS filesystems
And the error:
[1351503.757506] INFO: task z_rd_int_1:7824 blocked for more than 362 seconds.
[1351503.757510] Tainted: P O 5.15.0-25-generic #25-Ubuntu
[1351503.757512] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1351503.757529] task:z_rd_int_1 state:D stack: 0 pid: 7824 ppid: 2 flags:0x00004000
[1351503.757531] Call Trace:
[1351503.757533]  <TASK>
and the console:
kernel:[1351131.586643] VERIFY3(0 == remove_reference(hdr, NULL, tag)) failed (0 == 1)
kernel:[1351131.586646] PANIC at arc.c:3842:arc_buf_destroy()
This was on a device hanging off USB-C that developed a fault in an mdraid array I am running (I'm in the middle of upgrading this laptop). I had to reseat the cable (pull it out, plug it back in), so it is highly likely that device errors in the interim triggered this issue.
The mdraid, BTW, recovered with no issues (AFAIK), but the fault would appear to be a definite trigger event.
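For what it's worth, verifying the mdraid state afterwards amounts to the usual checks; a minimal sketch, with the md device name being an example rather than my actual device:

```sh
cat /proc/mdstat                          # overview of md arrays and any resync in progress
sudo mdadm --detail /dev/md0              # per-array state, failed/spare members, event count
sudo journalctl -k | grep -i 'md/raid'    # kernel messages from the raid layer around the fault
```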
I also just ran into something very similar. This is with regular spinning rust drives connected via SATA (using a SAS expander) and there were no other IO-related errors in the kernel log. Since it was mentioned in some other issue, I use ZFS encryption.
From SMART monitoring it seems that `ata-ST10000VN0008-2PJ103_ZS515AYN` is showing an increase in raw read errors, but the panic, along with the fact that attempts to run `zpool scrub` result in a hung `zpool` process (and running `zpool status` also hangs), seems to indicate there might be more going on. Also, the device itself passes a SMART health check (though I'm sure those aren't the most reliable things in the world...).
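In case it's useful to anyone hitting the same thing, the SMART checks amount to something like this. A sketch assuming smartmontools is installed; the by-id path matches the drive named above, and the attribute names vary slightly by vendor:

```sh
dev=/dev/disk/by-id/ata-ST10000VN0008-2PJ103_ZS515AYN

sudo smartctl -H "$dev"        # overall health self-assessment
sudo smartctl -A "$dev" | \
    grep -E 'Raw_Read_Error_Rate|Reallocated_Sector_Ct|Current_Pending_Sector'
sudo smartctl -t long "$dev"   # queue an extended self-test; check results later with smartctl -a
```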
[2530221.641600] VERIFY3(remove_reference(hdr, hdr) > 0) failed (0 > 0)
[2530221.641627] PANIC at arc.c:6610:arc_write_done()
[2530221.641645] Showing stack for process 21525
[2530221.641658] CPU: 8 PID: 21525 Comm: z_wr_int_0 Tainted: P O N 5.14.21-150500.55.49-default #1 SLE15-SP5 a8d284bf25556592b1cf60fe7a08b3519f64459a
[2530221.641690] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450 Pro4, BIOS P1.50 11/05/2018
[2530221.641710] Call Trace:
[2530221.641722] <TASK>
[2530221.641735] dump_stack_lvl+0x45/0x5b
[2530221.641758] spl_panic+0xc8/0x100 [spl af95f7395f3eb9a449f705221058fe2fb998d52b]
[2530221.641798] ? kmem_cache_free+0x3c0/0x410
[2530221.641816] ? srso_return_thunk+0x5/0x10
[2530221.641832] ? kmem_cache_free+0x3c0/0x410
[2530221.641851] ? spl_kmem_cache_free+0x110/0x1e0 [spl af95f7395f3eb9a449f705221058fe2fb998d52b]
[2530221.641883] ? srso_return_thunk+0x5/0x10
[2530221.641898] ? put_cpu_partial+0xb2/0xc0
[2530221.641914] ? srso_return_thunk+0x5/0x10
[2530221.641928] ? kmem_cache_free+0x3c0/0x410
[2530221.641946] ? srso_return_thunk+0x5/0x10
[2530221.641961] ? spl_kmem_cache_free+0x110/0x1e0 [spl af95f7395f3eb9a449f705221058fe2fb998d52b]
[2530221.641996] arc_write_done+0x42b/0x4d0 [zfs 251ddbcbed81ad8aaff15de0483e32050921dab8]
[2530221.642223] zio_done+0x3f1/0xfb0 [zfs 251ddbcbed81ad8aaff15de0483e32050921dab8]
[2530221.642485] zio_execute+0x86/0xf0 [zfs 251ddbcbed81ad8aaff15de0483e32050921dab8]
[2530221.642736] taskq_thread+0x266/0x450 [spl af95f7395f3eb9a449f705221058fe2fb998d52b]
[2530221.642771] ? wake_up_q+0x90/0x90
[2530221.642793] ? zio_reexecute+0x3e0/0x3e0 [zfs 251ddbcbed81ad8aaff15de0483e32050921dab8]
[2530221.643125] ? task_done+0xa0/0xa0 [spl af95f7395f3eb9a449f705221058fe2fb998d52b]
[2530221.643168] kthread+0x156/0x180
[2530221.643187] ? set_kthread_struct+0x50/0x50
[2530221.643208] ret_from_fork+0x22/0x30
[2530221.643238] </TASK>
% uname -a
Linux navi 5.14.21-150500.55.49-default #1 SMP PREEMPT_DYNAMIC Sun Feb 11 17:48:15 UTC 2024 (36baf2f) x86_64 x86_64 x86_64 GNU/Linux
% sudo zfs version
zfs-2.2.3-1
zfs-kmod-2.2.3-1
% sudo zpool status
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 1 days 21:21:26 with 0 errors on Tue Mar 26 19:21:41 2024
remove: Removal of vdev 1 copied 3.59T in 9h27m, completed on Sat Feb 12 23:50:24 2022
        7.70M memory used for removed device mappings
config:
NAME                                   STATE     READ WRITE CKSUM
tank                                   ONLINE       0     0     0
  mirror-0                             ONLINE       0     0     0
    ata-TOSHIBA_HDWG11A_70P0A0Q8FATG   ONLINE       0     0     0
    ata-ST10000VN0008-2PJ103_ZS515AYN  ONLINE       0     0     1
  mirror-2                             ONLINE       0     0     0
    ata-TOSHIBA_HDWG11A_70P0A0Y4FATG   ONLINE       0     0     0
    ata-ST10000VN0008-2PJ103_ZS515P7F  ONLINE       0     0     0
errors: 2 data errors, use '-v' for a list
EDIT: Weirdly, after a reboot the warning was still there in `zpool status`, but there was no CKSUM error listed for the problem drive; after running `zpool status -v`, no "problem files" were listed and the warning in `status` disappeared... I'm running a scrub now...
Skip down a couple sections if you don't care about any of this backstory; it might be relevant to the problem, or it might not, I don't know.
Background information
I have a ZFS filesystem containing disk image files which I update every week and then take a snapshot, so that I have a weekly backup history stored up.
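The weekly routine is roughly the following; this is a minimal sketch, and the dataset name, source path, and image file layout here are illustrative only, not my actual setup:

```sh
# Dataset name, image path, and snapshot naming are illustrative only.
dataset=pool/backups/images
snap="$(date +%Y%m%d)"                            # e.g. 20170327, matching the snapshot names below

rsync -a --inplace /source/images/ "/$dataset/"   # update the disk image files in place
zfs snapshot "$dataset@$snap"                     # freeze this week's state
zfs list -t snapshot -r "$dataset"                # the accumulated weekly history
```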
I had some disk corruption on this pool's 4-disk raidz1 vdev during a period in which I was running it on only 3 disks, so `zpool status -v` knows of a couple of snapshots that contain corrupted data. I'm fairly certain that there are also a few more snapshots affected than just what `zpool status -v` reports, due to shared blocks between the snapshots.

Basically, this pool is barely hanging on, with at least one very marginal drive. I've spent over 3 months and numerous hours now attempting to get all of my data off of this doomed pool via `zfs send`/`zfs recv`, something that is still not fully possible due to `EINVAL`-related problems with `zfs recv` that I've described in other issue reports here.

I also cannot really "simply swap out the bad disks", because whenever I attempt a `zfs replace`, it gets stuck at a certain point, never completes, and never detaches the old device. (Oh, and this pool now also spends its time perpetually resilvering over and over, apparently due to the stuck-replacement problem. Or some other bug. I don't know.)

Here's what this wonderful pool looks like:
What I was doing when the panics happened
I've resigned myself to the inevitability that I'll never be able to `zfs send`/`zfs recv` my data off of this pool and onto a new one before my drives have fully disintegrated into scrap. Since this collection of historical backups relies extremely heavily on CoW space savings between snapshots (it's about 2.77 TiB compressed, but would be well over 7x larger otherwise), I decided to write up some scripts and manually port the data over to the new pool in a manner that maintains the CoW space savings by only writing the changed parts.

The details of that aren't relevant; what is relevant here is that, as a precondition to doing this manual transfer process, I ran a bunch of `md5sum` processes with GNU `parallel` to get hashes of each disk image at each snapshot (sketched below). Most of this went fine.
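The hashing step was essentially this shape; a minimal sketch, where the mountpoint and image file name are placeholders rather than my actual layout, relying on the fact that each snapshot's contents are visible read-only under `<mountpoint>/.zfs/snapshot/<snapname>/`:

```sh
# Mountpoint and image file name are placeholders.
mnt=/tank/images
img=disk.img

ls "$mnt/.zfs/snapshot" |
    parallel -j4 "md5sum $mnt/.zfs/snapshot/{}/$img > /tmp/md5.{}.txt"
```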
For reference, `zpool status -v` knows of data corruption in snapshots `20170327` and `20170529`.

The `md5sum` process that attempted to read the disk image in the `20170327` snapshot error'd out with `EIO`, as expected.

However, the `md5sum` processes that attempted to read the disk image in the `20170515`, `20170522`, `20170529`, and `20170605` snapshots got stuck in state `D` (uninterruptible sleep) forever, presumably due to the task thread panics that are the subject of this report.

Interestingly, the system is still entirely usable (i.e. other I/O initiated after the problem doesn't get stuck); but those `md5sum` processes are forever stuck in state `D`, and the couple of `z_rd_int_7` and `z_rd_int_0` kernel threads that had the panics are now idle, doing nothing.

The task thread panics themselves
Two panics, separated by approximately 2 seconds.
The stuck `md5sum` processes

All of the stuck `md5sum` processes have the following stack:

System information