Open trossoma opened 11 months ago
If these are issues on the ATA/AHCI side of things, is it fair to speak of a ZFS issue? It's more like ZFS changes exposed problems in either the AHCI driver or, more likely, dodgy niche SATA hardware. Many random AHCI controllers (even from notable and reputable manufacturers, such as Marvell) are known to be flaky, and SATA port multipliers are so terribly dodgy that Intel does not support them and disk manufacturers jump through all sorts of crazy hoops to not use them when they want to cram two disks in one chassis.
It doesn't seem like ZFS is the culprit, but since we've been unable to rule it out I figured I'd post an issue in hopes of gaining some more insight.
What would be helpful would be testing if using a newer OpenZFS version on older CentOS had the issue, to try and see if it's a change in OpenZFS or Linux.
That said, I would personally never recommend using SATA PMPs for precisely the reasons you're observing about how the failure domains aren't really isolated.
@rincebrain Took your suggestion and upgraded ZFS on CentOS from 0.8.6-1 -> 2.1.5-1. Some ZFS I/O errors occurred due to ATA command timeouts, as we've seen on AlmaLinux. These types of ATA command timeouts have not been seen on long-standing systems running CentOS with ZFS 0.8.6-1. Any idea what types of changes in ZFS could increase the likelihood of ATA command timeouts?
Guessing wildly that you're having more pending IOs at once and the PMPs are handling it as well as PMPs do, or the disks are, or both.
Could be a cascading failure on top of the problem that #15588 is hoping to resolve, though that was technically a problem in 0.8.x I believe, so it would have required other changes incidentally causing it more often for you.
Could be something like a cascading failure on #10094 or similar.
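For anyone wanting to test the "more pending IOs at once" guess, here is a minimal sketch of inspecting and capping the OpenZFS per-vdev queue depth via module parameters; the value written below is an arbitrary illustration, not a tuning recommendation:

```sh
# Inspect the current OpenZFS per-vdev queue limits.
grep . /sys/module/zfs/parameters/zfs_vdev_max_active \
       /sys/module/zfs/parameters/zfs_vdev_async_write_max_active \
       /sys/module/zfs/parameters/zfs_vdev_async_read_max_active

# Illustration only: lower the cap on total in-flight I/Os per vdev (as root)
# to see whether fewer outstanding commands keeps the PMP from dropping the
# link. The value 100 is arbitrary, not a recommendation.
echo 100 > /sys/module/zfs/parameters/zfs_vdev_max_active
```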
@rincebrain We've done some more testing and have the following findings:

- We use `fio` to generate the I/O workload. Details: random read/write (r30%/w70%) workload with a 16K chunk size, against a ZFS filesystem with caching disabled.
- During these tests the only change is the ZFS version.
- SATA comm errors occur within a few hours (usually 1-2) under this I/O workload on ZFS versions >= 2.0.0.

Any idea what changes in ZFS 2.0.0 would cause the different pattern of I/O? Are some tunables involved?
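An illustrative `fio` invocation approximating the workload described above (30% random reads / 70% random writes, 16K blocks, on a ZFS dataset); the dataset name, size, runtime, queue depth, and job count are placeholders, not the exact values used in the tests:

```sh
# One way to disable caching on the dataset (the exact method used in the
# original tests is not specified).
zfs set primarycache=none tank/fiotest
zfs set secondarycache=none tank/fiotest

# Random mixed workload: 30% reads / 70% writes, 16K block size.
fio --name=zfs-randrw \
    --directory=/tank/fiotest \
    --rw=randrw --rwmixread=30 --bs=16k \
    --ioengine=libaio --iodepth=16 \
    --size=4G --runtime=7200 --time_based --numjobs=4
```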
System information
Describe the problem you're observing
Intermittent communication dropouts are occurring with SSD drives that are attached to Marvell 88SE9235 SATA controllers via Marvell 88SM9705 port multipliers. When the issue occurs, communication with all SSD drives (5) connected to the port multiplier is lost and the driver performs recovery steps in order to re-establish connection with the SSD drives. This results in ZFS I/O errors being reported by `zpool status`. Multiple events with unsuccessful recovery steps by the driver can lead to pool suspension. The issue occurs with both small and large I/O workloads, though it usually takes longer to manifest with the small I/O workload.

The issue does not occur with an older version of CentOS and ZFS running on the same hardware. Details:

Type | Version/Name
--- | ---
Distribution Name | CentOS
Distribution Version | 7.9
Kernel Version | 3.10.0-1160.15.2
Architecture | x64
OpenZFS Version | 0.8.6-1
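As an illustrative aside (the pool name `tank` is a placeholder), the commands used to observe and reset these error counters after a dropout look like:

```sh
# Show per-vdev read/write/checksum error counters after a dropout.
zpool status -v tank

# After the SATA link has been re-established, reset the error counters.
zpool clear tank
```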
Typically, the ZFS pools in use on AlmaLinux were created in CentOS. Creating the pools in AlmaLinux did not resolve the issue. Have tried the following, in different combinations:
Describe how to reproduce the problem
- Small I/O workload: Boot up the system with apps that generate a small, sustained I/O load on the ZFS pool and let it run without interaction.
- Large I/O workload: Use `fio` to generate a heavy I/O workload on the ZFS pool.

Include any warning/errors/backtraces from the system logs
Syslog Snippet
Zpool Status