
zfs 2.2.4 and 2000% write performance degradation #16418

Open batot1 opened 2 months ago

batot1 commented 2 months ago
root@pve2:~# lspci -vv |grep -A18 -i sas
03:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
        Subsystem: Dell 6Gbps SAS HBA Adapter
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 33
        IOMMU group: 13
        Region 0: I/O ports at e000 [size=256]
        Region 1: Memory at fc340000 (64-bit, non-prefetchable) [size=64K]
pcilib: sysfs_read_vpd: read failed: No such device
        Region 3: Memory at fc300000 (64-bit, non-prefetchable) [size=256K]
        Expansion ROM at fc200000 [disabled] [size=1M]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
--
        Kernel driver in use: mpt3sas
        Kernel modules: mpt3sas

root@pve2:~# free -h
               total        used        free      shared  buff/cache   available
Mem:           125Gi       4.2Gi       122Gi        47Mi       364Mi       121Gi
Swap:          8.0Gi          0B       8.0Gi

root@pve2:~# uname -a
Linux pve2 6.8.8-4-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.8-4 (2024-07-26T11:15Z) x86_64 GNU/Linux

root@pve2:~# modinfo mpt3sas
filename:       /lib/modules/6.8.8-4-pve/kernel/drivers/scsi/mpt3sas/mpt3sas.ko
alias:          mpt2sas
version:        43.100.00.00
license:        GPL
description:    LSI MPT Fusion SAS 3.0 Device Driver
author:         Avago Technologies <MPT-FusionLinux.pdl@avagotech.com>
srcversion:     D2BBF2326C8C5E81BD6F430
alias:          pci:v00001000d000000E7sv*sd*bc*sc*i*
alias:          pci:v00001000d000000E4sv*sd*bc*sc*i*
alias:          pci:v0000117Cd000000E6sv*sd*bc*sc*i*
alias:          pci:v00001000d000000E6sv*sd*bc*sc*i*
alias:          pci:v00001000d000000E5sv*sd*bc*sc*i*
alias:          pci:v00001000d000000B2sv*sd*bc*sc*i*
alias:          pci:v00001000d000000E3sv*sd*bc*sc*i*
alias:          pci:v00001000d000000E0sv*sd*bc*sc*i*
alias:          pci:v00001000d000000E2sv*sd*bc*sc*i*
alias:          pci:v00001000d000000E1sv*sd*bc*sc*i*
alias:          pci:v00001000d000000D1sv*sd*bc*sc*i*
alias:          pci:v00001000d000000ACsv*sd*bc*sc*i*
alias:          pci:v00001000d000000ABsv*sd*bc*sc*i*
alias:          pci:v00001000d000000AAsv*sd*bc*sc*i*
alias:          pci:v00001000d000000AFsv*sd*bc*sc*i*
alias:          pci:v00001000d000000AEsv*sd*bc*sc*i*
alias:          pci:v00001000d000000ADsv*sd*bc*sc*i*
alias:          pci:v00001000d000000C3sv*sd*bc*sc*i*
alias:          pci:v00001000d000000C2sv*sd*bc*sc*i*
alias:          pci:v00001000d000000C1sv*sd*bc*sc*i*
alias:          pci:v00001000d000000C0sv*sd*bc*sc*i*
alias:          pci:v00001000d000000C8sv*sd*bc*sc*i*
alias:          pci:v00001000d000000C7sv*sd*bc*sc*i*
alias:          pci:v00001000d000000C6sv*sd*bc*sc*i*
alias:          pci:v00001000d000000C5sv*sd*bc*sc*i*
alias:          pci:v00001000d000000C4sv*sd*bc*sc*i*
alias:          pci:v00001000d000000C9sv*sd*bc*sc*i*
alias:          pci:v00001000d00000095sv*sd*bc*sc*i*
alias:          pci:v00001000d00000094sv*sd*bc*sc*i*
alias:          pci:v00001000d00000091sv*sd*bc*sc*i*
alias:          pci:v00001000d00000090sv*sd*bc*sc*i*
alias:          pci:v00001000d00000097sv*sd*bc*sc*i*
alias:          pci:v00001000d00000096sv*sd*bc*sc*i*
alias:          pci:v00001000d0000007Esv*sd*bc*sc*i*
alias:          pci:v00001000d000002B1sv*sd*bc*sc*i*
alias:          pci:v00001000d000002B0sv*sd*bc*sc*i*
alias:          pci:v00001000d0000006Esv*sd*bc*sc*i*
alias:          pci:v00001000d00000087sv*sd*bc*sc*i*
alias:          pci:v00001000d00000086sv*sd*bc*sc*i*
alias:          pci:v00001000d00000085sv*sd*bc*sc*i*
alias:          pci:v00001000d00000084sv*sd*bc*sc*i*
alias:          pci:v00001000d00000083sv*sd*bc*sc*i*
alias:          pci:v00001000d00000082sv*sd*bc*sc*i*
alias:          pci:v00001000d00000081sv*sd*bc*sc*i*
alias:          pci:v00001000d00000080sv*sd*bc*sc*i*
alias:          pci:v00001000d00000065sv*sd*bc*sc*i*
alias:          pci:v00001000d00000064sv*sd*bc*sc*i*
alias:          pci:v00001000d00000077sv*sd*bc*sc*i*
alias:          pci:v00001000d00000076sv*sd*bc*sc*i*
alias:          pci:v00001000d00000074sv*sd*bc*sc*i*
alias:          pci:v00001000d00000072sv*sd*bc*sc*i*
alias:          pci:v00001000d00000070sv*sd*bc*sc*i*
depends:        scsi_transport_sas,raid_class
retpoline:      Y
intree:         Y
name:           mpt3sas
vermagic:       6.8.8-4-pve SMP preempt mod_unload modversions
sig_id:         PKCS#7
signer:         Build time autogenerated kernel key
sig_key:        66:5F:A5:D1:98:66:55:4B:39:C4:B7:90:CD:06:8C:A3:16:64:BF:3E
sig_hashalgo:   sha512
signature:      3F:0D:4B:44:08:78:68:85:FC:80:2B:24:3F:BE:1C:4B:D2:41:AA:E8:
                BA:E2:19:82:7F:BE:4A:2B:1C:D7:ED:00:64:95:FE:C6:C6:4F:D8:B8:
                DC:59:C2:D7:C9:D4:5A:8F:4D:9B:C5:90:00:22:48:70:06:A4:7D:40:
                93:01:EF:EC:C4:2F:BD:A6:15:6A:1C:BE:14:38:73:70:A8:BA:D2:F8:
                7D:E1:8F:EC:1F:B0:26:D0:52:37:69:E1:88:06:14:03:55:92:BE:8C:
                16:31:21:3F:89:24:2E:1D:3D:B8:F4:CD:DA:B8:31:53:71:75:CE:D9:
                9D:66:0F:A2:4D:63:D8:D2:C7:11:64:97:87:33:BE:20:00:05:74:9C:
                D3:74:C5:C2:85:C8:0F:6C:10:7A:ED:BF:EF:7E:75:1E:38:14:29:9D:
                E6:5D:4B:ED:58:E7:96:AE:8C:53:88:4D:B7:78:AB:E7:A8:85:B5:F6:
                E7:8A:D8:97:78:95:94:7A:95:A6:6B:7B:EB:58:D7:09:75:FD:22:25:
                75:B0:21:BB:87:D8:9C:CA:F6:A1:32:FF:31:43:25:39:6F:2A:E4:5C:
                31:D0:F5:FC:2B:12:71:E7:B2:A6:93:9C:AA:ED:BC:9A:82:46:E1:3C:
                07:54:B4:B8:12:E7:42:C9:AE:42:EF:65:71:C9:2E:ED:F4:04:67:C4:
                3E:4F:55:98:73:7E:64:C0:AE:28:7D:76:F1:85:53:77:97:23:AE:17:
                8F:DE:80:27:A1:2A:7C:48:6F:A7:91:11:7C:C6:8F:85:91:B8:47:12:
                67:4B:89:AB:EF:FC:35:C5:A3:5C:AD:C2:C5:B2:F9:C3:2B:61:99:55:
                39:7A:33:F1:7B:6C:BE:CB:67:E7:2C:37:64:72:38:B6:9A:48:FC:4B:
                B1:0C:52:0F:C9:9B:CA:B4:14:F4:A1:9E:7F:7B:19:B4:89:84:2A:2B:
                80:21:FF:AB:9B:49:3A:D3:E0:77:8E:91:EB:1A:A4:F2:C2:2E:9F:C9:
                5B:12:28:AC:54:51:61:38:B9:63:1D:FD:65:75:69:BD:56:C6:A7:E0:
                56:49:62:3D:31:6F:23:C1:6D:C0:A4:30:CF:E5:9F:7F:9E:E8:CC:AF:
                65:12:5F:74:45:47:40:DE:9F:BD:66:E5:B6:7E:37:CA:AE:DD:8A:E9:
                AF:70:16:96:66:AD:95:A8:91:87:60:D5:07:01:5E:5B:2B:AF:40:82:
                D7:BB:B0:7C:55:BD:0B:64:8B:D8:92:F5:F7:3C:2D:92:4D:2D:D2:D5:
                3C:EA:82:6D:68:A9:C6:C0:32:6D:BB:F3:1E:F7:6E:86:59:F5:83:33:
                26:C9:38:27:21:D6:84:D5:04:84:1C:2C
parm:           logging_level: bits for enabling additional logging info (default=0)
parm:           max_sectors:max sectors, range 64 to 32767  default=32767 (ushort)
parm:           missing_delay: device missing delay , io missing delay (array of int)
parm:           max_lun: max lun, default=16895  (ullong)
parm:           hbas_to_enumerate: 0 - enumerates both SAS 2.0 & SAS 3.0 generation HBAs
                  1 - enumerates only SAS 2.0 generation HBAs
                  2 - enumerates only SAS 3.0 generation HBAs (default=0) (ushort)
parm:           diag_buffer_enable: post diag buffers (TRACE=1/SNAPSHOT=2/EXTENDED=4/default=0) (int)
parm:           disable_discovery: disable discovery  (int)
parm:           prot_mask: host protection capabilities mask, def=7  (int)
parm:           enable_sdev_max_qd:Enable sdev max qd as can_queue, def=disabled(0) (bool)
parm:           multipath_on_hba:Multipath support to add same target device
                as many times as it is visible to HBA from various paths
                (by default:
                         SAS 2.0 & SAS 3.0 HBA - This will be disabled,
                         SAS 3.5 HBA - This will be enabled) (int)
parm:           host_tagset_enable:Shared host tagset enable/disable Default: enable(1) (int)
parm:           max_queue_depth: max controller queue depth  (int)
parm:           max_sgl_entries: max sg entries  (int)
parm:           msix_disable: disable msix routed interrupts (default=0) (int)
parm:           smp_affinity_enable:SMP affinity feature enable/disable Default: enable(1) (int)
parm:           max_msix_vectors: max msix vectors (int)
parm:           irqpoll_weight:irq poll weight (default= one fourth of HBA queue depth) (int)
parm:           mpt3sas_fwfault_debug: enable detection of firmware fault and halt firmware - (default=0)
parm:           perf_mode:Performance mode (only for Aero/Sea Generation), options:
                0 - balanced: high iops mode is enabled &
                interrupt coalescing is enabled only on high iops queues,
                1 - iops: high iops mode is disabled &
                interrupt coalescing is enabled on all queues,
                2 - latency: high iops mode is disabled &
                interrupt coalescing is enabled on all queues with timeout value 0xA,
                default - default perf_mode is 'balanced' (int)
parm:           poll_queues:Number of queues to be use for io_uring poll mode.
                This parameter is effective only if host_tagset_enable=1. &
                when poll_queues are enabled then &
                perf_mode is set to latency mode. &
                 (int)
root@pve2:~# lscpu |grep Model\ name
Model name:                           AMD Ryzen 7 5700X 8-Core Processor

root@pve2:~# zfs --version
zfs-2.2.4-pve1
zfs-kmod-2.2.4-pve1

root@pve2:~# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

PRC | sys    0.24s | user   0.07s |              |              | #proc    532 | #trun      1 |              | #tslpi   390 | #tslpu   165 |              | #zombie    0 | clones     0 |              |              | #exit      0 |
CPU | sys       5% | user      1% | irq       0% |              |              | idle    534% | wait   1061% | steal     0% |              | guest     0% |              | ipc     0.38 | cycl   22MHz |              | curf 3.44GHz |
cpu | sys       1% | user      0% | irq       0% |              |              | idle     30% | cpu015 w 69% | steal     0% |              | guest     0% |              | ipc     0.32 | cycl   76MHz |              | curf 3.85GHz |
cpu | sys       1% | user      1% | irq       0% |              |              | idle     56% | cpu003 w 43% | steal     0% |              | guest     0% |              | ipc     0.35 | cycl   52MHz |              | curf 3.40GHz |
cpu | sys       1% | user      0% | irq       0% |              |              | idle     54% | cpu006 w 45% | steal     0% |              | guest     0% |              | ipc     0.39 | cycl   54MHz |              | curf 3.59GHz |
cpu | sys       1% | user      0% | irq       0% |              |              | idle     67% | cpu008 w 32% | steal     0% |              | guest     0% |              | ipc     0.79 | cycl   40MHz |              | curf 3.40GHz |
cpu | sys       0% | user      0% | irq       0% |              |              | idle      0% | cpu004 w100% | steal     0% |              | guest     0% |              | ipc     0.36 | cycl   23MHz |              | curf 3.40GHz |
cpu | sys       0% | user      0% | irq       0% |              |              | idle     30% | cpu005 w 69% | steal     0% |              | guest     0% |              | ipc     0.24 | cycl   14MHz |              | curf 3.40GHz |
cpu | sys       0% | user      0% | irq       0% |              |              | idle      6% | cpu007 w 94% | steal     0% |              | guest     0% |              | ipc     0.31 | cycl   11MHz |              | curf 3.40GHz |
cpu | sys       0% | user      0% | irq       0% |              |              | idle     90% | cpu012 w  9% | steal     0% |              | guest     0% |              | ipc     0.20 | cycl   16MHz |              | curf 3.40GHz |
cpu | sys       0% | user      0% | irq       0% |              |              | idle      6% | cpu001 w 93% | steal     0% |              | guest     0% |              | ipc     0.27 | cycl   10MHz |              | curf 3.40GHz |
cpu | sys       0% | user      0% | irq       0% |              |              | idle     49% | cpu002 w 51% | steal     0% |              | guest     0% |              | ipc     0.24 | cycl    7MHz |              | curf 3.40GHz |
cpu | sys       0% | user      0% | irq       0% |              |              | idle      2% | cpu009 w 97% | steal     0% |              | guest     0% |              | ipc     0.24 | cycl    8MHz |              | curf 3.40GHz |
cpu | sys       0% | user      0% | irq       0% |              |              | idle     74% | cpu014 w 26% | steal     0% |              | guest     0% |              | ipc     0.47 | cycl   10MHz |              | curf 3.40GHz |
cpu | sys       0% | user      0% | irq       0% |              |              | idle     10% | cpu011 w 90% | steal     0% |              | guest     0% |              | ipc     0.24 | cycl    8MHz |              | curf 3.40GHz |
CPL | numcpu    16 |              |              |              | avg1   10.83 |              | avg5    4.78 | avg15   3.03 |              |              |              | csw    13323 |              | intr    8252 |              |
MEM | tot   125.7G | free  111.2G |              |              | cache 304.1M | dirty   0.1M | buff   32.9M |              |              | slab    1.1G | slrec 132.1M |              | pgtab   8.7M |              |              |
MEM | numnode    1 |              |              | shmem  48.1M | shrss   0.0M | shswp   0.0M |              |              | tcpsk   0.0M |              | udpsk   0.0M |              |              |              | zfarc  11.6G |
SWP | tot     8.0G |              | free    8.0G |              | swcac   0.0M |              |              |              |              |              |              | vmcom   2.1G |              | vmlim  70.8G |              |
PAG | scan       0 | steal      0 |              | stall      0 | compact    0 | numamig    0 |              | migrate    0 | pgin   27648 |              | pgout  35095 | swin       0 | swout      0 |              | oomkill    0 |
PSI | cpusome   0% | memsome   0% |              | memfull   0% | iosome   85% | iofull   85% |              | cs     0/0/0 | ms     0/0/0 |              | mf     0/0/0 | is  84/57/27 | if  83/57/27 |              |              |
LVM |     pve-root | busy      0% | read       0 | write     12 |              | discrd     0 | KiB/r      0 | KiB/w      4 | KiB/d      0 | MBr/s    0.0 | MBw/s    0.0 |              | inflt      0 | avq     1.25 | avio 0.33 ms |
DSK |          sdc | busy     98% | read       0 | write    411 |              | discrd     0 | KiB/r      0 | KiB/w     85 | KiB/d      0 | MBr/s    0.0 | MBw/s    6.8 |              | inflt      1 | avq     1.99 | avio 11.9 ms |
DSK |          sdb | busy     98% | read       0 | write    405 |              | discrd     0 | KiB/r      0 | KiB/w     85 | KiB/d      0 | MBr/s    0.0 | MBw/s    6.8 |              | inflt      1 | avq     1.98 | avio 12.0 ms |
DSK |          sda | busy     97% | read       0 | write    427 |              | discrd     0 | KiB/r      0 | KiB/w     82 | KiB/d      0 | MBr/s    0.0 | MBw/s    6.9 |              | inflt      1 | avq     2.02 | avio 11.4 ms |
DSK |          sdd | busy     95% | read       0 | write    423 |              | discrd     0 | KiB/r      0 | KiB/w     82 | KiB/d      0 | MBr/s    0.0 | MBw/s    6.9 |              | inflt      1 | avq     1.99 | avio 11.2 ms |
DSK |          sdi | busy      4% | read      24 | write      0 |              | discrd     0 | KiB/r   1024 | KiB/w      0 | KiB/d      0 | MBr/s    4.8 | MBw/s    0.0 |              | inflt      0 | avq     0.85 | avio 7.71 ms |
DSK |          sdg | busy      3% | read      12 | write      0 |              | discrd     0 | KiB/r   1024 | KiB/w      0 | KiB/d      0 | MBr/s    2.4 | MBw/s    0.0 |              | inflt      0 | avq     0.91 | avio 10.5 ms |
DSK |          sdk | busy      2% | read      12 | write      0 |              | discrd     0 | KiB/r   1024 | KiB/w      0 | KiB/d      0 | MBr/s    2.4 | MBw/s    0.0 |              | inflt      0 | avq     0.87 | avio 9.58 ms |
DSK |          sdj | busy      2% | read      12 | write      0 |              | discrd     0 | KiB/r   1024 | KiB/w      0 | KiB/d      0 | MBr/s    2.4 | MBw/s    0.0 |              | inflt      0 | avq     0.91 | avio 9.17 ms |
DSK |          sdh | busy      2% | read      24 | write      0 |              | discrd     0 | KiB/r   1024 | KiB/w      0 | KiB/d      0 | MBr/s    4.8 | MBw/s    0.0 |              | inflt      0 | avq     0.75 | avio 4.00 ms |
DSK |          sdf | busy      2% | read      24 | write      0 |              | discrd     0 | KiB/r   1024 | KiB/w      0 | KiB/d      0 | MBr/s    4.8 | MBw/s    0.0 |              | inflt      0 | avq     0.71 | avio 3.17 ms |
DSK |          sde | busy      0% | read       0 | write     10 |              | discrd     0 | KiB/r      0 | KiB/w      4 | KiB/d      0 | MBr/s    0.0 | MBw/s    0.0 |              | inflt      0 | avq     1.50 | avio 0.40 ms |
NET | transport    | tcpi     145 | tcpo     285 |              | udpi       0 | udpo       0 | tcpao      0 | tcppo      0 |              | tcprs      0 | tcpie      0 | tcpor      0 | udpnp      0 |              | udpie      0 |
NET | network      | ipi      145 |              | ipo      284 | ipfrw      0 |              | deliv    145 |              |              |              |              |              | icmpi      0 | icmpo      0 |              |
NET | eno1      0% |              | pcki     151 | pcko     285 |              | sp 1000 Mbps | si   15 Kbps | so   68 Kbps |              | coll       0 | mlti       0 | erri       0 | erro       0 | drpi       0 | drpo       0 |
NET | vmbr0   ---- |              | pcki     151 | pcko     285 |              | sp    0 Mbps | si   12 Kbps | so   68 Kbps |              | coll       0 | mlti       0 | erri       0 | erro       0 | drpi       6 | drpo       0 |

root@pve2:~# zpool iostat raid5sas -v
                              capacity     operations     bandwidth
pool                        alloc   free   read  write   read  write
--------------------------  -----  -----  -----  -----  -----  -----
raid5sas                    3.28T  11.3T    716     68   434M  4.10M
  raidz1-0                  3.28T  11.3T    716     68   434M  4.10M
    wwn-0x5000c500587b5f5f      -      -    125     16   109M  1.02M
    wwn-0x5000c5008591cf77      -      -    165     17   109M  1.03M
    wwn-0x5000c5008591a4cb      -      -    167     16   109M  1.03M
    wwn-0x5000c50085880b2f      -      -    258     17   109M  1.03M
--------------------------  -----  -----  -----  -----  -----  -----

root@pve2:~# zpool status raid5sas
  pool: raid5sas
state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 02:07:59 with 0 errors on Sun Aug  4 21:35:04 2024
config:

        NAME                        STATE     READ WRITE CKSUM
        raid5sas                    ONLINE       0     0     0
          raidz1-0                  ONLINE       0     0     0
            wwn-0x5000c500587b5f5f  ONLINE       0     0    15
            wwn-0x5000c5008591cf77  ONLINE       0     0     1
            wwn-0x5000c5008591a4cb  ONLINE       0     0     0
            wwn-0x5000c50085880b2f  ONLINE       0     0     0

errors: No known data errors

(The status and action messages were clear before this; the checksum errors above appeared because I was testing pool performance with disks switched off.)

Only writes are degraded; reads run at about 600 MB/s. All disks were checked with smartctl -t long. This SAS2008 controller, with these same disks, worked properly a week ago in an older machine running RHEL8; after the update the drop is dramatic. I have read all the examples on the internet about RAIDZ write degradation down to ~20 MB/s, and most of those cases were caused by damaged disks. So, even though smartctl claimed the disks were OK, I also checked whether a single disk was the culprit by removing a disk from the array and running the array without it. Unfortunately that changes nothing; at most the transfer slows by a further ~8 MB/s.
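For reference, that test sequence can be reproduced with standard tools; a sketch only, using device and pool names from this report:

smartctl -t long /dev/sda                      # start a long SMART self-test
smartctl -a /dev/sda                           # read the result once it completes
zpool offline raid5sas wwn-0x5000c500587b5f5f  # run the pool degraded without one disk
zpool online raid5sas wwn-0x5000c500587b5f5f   # reattach; ZFS resilvers the disk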

After 48 hours of painstaking research I found that the reason for the speed degradation is a drastic drop in IOPS on all disks equally, which can be seen in the printout above: each disk is ~99% busy at a transfer of only ~8 MB/s. I did not know whether to report this to the kernel people or to the ZFS group, because I do not know exactly what causes the degradation; I suspect the software or the kernel. For now I am writing to you. I have also written to the Proxmox group so that they know about the problem.
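The per-disk collapse can also be watched live while a write load runs (standard tools; the member-disk names are taken from the atop output above):

zpool iostat -v raid5sas 1       # per-vdev operations and bandwidth, 1 s interval
iostat -x sda sdb sdc sdd 1      # %util, w/s and write latency per member disk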

I have not yet checked how it behaves on "pure Debian"; theoretically that is possible. Theoretically I could also put this SAS2008 back into the old machine and check its performance there, but does that make any sense? I would not like to write this SAS2008 (TI) off as a loss, because it satisfies my needs as a SAS HBA controller completely.

New fact: the problem does not exist if I create a new RAIDZ pool on the same controller with other, SATA, disks; there the write transfer is near full speed, 110 MB/s per disk and 220 MB/s per pool, at 100% pool IOPS usage. The pool raid5sas was using write-through, the pool test_on_sata write-back. When I change all disks in pool raid5sas to write-back, I get a pool write speed of about 100-150 MB/s. That is still much too low against the maximum write of ~450 MB/s. Can anybody help me resolve this problem?
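The write-through/write-back switching mentioned above refers to the disks' own cache mode. On drives behind this kind of HBA it can be inspected and flipped per disk with sdparm (a sketch, assuming sdparm is installed; the setting may not survive a power cycle):

sdparm --get=WCE /dev/sda     # WCE: 1 = write-back, 0 = write-through
sdparm --set=WCE /dev/sda     # enable the on-disk write cache (write-back)
sdparm --clear=WCE /dev/sda   # disable it again (write-through)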

Type | Version/Name
Distribution Name | Proxmox
Distribution Version | 6.2.4
Kernel Version | 6.8.4-6.8.8 (both tested)
Architecture | AMD (Ryzen 7)
ZFS Version | 2.2.4

rincebrain commented 2 months ago

grep . /sys/module/{icp,zcommon,zfs}/parameters/{zfs_fletcher_4_impl,icp_aes_impl,icp_gcm_impl,zfs_vdev_raidz_impl} 2>/dev/null

I would speculate, offhand, that it's going to return something missing most of the options.
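For context, each of those parameters lists the implementations ZFS detected, with the active one in brackets; if the SIMD variants are missing, checksums and RAIDZ parity fall back to slow scalar code. The selection can be read and pinned at runtime (a sketch; avx2 is only an example and must appear in the list on the machine in question):

cat /sys/module/zfs/parameters/zfs_vdev_raidz_impl          # bracketed entry is active
echo avx2 > /sys/module/zfs/parameters/zfs_vdev_raidz_impl  # pin one implementation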

amotin commented 2 months ago

When I change all disks in pool raid5sas to write-back, I get a pool write speed of about 100-150 MB/s. That is still much too low against the maximum write of ~450 MB/s.

What are those write-through vs. write-back settings? ZFS does not need write caching disabled on disks or controllers, nor a guaranteed write to media for every request. When it needs data to be stable, it explicitly requests a cache flush. Make sure you have not enabled sync=always without a good reason.
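To check, the sync property can be listed recursively and reset if needed (standard zfs commands, using the pool name from this report):

zfs get -r sync raid5sas        # show sync for the pool and every dataset
zfs set sync=standard raid5sas  # revert to the default if it was set to always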

batot1 commented 2 months ago

@rincebrain

root@pve2:~# grep . /sys/module/{icp,zcommon,zfs}/parameters/{zfs_fletcher_4_impl,icp_aes_impl,icp_gcm_impl,zfs_vdev_raidz_impl} 2>/dev/null
/sys/module/zfs/parameters/zfs_fletcher_4_impl:[fastest] scalar superscalar superscalar4 sse2 ssse3 avx2
/sys/module/zfs/parameters/icp_aes_impl:cycle [fastest] generic x86_64 aesni
/sys/module/zfs/parameters/icp_gcm_impl:cycle [fastest] avx generic pclmulqdq
/sys/module/zfs/parameters/zfs_vdev_raidz_impl:cycle [fastest] original scalar sse2 ssse3 avx2

@amotin

raid5sas             sync  standard  local
raid5sas/test        sync  standard  inherited from raid5sas
raid5sas/test-zstd6  sync  standard  inherited from raid5sas
raid5sas/video       sync  standard  inherited from raid5sas
test                 sync  standard  default
test/gry             sync  standard  default

All pools/datasets - standard.

Pool test is the new RAIDZ pool on the same LSI SAS2008 (3x1TB).
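For comparing the two pools, a like-for-like write test helps quantify the gap; a minimal sketch, assuming the pools are mounted at /raid5sas/test and /test, and using random data so compression cannot inflate the numbers:

dd if=/dev/urandom of=/raid5sas/test/bench.bin bs=1M count=4096 conv=fdatasync   # slow SAS pool
dd if=/dev/urandom of=/test/bench.bin bs=1M count=4096 conv=fdatasync            # fast SATA pool

conv=fdatasync makes dd flush before reporting, so the quoted rate includes the final sync.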

rincebrain commented 2 months ago

Reading your original post, I have no idea what you're trying to describe, other than "slow".

What performance were you seeing before? What performance are you seeing now? What are the exact models of the disks you're seeing this on, with what zpool create command and what test methods? What are the models of the disks you're seeing this run fine on, and how are they attached?

If it works fine with some disks in the same machine on the same controller and not others, I would investigate what makes those disks different. How are they connected to the controller?
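For anyone collecting the details asked for here, the disk models and transport can be pulled with standard tools (sas2ircu is LSI's own utility for this controller family and may need to be installed separately):

lsblk -o NAME,MODEL,SERIAL,TRAN,SIZE   # model and transport (sas/sata) per disk
smartctl -i /dev/sda                   # full identify info for one disk
sas2ircu 0 display                     # controller, phy and attached-device view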