openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

High txg_sync latency causing write speed degradation #15824

Closed ImanolBarba closed 3 months ago

ImanolBarba commented 7 months ago

System information

Type                  Version/Name
Distribution Name     CentOS Stream
Distribution Version  9
Kernel Version        5.14.0-402.el9.x86_64
Architecture          x86_64
OpenZFS Version       zfs-kmod-2.1.14-1

Describe the problem you're observing

I have a RAIDZ2 pool that shows severe write performance degradation while a very large file (2.2TB) is being copied into it. Usually, copying fairly large files (GBs in magnitude) yields about 1GB/s to 1.3GB/s, with occasional dips to 600MB/s. A while after the big file starts copying, throughput drops to about 20-30MB/s, as shown by zpool iostat -yv 2:

                                                 capacity     operations     bandwidth
pool                                           alloc   free   read  write   read  write
---------------------------------------------  -----  -----  -----  -----  -----  -----
bckpool                                        4.58T  53.6T      0    497      0  22.8M
  raidz2-0                                     4.58T  53.6T      0    497      0  22.8M
    scsi-SATA_ST8000NM000A-2KE_WSD0S88E            -      -      0     67      0  2.87M
    scsi-SATA_ST8000NM000A-2KE_WSD0WDK0            -      -      0     66      0  2.86M
    scsi-SATA_ST8000NM0055-1RM_ZA1HZZ0R            -      -      0     65      0  2.86M
    scsi-SATA_ST8000NM0055-1RM_ZA1GGQKP            -      -      0     67      0  2.86M
    scsi-SATA_ST8000NM000A-2KE_WSD63TJH            -      -      0     28      0  2.75M
    scsi-SATA_ST8000NM0055-1RM_ZA1ARAJR            -      -      0     66      0  2.85M
    scsi-SATA_ST8000NM000A-2KE_WKD2HR9F            -      -      0     67      0  2.85M
    scsi-SATA_ST8000NM0055-1RM_ZA1CA90N            -      -      0     66      0  2.86M
---------------------------------------------  -----  -----  -----  -----  -----  -----

As compared to normal operation:

                                           capacity     operations     bandwidth 
pool                                     alloc   free   read  write   read  write
---------------------------------------  -----  -----  -----  -----  -----  -----
bckpool                                  4.10T  54.1T    123  16.6K  13.4M   935M
  raidz2-0                               4.10T  54.1T    123  16.6K  13.4M   935M
    scsi-SATA_ST8000NM000A-2KE_WSD0S88E      -      -     12  2.66K  1.69M   117M
    scsi-SATA_ST8000NM000A-2KE_WSD0WDK0      -      -     18  2.68K  1.68M   117M
    scsi-SATA_ST8000NM0055-1RM_ZA1HZZ0R      -      -     13  1.46K  1.65M   117M
    scsi-SATA_ST8000NM0055-1RM_ZA1GGQKP      -      -     17  1.58K  1.67M   117M
    scsi-SATA_ST8000NM000A-2KE_WSD63TJH      -      -     12  2.56K  1.68M   117M
    scsi-SATA_ST8000NM0055-1RM_ZA1ARAJR      -      -     18  1.54K  1.68M   118M
    scsi-SATA_ST8000NM000A-2KE_WKD2HR9F      -      -     16  2.54K  1.66M   117M
    scsi-SATA_ST8000NM0055-1RM_ZA1CA90N      -      -     11  1.58K  1.71M   117M
---------------------------------------  -----  -----  -----  -----  -----  -----
                                           capacity     operations     bandwidth 
pool                                     alloc   free   read  write   read  write
---------------------------------------  -----  -----  -----  -----  -----  -----
bckpool                                  4.10T  54.1T      1  18.8K  6.00K  1.23G
  raidz2-0                               4.10T  54.1T      1  18.8K  6.00K  1.23G
    scsi-SATA_ST8000NM000A-2KE_WSD0S88E      -      -      0  3.00K      0   157M
    scsi-SATA_ST8000NM000A-2KE_WSD0WDK0      -      -      0  3.06K      0   156M
    scsi-SATA_ST8000NM0055-1RM_ZA1HZZ0R      -      -      0  1.73K      0   158M
    scsi-SATA_ST8000NM0055-1RM_ZA1GGQKP      -      -      0  1.71K      0   158M
    scsi-SATA_ST8000NM000A-2KE_WSD63TJH      -      -      0  2.93K  2.00K   157M
    scsi-SATA_ST8000NM0055-1RM_ZA1ARAJR      -      -      0  1.72K  2.00K   158M
    scsi-SATA_ST8000NM000A-2KE_WKD2HR9F      -      -      0  2.89K  2.00K   157M
    scsi-SATA_ST8000NM0055-1RM_ZA1CA90N      -      -      0  1.75K      0   158M
---------------------------------------  -----  -----  -----  -----  -----  -----

The decrease in throughput is actually a decrease in IOPS: the pool goes from roughly 16-19K write operations per second to about 500.
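A rough way to check this from the figures above is to divide the pool-level write bandwidth by the pool-level write operations, which gives the average size of each write. The sketch below does that for the three bckpool rows shown. The parse_nicenum helper is hypothetical, and the code assumes that the K/M/G suffixes printed by zpool iostat are powers of 1024.

```python
# Average write size = write bandwidth / write operations, per bckpool row above.
# Assumption: zpool iostat suffixes (K, M, G, T) are powers of 1024.
SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_nicenum(text):
    """Convert a value such as '22.8M' or '497' to a float (hypothetical helper)."""
    if text[-1] in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1]]
    return float(text)


def avg_write_kib(ops, bandwidth):
    """Average bytes per write operation, expressed in KiB."""
    return parse_nicenum(bandwidth) / parse_nicenum(ops) / 1024


# (label, write ops/s, write bandwidth/s) copied from the pool-level rows above.
samples = [
    ("degraded", "497", "22.8M"),
    ("normal #1", "16.6K", "935M"),
    ("normal #2", "18.8K", "1.23G"),
]

for label, ops, bw in samples:
    print(f"{label:10s} {parse_nicenum(ops):8.0f} ops/s  "
          f"~{avg_write_kib(ops, bw):.0f} KiB per write")
```

Under that assumption, the average write stays in the same tens-of-KiB range in all three samples, while the number of operations per second falls by more than 30x.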

Nothing interesting in dmesg besides complaints that the txg_sync task is taking more than 120s, which is expected given the issue.

Describe how to reproduce the problem

I usually copy files from another pool (which shows no performance issues) to this pool. The files being copied are backups of differing sizes: some are hundreds of GB, a few are ~2TB. Files are copied at a good rate until the copy reaches the very large 2.2TB file; a while after that file starts copying, write performance drops significantly. Read performance is unaffected.

The issue persists until the host is rebooted and I start copying smaller files again. Exporting and importing the pool does not resolve the issue.

Include any warning/errors/backtraces from the system logs

Pool parameters:

$ zpool get all bckpool
NAME     PROPERTY                       VALUE                          SOURCE
bckpool  size                           58.2T                          -
bckpool  capacity                       7%                             -
bckpool  altroot                        -                              default
bckpool  health                         ONLINE                         -
bckpool  guid                           5555043564619793091            -
bckpool  version                        -                              default
bckpool  bootfs                         -                              default
bckpool  delegation                     on                             default
bckpool  autoreplace                    off                            default
bckpool  cachefile                      -                              default
bckpool  failmode                       wait                           default
bckpool  listsnapshots                  off                            default
bckpool  autoexpand                     off                            default
bckpool  dedupratio                     1.00x                          -
bckpool  free                           53.6T                          -
bckpool  allocated                      4.58T                          -
bckpool  readonly                       off                            -
bckpool  ashift                         12                             local
bckpool  comment                        -                              default
bckpool  expandsize                     -                              -
bckpool  freeing                        0                              -
bckpool  fragmentation                  0%                             -
bckpool  leaked                         0                              -
bckpool  multihost                      off                            default
bckpool  checkpoint                     -                              -
bckpool  load_guid                      17599775529180938087           -
bckpool  autotrim                       off                            default
bckpool  compatibility                  off                            default
bckpool  feature@async_destroy          enabled                        local
bckpool  feature@empty_bpobj            active                         local
bckpool  feature@lz4_compress           active                         local
bckpool  feature@multi_vdev_crash_dump  enabled                        local
bckpool  feature@spacemap_histogram     active                         local
bckpool  feature@enabled_txg            active                         local
bckpool  feature@hole_birth             active                         local
bckpool  feature@extensible_dataset     active                         local
bckpool  feature@embedded_data          active                         local
bckpool  feature@bookmarks              enabled                        local
bckpool  feature@filesystem_limits      enabled                        local
bckpool  feature@large_blocks           active                         local
bckpool  feature@large_dnode            active                         local
bckpool  feature@sha512                 enabled                        local
bckpool  feature@skein                  enabled                        local
bckpool  feature@edonr                  enabled                        local
bckpool  feature@userobj_accounting     active                         local
bckpool  feature@encryption             enabled                        local
bckpool  feature@project_quota          active                         local
bckpool  feature@device_removal         enabled                        local
bckpool  feature@obsolete_counts        enabled                        local
bckpool  feature@zpool_checkpoint       enabled                        local
bckpool  feature@spacemap_v2            active                         local
bckpool  feature@allocation_classes     enabled                        local
bckpool  feature@resilver_defer         enabled                        local
bckpool  feature@bookmark_v2            enabled                        local
bckpool  feature@redaction_bookmarks    enabled                        local
bckpool  feature@redacted_datasets      enabled                        local
bckpool  feature@bookmark_written       enabled                        local
bckpool  feature@log_spacemap           active                         local
bckpool  feature@livelist               enabled                        local
bckpool  feature@device_rebuild         enabled                        local
bckpool  feature@zstd_compress          active                         local
bckpool  feature@draid                  enabled                        local

Some other metrics of interest that I gathered while the issue was present:

txgs:

$ cat /proc/spl/kstat/zfs/bckpool/txgs
txg      birth            state ndirty       nread        nwritten     reads    writes   otime        qtime        wtime        stime       
28991    48521056576164   C     0            0            0            0        0        5120369798   31389        32380        65313       
28992    48526176945962   C     0            0            0            0        0        5119896831   6081         33553        65343       
28993    48531296842793   C     0            0            0            0        0        5120047283   24466        33212        66615       
28994    48536416890076   C     0            0            0            0        0        5119913132   19797        26099        50053       
28995    48541536803208   C     0            0            0            0        0        5120012237   25939        34043        66565       
28996    48546656815445   C     0            0            0            0        0        5119959809   25969        32451        64491       
28997    48551776775254   C     0            0            0            0        0        5119985628   24856        33042        65763       
28998    48556896760882   C     0            0            0            0        0        5119899566   25287        32381        65593       
28999    48562016660448   C     0            0            0            0        0        5119994313   27612        32161        65673       
29000    48567136654761   C     0            0            0            0        0        5119997620   31860        32712        64681       
29001    48572256652381   C     0            0            0            0        0        5119960962   20789        32851        66405       
29002    48577376613343   C     0            0            0            0        0        5119926286   25147        32381        66164       
29003    48582496539629   C     0            0            0            0        0        5119995456   31198        32091        65803       
29004    48587616535085   C     0            0            0            0        0        5119962123   27632        32140        64561       
29005    48592736497208   C     0            0            0            0        0        5120074865   19937        25088        51426       
29006    48597856572073   C     0            0            0            0        0        5119759854   26049        32691        65724       
29007    48602976331927   C     0            0            0            0        0        5120065998   30608        31639        66174       
29008    48608096397925   C     0            0            0            0        0        5119528501   20869        25328        50805       
29009    48613215926426   C     0            0            0            0        0        5120422987   27692        32922        66104       
29010    48618336349413   C     0            0            0            0        0        5119878186   5421         32471        66033       
29011    48623456227599   C     0            0            0            0        0        5120236809   19997        25388        51356       
29012    48628576464408   C     0            0            0            0        0        5119695994   27672        32060        65643       
29013    48633696160402   C     0            0            0            0        0        5120014442   6071         32982        64972       
29014    48638816174844   C     0            0            0            0        0        5119836327   24907        31950        64671       
29015    48643936011171   C     0            0            0            0        0        5120089463   6151         32732        66194       
29016    48649056100634   C     0            0            0            0        0        5119915906   27031        32321        64821       
29017    48654176016540   C     0            0            0            0        0        5120003091   31348        32341        65152       
29018    48659296019631   C     0            0            0            0        0        5119987531   6021         32651        64862       
29019    48664416007162   C     0            0            0            0        0        5119546013   3006         33983        48131       
29020    48669535553175   C     0            0            0            0        0        5120342036   30507        32210        65072       
29021    48674655895211   C     0            0            0            0        0        5120005815   25027        32300        65473       
29022    48679775901026   C     0            0            0            0        0        5119958597   30798        32000        63919       
29023    48684895859623   C     0            0            0            0        0        5119960450   6232         32901        64481       
29024    48690015820073   C     0            0            0            0        0        5120095584   15900        24986        49433       
29025    48695135915657   C     0            0            0            0        0        5119847398   26350        32641        65563       
29026    48700255763055   C     0            0            0            0        0        5119519544   25598        23714        49423       
29027    48705375282599   C     0            0            0            0        0        5120575173   20157        24857        51066       
29028    48710495857772   C     0            0            0            0        0        5119750105   24086        32460        64712       
29029    48715615607877   C     0            0            0            0        0        5120009533   30597        32962        65483       
29030    48720735617410   C     0            0            0            0        0        5119909505   28733        33423        64281       
29031    48725855526915   C     0            0            0            0        0        5120004162   6823         32621        66314       
29032    48730975531077   C     0            0            0            0        0        5119990376   20218        32150        64251       
29033    48736095521453   C     0            0            0            0        0        5119935414   26770        31980        64400       
29034    48741215456867   C     0            0            0            0        0        5119906168   20879        32782        66184       
29035    48746335363035   C     0            0            0            0        0        5120048616   27421        32050        65643       
29036    48751455411651   C     0            0            0            0        0        5119948998   5831         31900        65373       
29037    48756575360649   C     0            0            0            0        0        5119972604   27441        32812        63669       
29038    48761695333253   C     0            0            0            0        0        5119942837   30838        32611        64781       
29039    48766815276090   C     0            0            0            0        0        5119980438   24235        32912        66104       
29040    48771935256528   C     0            0            0            0        0        5120076748   19326        24827        49402       
29041    48777055333276   C     0            0            0            0        0        5119838912   24086        32711        64270       
29042    48782175172188   C     0            0            0            0        0        5119903935   24826        32161        65192       
29043    48787295076123   C     0            0            0            0        0        5120191784   19947        25528        50344       
29044    48792415267907   C     0            0            0            0        0        5119380372   4860         24526        50554       
29045    48797534648279   C     0            0            0            0        0        5120402840   20438        33273        65012       
29046    48802655051119   C     0            0            0            0        0        5119851606   31389        31650        64821       
29047    48807774902725   C     0            0            0            0        0        5120085515   32461        32120        65784       
29048    48812894988240   C     0            0            0            0        0        5119894346   5691         32721        64491       
29049    48818014882586   C     0            0            0            0        0        5120033908   6443         33502        64271       
29050    48823134916494   C     0            0            0            0        0        5119962975   25088        32671        66414       
29051    48828254879469   C     0            0            0            0        0        5119937448   27251        32511        65092       
29052    48833374816917   C     0            0            0            0        0        5119932447   30488        31729        65062       
29053    48838494749364   C     0            0            0            0        0        5120034389   24156        32751        67156       
29054    48843614783753   C     0            0            0            0        0        5119938079   23884        32622        65442       
29055    48848734721832   C     0            0            0            0        0        5119963876   30918        32381        66044       
29056    48853854685708   C     0            0            0            0        0        5120028999   15459        25779        50514       
29057    48858974714707   C     0            0            0            0        0        5119473257   4839         24807        50875       
29058    48864094187964   C     0            0            0            0        0        5120377963   26540        32771        64792       
29059    48869214565927   C     0            0            0            0        0        5120111434   25798        24826        51277       
29060    48874334677361   C     0            0            0            0        0        5119676117   24075        32501        66545       
29061    48879454353478   C     0            0            0            0        0        5120139606   20378        32782        66174       
29062    48884574493084   C     0            0            0            0        0        5120124127   18595        36178        50194       
29063    48889694617211   C     0            0            0            0        0        5119742462   25127        32681        66074       
29064    48894814359673   C     0            0            0            0        0        5119959438   26680        33222        65112       
29065    48899934319111   C     0            0            0            0        0        5120246927   25829        25708        50194       
29066    48905054566038   C     0            0            0            0        0        5119681678   26720        31870        64521       
29067    48910174247716   C     0            0            0            0        0        5120030822   23865        32561        64200       
29068    48915294278538   C     0            0            0            0        0        5119939111   30336        31810        65303       
29069    48920414217649   C     0            0            0            0        0        5119974326   26880        32782        66775       
29070    48925534191975   C     0            0            0            0        0        5119548728   18645        25308        49843       
29071    48930653740703   C     0            0            0            0        0        5120336035   28233        31429        64761       
29072    48935774076738   C     0            0            0            0        0        5120003050   20007        32952        66054       
29073    48940894079788   C     0            0            0            0        0        5119960751   24275        31900        63679       
29074    48946014040539   C     0            0            0            0        0        5119962253   27943        32250        62197       
29075    48951134002792   C     0            0            0            0        0        5119984305   5871         33102        66325       
29076    48956253987097   C     0            0            0            0        0        5119873748   21280        32281        63789       
29077    48961373860845   C     0            0            0            0        0        5120060368   21650        32962        64752       
29078    48966493921213   C     0            0            0            0        0        5119847258   28353        32331        189485      
29079    48971613768471   C     0            0            0            0        0        5120218464   16481        24926        51717       
29080    48976733986935   C     0            0            0            0        0        5119748723   23885        32641        64792       
29081    48981853735658   C     0            0            0            0        0        5120238472   23945        25207        50264       
29082    48986973974130   C     0            0            0            0        0        5119722624   6252         31930        64290       
29083    48992093696754   C     0            0            0            0        0        5119999464   6041         32872        65422       
29084    48997213696218   C     0            0            0            0        0        5119815218   6041         32721        64581       
29085    49002333511436   C     0            0            0            0        0        5120021424   30638        32941        64201       
29086    49007453532860   C     0            0            0            0        0        5120046642   20979        32421        64191       
29087    49012573579502   C     0            0            0            0        0        5119969467   31509        35096        65052       
29088    49017693548969   C     859062272    4096         657309696    1        13973    2110647081   9678         23163        23781857099 
29089    49019804196050   S     0            0            0            0        0        23781932900  3337         15068        0           
29090    49043586128950   O     0            0            0            0        0        0            0            0            0     
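To make the last completed txg easier to read, the sketch below parses rows of this kstat and converts the sync time into seconds and an effective write rate. It assumes the otime, qtime, wtime and stime columns are nanoseconds; the function names are hypothetical, and only two rows from the output above are embedded as sample data.

```python
# Parse rows of /proc/spl/kstat/zfs/<pool>/txgs and report slow syncs.
# Assumption: the four *time columns are in nanoseconds.
HEADER = ("txg birth state ndirty nread nwritten "
          "reads writes otime qtime wtime stime").split()

# Two rows copied from the output above: a typical idle txg and the slow one.
SAMPLE = """\
29087    49012573579502   C     0            0            0            0        0        5119969467   31509        35096        65052
29088    49017693548969   C     859062272    4096         657309696    1        13973    2110647081   9678         23163        23781857099
"""


def parse_txgs(text):
    """Return one dict per data row; header or malformed lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(HEADER) or not fields[0].isdigit():
            continue
        row = dict(zip(HEADER, fields))
        for key in HEADER:
            if key != "state":
                row[key] = int(row[key])
        rows.append(row)
    return rows


def slow_syncs(rows, threshold_s=1.0):
    """Return (txg, sync seconds, MiB/s written) for txgs slower than threshold_s."""
    out = []
    for row in rows:
        sync_s = row["stime"] / 1e9
        if sync_s > threshold_s:
            out.append((row["txg"], sync_s, row["nwritten"] / 2**20 / sync_s))
    return out


for txg, sync_s, rate in slow_syncs(parse_txgs(SAMPLE)):
    print(f"txg {txg}: sync took {sync_s:.1f} s, ~{rate:.1f} MiB/s written")
```

With these assumptions, txg 29088 spent about 23.8 s syncing and wrote at roughly 26 MiB/s, which is consistent with the 20-30MB/s seen in zpool iostat.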

dmu_tx_assign

$ cat /proc/spl/kstat/zfs/bckpool/dmu_tx_assign
63 1 0x01 26 7072 44339804167023 49285118173894
name                            type data
1 ns                            4    0
2 ns                            4    0
4 ns                            4    0
8 ns                            4    0
16 ns                           4    0
32 ns                           4    0
64 ns                           4    32
128 ns                          4    4
256 ns                          4    5
512 ns                          4    5
1024 ns                         4    0
2048 ns                         4    0
4096 ns                         4    2
8192 ns                         4    2
16384 ns                        4    7
32768 ns                        4    17
65536 ns                        4    79
131072 ns                       4    1632
262144 ns                       4    3565
524288 ns                       4    2920
1048576 ns                      4    2877
2097152 ns                      4    3446
4194304 ns                      4    2019
8388608 ns                      4    32531
16777216 ns                     4    940
33554432 ns                     4    62
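The histogram above is easier to interpret once it is reduced to a total and a dominant bucket. The following sketch does that, assuming each data row has the form "<bucket> ns <type> <count>" with the bucket label in nanoseconds. The function names are hypothetical, and the all-zero rows below 64 ns are omitted from the sample.

```python
# Summarize the dmu_tx_assign histogram shown above.
# Assumption: rows look like "<bucket> ns <type> <count>", bucket labels in ns.
SAMPLE = """\
64 ns                           4    32
128 ns                          4    4
256 ns                          4    5
512 ns                          4    5
1024 ns                         4    0
2048 ns                         4    0
4096 ns                         4    2
8192 ns                         4    2
16384 ns                        4    7
32768 ns                        4    17
65536 ns                        4    79
131072 ns                       4    1632
262144 ns                       4    3565
524288 ns                       4    2920
1048576 ns                      4    2877
2097152 ns                      4    3446
4194304 ns                      4    2019
8388608 ns                      4    32531
16777216 ns                     4    940
33554432 ns                     4    62
"""


def parse_histogram(text):
    """Return a list of (bucket_ns, count) pairs, skipping non-data lines."""
    buckets = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[1] == "ns" and fields[0].isdigit():
            buckets.append((int(fields[0]), int(fields[3])))
    return buckets


def summarize(buckets):
    """Return (total count, busiest bucket in ns, that bucket's share of the total)."""
    total = sum(count for _, count in buckets)
    mode_ns, mode_count = max(buckets, key=lambda b: b[1])
    return total, mode_ns, mode_count / total


total, mode_ns, share = summarize(parse_histogram(SAMPLE))
print(f"{total} samples; busiest bucket {mode_ns} ns "
      f"(~{mode_ns / 1e6:.1f} ms) holds {share:.0%}")
```

On this data, roughly 65% of the recorded assignments land in the 8388608 ns bucket, i.e. around 8.4 ms per dmu_tx_assign call.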

zpool iostat bckpool -r 2

This one is interesting because it shows the request size as 16K even though the pool has a 1M recordsize (I also had the same issue with the default recordsize of 128K):

bckpool       sync_read    sync_write    async_read    async_write      scrub         trim    
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0      0      0      0      0      0      0      0      0      0      0
8K              0      0      0      0      0      0      0      0      0      0      0      0
16K             0      0      0      0      0      0    370      2      0      0      0      0
32K             0      0      0      0      0      0      0      4      0      0      0      0
64K             0      0      0      0      0      0      0      4      0      0      0      0
128K            0      0      0      0      0      0      0      0      0      0      0      0
256K            0      0      0      0      0      0      0      0      0      0      0      0
512K            0      0      0      0      0      0      0     15      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0
----------------------------------------------------------------------------------------------

During regular operation (same pool):

bckpool       sync_read    sync_write    async_read    async_write      scrub         trim    
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0      0      0      0      0  2.13K      0      0      0      0      0
8K              0      0      0      0      0      0  3.81K    259      0      0      0      0
16K             0      0      0      0      0      0  1.92K    587      0      0      0      0
32K             0      0      0      0      0      0      0    775      0      0      0      0
64K             0      0      0      0      0      0      0    678      0      0      0      0
128K            0      0      0      0      0      0      0    600      0      0      0      0
256K            0      0      0      0      0      0      0    545      0      0      0      0
512K            0      0      0      0      0      0      0    221      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0
----------------------------------------------------------------------------------------------

zpool iostat bckpool -q 2

              capacity     operations     bandwidth    syncq_read    syncq_write   asyncq_read  asyncq_write   scrubq_read   trimq_write
pool        alloc   free   read  write   read  write   pend  activ   pend  activ   pend  activ   pend  activ   pend  activ   pend  activ
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
bckpool     4.99T  53.2T      0     40  40.4K  1.93M      0      0      0      0      0      0      0     10      0      0      0      0
bckpool     4.99T  53.2T      0    601      0  23.3M      0      0      0      0      0      0     14     10      0      0      0      0
bckpool     4.99T  53.2T      0    473      0  21.9M      0      0      0      0      0      0      0     10      0      0      0      0
bckpool     4.99T  53.2T      0    440      0  23.2M      0      0      0      0      0      0      0     10      0      0      0      0
bckpool     4.99T  53.2T      0    387      0  22.4M      0      0      0      0      0      0      0     10      0      0      0      0

zpool iostat bckpool -w 2

bckpool      total_wait     disk_wait    syncq_wait    asyncq_wait
latency      read  write   read  write   read  write   read  write  scrub   trim
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
1ns             0      0      0      0      0      0      0      0      0      0
3ns             0      0      0      0      0      0      0      0      0      0
7ns             0      0      0      0      0      0      0      0      0      0
15ns            0      0      0      0      0      0      0      0      0      0
31ns            0      0      0      0      0      0      0      0      0      0
63ns            0      0      0      0      0      0      0      0      0      0
127ns           0      0      0      0      0      0      0      0      0      0
255ns           0      0      0      0      0      0      0     39      0      0
511ns           0      0      0      0      0      0      0    139      0      0
1us             0      0      0      0      0      0      0    125      0      0
2us             0      0      0      0      0      0      0     72      0      0
4us             0      0      0      0      0      0      0     29      0      0
8us             0      0      0      0      0      0      0      6      0      0
16us            0      0      0      0      0      0      0      0      0      0
32us            0      0      0      0      0      0      0      0      0      0
65us            0      0      0      0      0      0      0      0      0      0
131us           0      0      0      0      0      0      0      0      0      0
262us           0      0      0      0      0      0      0      0      0      0
524us           0     36      0     36      0      0      0      3      0      0
1ms             0     58      0     58      0      0      0     24      0      0
2ms             0    164      0    187      0      0      0     27      0      0
4ms             0    183      0    174      0      0      0      8      0      0
8ms             0     18      0      5      0      0      0      0      0      0
16ms            0      0      0      0      0      0      0      0      0      0
33ms            0      0      0      0      0      0      0      3      0      0
67ms            0      0      0      0      0      0      0      3      0      0
134ms           0      1      0      1      0      0      0      2      0      0
268ms           0      4      0      7      0      0      0      0      0      0
536ms           0     17      0     15      0      0      0      0      0      0
1s              0      2      0      2      0      0      0      0      0      0
2s              0      0      0      0      0      0      0      0      0      0
4s              0      0      0      0      0      0      0      0      0      0
8s              0      0      0      0      0      0      0      0      0      0
17s             0      0      0      0      0      0      0      0      0      0
34s             0      0      0      0      0      0      0      0      0      0
68s             0      0      0      0      0      0      0      0      0      0
137s            0      0      0      0      0      0      0      0      0      0
--------------------------------------------------------------------------------

zpool iostat bckpool -l 2

              capacity     operations     bandwidth    total_wait     disk_wait    syncq_wait    asyncq_wait  scrub   trim
pool        alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait   wait
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
bckpool     4.99T  53.2T      0     47  39.7K  2.26M   13ms   25ms    9ms   24ms    1ms  367ns  297ns    1ms   28ms      -
bckpool     4.99T  53.2T      0    443      0  21.1M      -   29ms      -   27ms      -      -      -    1ms      -      -
bckpool     4.99T  53.2T      0    314      0  20.1M      -   35ms      -   35ms      -      -      -  900us      -      -
bckpool     4.99T  53.2T      0    409      0  20.8M      -   31ms      -   30ms      -      -      -  917us      -      -
bckpool     4.99T  53.2T      0    400      0  20.5M      -   29ms      -   28ms      -      -      -    1ms      -      -

jxdking commented 7 months ago

Have you considered adding a SLOG device? Try adding one and see whether write performance improves.

rrevans commented 7 months ago

What tool/setup are you using to copy the 2.2T file? Any network services involved on either side?

Do you see high usage/pinned CPUs? If so, does sudo perf top give you any clues?

Does this happen during or after the copy? If during, at what point does performance degrade? Is it all at once or little by little?

It looks a bit like something in sync context is doing computation proportional to the file size, judging by the long stime but modest write bandwidth in the txgs. Maybe more work is done as the file grows, slowing it down?
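
One way to check the stime-versus-bandwidth observation is to compute, per committed txg, how many MiB were written per second of sync time. The following is only a minimal sketch: it assumes the Linux kstat file /proc/spl/kstat/zfs/<pool>/txgs with the columns txg, birth, state, ndirty, nread, nwritten, reads, writes, otime, qtime, wtime, stime (times in nanoseconds). The sample rows and the helper names `parse_txgs` and `sync_throughput` are invented for illustration and are not taken from this system.

```python
# Hypothetical sample in the layout of /proc/spl/kstat/zfs/<pool>/txgs.
# The numbers are made up; on a real system, read the kstat file instead.
SAMPLE_TXGS = """
txg  birth        state ndirty    nread nwritten  reads writes otime      qtime wtime stime
100  1000000000   C     104857600 0     209715200 0     400    5000000000 20000 30000 2000000000
101  6000000000   C     104857600 0     104857600 0     500    5000000000 20000 30000 10000000000
102  11000000000  O     0         0     0         0     0      0          0     0     0
"""


def parse_txgs(text):
    """Return one dict per committed (state 'C') txg row."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    # Skip anything before the column header (the real file has a kstat preamble line).
    header_idx = next(i for i, f in enumerate(lines) if f and f[0] == "txg")
    header = lines[header_idx]
    rows = []
    for fields in lines[header_idx + 1:]:
        if len(fields) != len(header):
            continue  # ignore malformed or truncated rows
        row = dict(zip(header, fields))
        if row["state"] == "C":
            rows.append(row)
    return rows


def sync_throughput(rows):
    """Return (txg, sync seconds, MiB written per second of sync time)."""
    result = []
    for r in rows:
        stime_s = int(r["stime"]) / 1e9
        mib = int(r["nwritten"]) / 2**20
        result.append((int(r["txg"]), stime_s, mib / stime_s if stime_s else 0.0))
    return result


for txg, stime_s, rate in sync_throughput(parse_txgs(SAMPLE_TXGS)):
    print(f"txg {txg}: stime {stime_s:.1f}s, {rate:.1f} MiB/s during sync")
```

A falling MiB/s figure with roughly constant nwritten would be consistent with sync-side work growing over time. It would not say where that work comes from.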

ImanolBarba commented 7 months ago

What tool/setup are you using to copy the 2.2T file? Any network services involved on either side?

No, just cp -av

Do you see high usage/pinned CPUs? If so, does sudo perf top give you any clues?

No, only occasional sys CPU usage of up to 30%, likely due to compression

Does this happen during or after the copy? If during, at what point does performance degrade? Is it all at once or little by little?

During, after copying about 4.5 TB

ImanolBarba commented 7 months ago

I did add a SLOG, but since the writes are async, it made no difference

jxdking commented 7 months ago

What if you test it with the "fio" tool? Also, make sure you are not running out of memory when the issue happens.

ImanolBarba commented 7 months ago

fio shows about the same throughput as zpool status

(Total) memory usage never goes beyond 70G, which is expected since ARC is supposed to take half of those 128G
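
The ARC figures behind that estimate can be read directly rather than inferred from total memory usage. This is a rough sketch, assuming Linux, where OpenZFS exposes ARC counters in /proc/spl/kstat/zfs/arcstats as `name type data` rows that include `size` and `c_max`. The sample values and the helper names `parse_arcstats` and `arc_summary` are hypothetical and are not measurements from this machine.

```python
# Hypothetical excerpt in the layout of /proc/spl/kstat/zfs/arcstats.
# Values are invented: 60 GiB current ARC size, 64 GiB ARC maximum.
SAMPLE_ARCSTATS = """
13 1 0x01 123 33456 1000000 2000000
name                            type data
hits                            4    123456
size                            4    64424509440
c_max                           4    68719476736
"""


def parse_arcstats(text):
    """Map each counter name to its integer value."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly three fields: name, numeric type, numeric value.
        if len(fields) == 3 and fields[1].isdigit() and fields[2].isdigit():
            stats[fields[0]] = int(fields[2])
    return stats


def arc_summary(stats, total_ram_bytes):
    """Express ARC size and limit in GiB, plus the limit as a fraction of RAM."""
    gib = 2**30
    return {
        "arc_size_gib": stats["size"] / gib,
        "arc_max_gib": stats["c_max"] / gib,
        "arc_max_fraction_of_ram": stats["c_max"] / total_ram_bytes,
    }


summary = arc_summary(parse_arcstats(SAMPLE_ARCSTATS), 128 * 2**30)
print(summary)
```

With 128G of RAM, a `c_max` of 64 GiB corresponds to the "half of RAM" expectation described above.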

ImanolBarba commented 3 months ago

So I forgot to update this issue, but it turned out that I had bad RAM (started getting errors reported on files randomly and confirmed bad RAM with memtest).

After replacing the RAM, I no longer have the issue. Why did it happen? I have no clue, but with the same setup I now have no performance issues at all.