openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

compression=gzip on FreeBSD leaks memory #10225

Closed bra-fsn closed 4 years ago

bra-fsn commented 4 years ago

System information

Type                  Version/Name
Distribution Name     FreeBSD
Distribution Version  12.1
Linux Kernel          12.1
Architecture          amd64
ZFS Version           openzfs port 2020031600 (zfsonfreebsd@152cc960a)
SPL Version           -

Describe the problem you're observing

A detailed description is here: https://openzfs.topicbox.com/groups/developer/T10533b84f9e1cfc5

I think the optimization work is now done and merged (https://github.com/openzfs/zfs/pull/9181), and my openzfs version contains it. I can confirm that it makes the machine more stable: it now survives for around 1.5 days with 192G of RAM.

Memory usage looks like this: [image]

top currently shows:

last pid: 86579;  load averages:  4.14,  7.36,  8.17    up 1+13:10:05  22:44:46
886 processes: 2 running, 881 sleeping, 3 zombie
CPU:  4.0% user,  0.0% nice, 16.7% system,  0.2% interrupt, 79.1% idle
Mem: 414M Active, 942M Inact, 2967M Laundry, 179G Wired, 181M Buf, 3550M Free
ARC: 16G Total, 5911M MFU, 8842M MRU, 65M Anon, 194M Header, 902M Other
     3290M Compressed, 12G Uncompressed, 3.79:1 Ratio
Swap: 64G Total, 9990M Used, 54G Free, 15% Inuse

vmstat -z output:

ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP

UMA Kegs:               248,      0,     261,       9,     261,   0,   0
UMA Zones:             3536,      0,     282,       0,     282,   0,   0
UMA Slabs:               80,      0, 8617831,    1919,259569995,   0,   0
UMA Hash:               256,      0,      50,      25,     112,   0,   0
4 Bucket:                32,      0,     918,    4957,348868516,   0,   0
6 Bucket:                48,      0,     208,    5436,132384375,   8,   0
8 Bucket:                64,      0,     167,    4235,243579615,  50,   0
12 Bucket:               96,      0,     170,    2331,176723977,  18,   0
16 Bucket:              128,      0,     341,    2542,93606631,   1,   0
32 Bucket:              256,      0,     411,    1134,75156260,  61,   0
64 Bucket:              512,      0,     239,     489,46971345,251367,   0
128 Bucket:            1024,      0,     278,     170,42127174,55432,   0
256 Bucket:            2048,      0,     604,      80,63072673,1010852,   0
vmem:                  1856,      0,       5,       1,       5,   0,   0
vmem btag:               56,      0, 6465919,    5660,11770265,46749,   0
VM OBJECT:              256,      0,  416662,    8168,30557430,   0,   0
RADIX NODE:             144,      0, 3125730,   15477,125988377,   0,   0
MAP:                    240,      0,       3,      61,       3,   0,   0
KMAP ENTRY:             120,      0,      17,     214,      28,   0,   0
MAP ENTRY:              120,      0,   64926,    4473,108762977,   0,   0
VMSPACE:               2560,      0,     884,     147, 1176791,   0,   0
fakepg:                 104,      0,      15,    1885,  170599,   0,   0
64 pcpu:                  8,      0,    4037,    3131,    4601,   0,   0
mt_stats_zone:           64,      0,     417,     223,     417,   0,   0
mt_zone:                 24,      0,     417,     752,     417,   0,   0
16:                      16,      0,   43451,    6247,132299620,   0,   0
32:                      32,      0,  869246, 1172754,8800350323,   0,   0
64:                      64,      0, 5395695,    5869,579170281,   0,   0
128:                    128,      0, 5379328,   47377,2424502402,   0,   0
256:                    256,      0,  101028,   62157,1483751004,   0,   0
512:                    512,      0,  551443,   20501,277298099,   0,   0
1024:                  1024,      0,   74660,     224,14771458,   0,   0
2048:                  2048,      0,    7411,     191,1088610130,   0,   0
4096:                  4096,      0,  455982,       6, 7890277,   0,   0
8192:                  8192,      0,    1060,       5, 1808649,   0,   0
16384:                16384,      0,     213,       1, 4114774,   0,   0
32768:                32768,      0, 4852158,       1, 5099459,   0,   0
65536:                65536,      0,      62,      10, 7523850,   0,   0
SLEEPQUEUE:              80,      0,   24785,    1379,   24785,   0,   0
Files:                   80,      0,    5965,    3585,115156016,   0,   0
filedesc0:             1104,      0,     939,      36, 1176970,   0,   0
rangeset pctrie nodes:    144,      0,       0,       0,       0,   0,   0
rl_entry:                40,      0,    1588,    4512,    1588,   0,   0
TURNSTILE:              136,      0,   24785,     895,   24785,   0,   0
umtx pi:                 96,      0,       0,       0,       0,   0,   0
umtx_shm:                88,      0,       0,       0,       0,   0,   0
MAC labels:              40,      0,       0,       0,       0,   0,   0
PROC:                  1328,      0,     941,     235, 1176969,   0,   0
THREAD:                1824,      0,   24681,     103, 1015765,   0,   0
cpuset:                 104,      0,      51,     786,      51,   0,   0
domainset:               40,      0,       0,       0,       0,   0,   0
audit_record:          1280,      0,       0,       0,       0,   0,   0
mbuf_packet:            256, 3355455,       7,    5562,34493574,   0,   0
mbuf:                   256, 3355455,    6304,   11467,3424148289,   0,   0
mbuf_cluster:          2048, 524288,   11718,    1074,1488742770,   0,   0
mbuf_jumbo_page:       4096, 262144,      13,     690,274680011,   0,   0
mbuf_jumbo_9k:         9216,  77672,       0,       0,       0,   0,   0
mbuf_jumbo_16k:       16384,  43690,       0,       0,       0,   0,   0
epoch_record pcpu:      256,      0,       4,      12,       4,   0,   0
FPU_save_area:          832,      0,       0,       0,       0,   0,   0
DMAR_MAP_ENTRY:         120,      0,       0,       0,       0,   0,   0
ttyinq:                 160,      0,     180,     145,     495,   0,   0
ttyoutq:                256,      0,      95,     115,     260,   0,   0
g_bio:                  376,      0,     130,     850,1918206632,   0,   0
nvme_request:           128,      0,       0,       0,       0,   0,   0
cryptop:                128,      0,       0,       0,       0,   0,   0
cryptodesc:             120,      0,       0,       0,       0,   0,   0
crypto_session:          24,      0,       0,       0,       0,   0,   0
vtnet_tx_hdr:            24,      0,       0,       0,       0,   0,   0
VNODE:                  480,      0,  528933,     643, 7420915,   0,   0
VNODEPOLL:              120,      0,       0,       0,       0,   0,   0
BUF TRIE:               144,      0,    2679,  103269,  967398,   0,   0
NAMEI:                 1024,      0,       6,     206,96990051,   0,   0
rentr:                   24,      0,       0,       0,       1,   0,   0
S VFS Cache:            108,      0,  527396,    3029,12371447,   0,   0
STS VFS Cache:          148,      0,    1697,    1917, 2363696,   0,   0
L VFS Cache:            328,      0,       0,       0,       0,   0,   0
LTS VFS Cache:          368,      0,       3,      27,     834,   0,   0
NCLNODE:                592,      0,    1961,     409,    3462,   0,   0
DIRHASH:               1024,      0,       0,       0,    1070,   0,   0
pipe:                   760,      0,     221,     104,  759198,   0,   0
procdesc:               136,      0,       0,       0,       0,   0,   0
Mountpoints:           2744,      0,      57,       5,      58,   0,   0
AIO:                    208,      0,     683,     229,     683,   0,   0
AIOP:                    32,      0,       4,    2621,     122,   0,   0
AIOCB:                  752,      0,       0,     395,    8207,   0,   0
AIOLIO:                 280,      0,       0,       0,       0,   0,   0
ksiginfo:               112,      0,    1138,    1767,    9110,   0,   0
itimer:                 352,      0,       0,       0,       0,   0,   0
KNOTE:                  160,      0,    4499,    1526,890375784,   0,   0
socket:                 872, 6286248,    3942,     514,25239009,   0,   0
unpcb:                  256, 6286260,    2233,     182,   12230,   0,   0
ipq:                     56,  16401,       0,       0,     465,   0,   0
udp_inpcb:              488, 6286248,       7,     441,12512797,   0,   0
udpcb:                   32, 6286250,       7,    3368,12512797,   0,   0
tcp_inpcb:              488, 6286248,    4053,     683,12713971,   0,   0
tcpcb:                  976, 6286248,    1680,     252,12713971,   0,   0
tcptw:                   88,  27810,    2373,    2712, 6576691,   0,   0
syncache:               168,  15364,       0,     805, 6124050,   0,   0
hostcache:               96,  15375,      55,     150,      64,   0,   0
sackhole:                32,      0,       0,       0,      12,   0,   0
tfo:                      4,      0,       0,       0,       0,   0,   0
tfo_ccache_entries:      80,      0,       0,       0,       0,   0,   0
tcpreass:                48,  32785,       0,       0,       0,   0,   0
tcp_log:                400, 1000000,       0,       0,       0,   0,   0
tcp_log_bucket:         144,      0,       0,       0,       0,   0,   0
tcp_log_node:           120,      0,       0,       0,       0,   0,   0
udplite_inpcb:          488, 6286248,       0,       0,       0,   0,   0
ripcb:                  488, 6286248,       0,       0,       0,   0,   0
IPsec SA lft_c:          16,      0,       0,       0,       0,   0,   0
rtentry:                208,      0,      25,     165,      97,   0,   0
selfd:                   64,      0,     587,    4869,47991055,   0,   0
swpctrie:               144, 24512571,   20074,     959,   97651,   0,   0
swblk:                  136, 24512569,  199345,    1393,  783000,   0,   0
md0:                    512,      0,   20000,      16,   20000,   0,   0
FFS inode:              160,      0,     652,     448,  279939,   0,   0
FFS1 dinode:            128,      0,       0,       0,       0,   0,   0
FFS2 dinode:            256,      0,     652,     293,  279939,   0,   0
md1:                    512,      0,   70000,       8,   70000,   0,   0
TMPFS dirent:            64,      0,     169,    2807,     324,   0,   0
TMPFS node:             232,      0,     172,     695,     327,   0,   0
pf mtags:                48,      0,       0,       0,       4,   0,   0
pf tags:                104,      0,       0,       0,       0,   0,   0
pf states:              296, 100009,   12552,     903,24448871,   0,   0
pf state keys:           88,      0,   12552,    3288,24448871,   0,   0
pf source nodes:        136,  10005,       0,       0,       0,   0,   0
pf table entries:       216, 200016,      12,      24,      12,   0,   0
pf frags:               112,      0,       0,       0,       0,   0,   0
pf frag entries:         40,   5000,       0,       0,       0,   0,   0
pf state scrubs:         40,      0,       0,       0,       0,   0,   0
taskq_zone:             168,      0,      58,    1602,35950956,   0,   0
zfs_btree_leaf_cache:   4096,      0, 1830288,      23,39570799,   0,   0
ddt_cache:            24840,      0,     598,       0,    1196,   0,   0
ddt_entry_cache:        392,      0,       0,       0,       0,   0,   0
zio_cache:             1208,      0,   51584,   11887,2239439052,   0,   0
zio_link_cache:          48,      0,   50238,   54425,2270673853,   0,   0
zio_buf_512:            512,      0,    2542,    1098,22153182,   0,   0
zio_data_buf_512:       512,      0,     149,     555,66453479,   0,   0
zio_buf_1024:          1024,      0,   13031,    1437,141948455,   0,   0
zio_data_buf_1024:     1024,      0,     290,     226,33214428,   0,   0
zio_buf_1536:          1536,      0,   35497,    1897,201562187,   0,   0
zio_data_buf_1536:     1536,      0,     532,     370,70252243,   0,   0
zio_buf_2048:          2048,      0,   66304,    1748,170484549,   0,   0
zio_data_buf_2048:     2048,      0,     310,     220,71469730,   0,   0
zio_buf_2560:          2560,      0,  126561,      79,99658995,   0,   0
zio_data_buf_2560:     2560,      0,     299,      32,18760078,   0,   0
zio_buf_3072:          3072,      0,  152895,      62,66962235,   0,   0
zio_data_buf_3072:     3072,      0,     175,      30,24167373,   0,   0
zio_buf_3584:          3584,      0,   20958,      16, 7147584,   0,   0
zio_data_buf_3584:     3584,      0,     134,      24,16786399,   0,   0
zio_buf_4096:          4096,      0,   48215,      26,55070751,   0,   0
zio_data_buf_4096:     4096,      0,      68,      24,21524272,   0,   0
zio_buf_5120:          5120,      0,       0,       8, 3947600,   0,   0
zio_data_buf_5120:     5120,      0,      39,      21,15339812,   0,   0
zio_buf_6144:          6144,      0,       5,       8, 3929197,   0,   0
zio_data_buf_6144:     6144,      0,      41,      21,10293930,   0,   0
zio_buf_7168:          7168,      0,      10,       4, 2751369,   0,   0
zio_data_buf_7168:     7168,      0,      19,       8, 6420565,   0,   0
zio_buf_8192:          8192,      0,      12,      14,19070868,   0,   0
zio_data_buf_8192:     8192,      0,      10,       4, 5313206,   0,   0
zio_buf_10240:        10240,      0,      15,      26,14673999,   0,   0
zio_data_buf_10240:   10240,      0,      22,      10, 7909723,   0,   0
zio_buf_12288:        12288,      0,       3,      10, 5319209,   0,   0
zio_data_buf_12288:   12288,      0,      29,      10, 5150008,   0,   0
zio_buf_14336:        14336,      0,       0,      12, 2313878,   0,   0
zio_data_buf_14336:   14336,      0,      23,       4, 3955290,   0,   0
zio_buf_16384:        16384,      0,  550032,      69,705910394,   0,   0
zio_data_buf_16384:   16384,      0,      37,      11, 2490958,   0,   0
zio_buf_20480:        20480,      0,       0,      11, 3737866,   0,   0
zio_data_buf_20480:   20480,      0,      40,      11, 2563902,   0,   0
zio_buf_24576:        24576,      0,       1,       5, 3488582,   0,   0
zio_data_buf_24576:   24576,      0,      29,       5, 1518433,   0,   0
zio_buf_28672:        28672,      0,       0,      11, 2721423,   0,   0
zio_data_buf_28672:   28672,      0,      19,       0,  908292,   0,   0
zio_buf_32768:        32768,      0,       1,      10, 2585784,   0,   0
zio_data_buf_32768:   32768,      0,      30,       0,  685514,   0,   0
zio_buf_40960:        40960,      0,       6,       9, 3601541,   0,   0
zio_data_buf_40960:   40960,      0,      42,       1, 1173947,   0,   0
zio_buf_49152:        49152,      0,       0,       9, 1849885,   0,   0
zio_data_buf_49152:   49152,      0,      24,       1, 1021301,   0,   0
zio_buf_57344:        57344,      0,       6,       3, 1490565,   0,   0
zio_data_buf_57344:   57344,      0,      26,       2,  864622,   0,   0
zio_buf_65536:        65536,      0,      26,       5, 1259640,   0,   0
zio_data_buf_65536:   65536,      0,      24,       1, 1080930,   0,   0
zio_buf_81920:        81920,      0,       1,       4, 1347879,   0,   0
zio_data_buf_81920:   81920,      0,      33,       2, 2739035,   0,   0
zio_buf_98304:        98304,      0,       1,       3,  949398,   0,   0
zio_data_buf_98304:   98304,      0,      24,      12, 9262391,   0,   0
zio_buf_114688:      114688,      0,       2,       4, 1356065,   0,   0
zio_data_buf_114688: 114688,      0,      96,      11,18391062,   0,   0
zio_buf_131072:      131072,      0,   22660,      33,124129326,   0,   0
zio_data_buf_131072: 131072,      0,     126,       0,  642863,   0,   0
zio_buf_163840:      163840,      0,       0,       3,  760570,   0,   0
zio_data_buf_163840: 163840,      0,      17,       0,  496578,   0,   0
zio_buf_196608:      196608,      0,       0,       3,  603070,   0,   0
zio_data_buf_196608: 196608,      0,       9,       0,  423227,   0,   0
zio_buf_229376:      229376,      0,       0,       3, 1081866,   0,   0
zio_data_buf_229376: 229376,      0,       3,       3,  356445,   0,   0
zio_buf_262144:      262144,      0,       0,       1,  233554,   0,   0
zio_data_buf_262144: 262144,      0,       3,       2,  261629,   0,   0
zio_buf_327680:      327680,      0,       0,       0,  522539,   0,   0
zio_data_buf_327680: 327680,      0,       1,       0,  352535,   0,   0
zio_buf_393216:      393216,      0,       0,       0,  184306,   0,   0
zio_data_buf_393216: 393216,      0,       2,       2,  259501,   0,   0
zio_buf_458752:      458752,      0,       0,       0,  204298,   0,   0
zio_data_buf_458752: 458752,      0,       1,       0,  235566,   0,   0
zio_buf_524288:      524288,      0,       0,       0,  115171,   0,   0
zio_data_buf_524288: 524288,      0,       3,       0,  202161,   0,   0
zio_buf_655360:      655360,      0,       0,       1,  129732,   0,   0
zio_data_buf_655360: 655360,      0,       2,       0,  361461,   0,   0
zio_buf_786432:      786432,      0,       0,       0,  150016,   0,   0
zio_data_buf_786432: 786432,      0,       0,       1, 1267775,   0,   0
zio_buf_917504:      917504,      0,       0,       0,  254924,   0,   0
zio_data_buf_917504: 917504,      0,       0,       0, 1555638,   0,   0
zio_buf_1048576:     1048576,      0,       0,       7,  497193,   0,   0
zio_data_buf_1048576: 1048576,      0,      67,       1,   38101,   0,   0
zio_buf_1310720:     1310720,      0,       0,       0,       0,   0,   0
zio_data_buf_1310720: 1310720,      0,       0,       0,       0,   0,   0
zio_buf_1572864:     1572864,      0,       0,       0,       0,   0,   0
zio_data_buf_1572864: 1572864,      0,       0,       0,       0,   0,   0
zio_buf_1835008:     1835008,      0,       0,       0,       0,   0,   0
zio_data_buf_1835008: 1835008,      0,       0,       0,       0,   0,   0
zio_buf_2097152:     2097152,      0,       0,       0,       0,   0,   0
zio_data_buf_2097152: 2097152,      0,       0,       0,       0,   0,   0
zio_buf_2621440:     2621440,      0,       0,       0,       0,   0,   0
zio_data_buf_2621440: 2621440,      0,       0,       0,       0,   0,   0
zio_buf_3145728:     3145728,      0,       0,       0,       0,   0,   0
zio_data_buf_3145728: 3145728,      0,       0,       0,       0,   0,   0
zio_buf_3670016:     3670016,      0,       0,       0,       0,   0,   0
zio_data_buf_3670016: 3670016,      0,       0,       0,       0,   0,   0
zio_buf_4194304:     4194304,      0,       0,       0,       0,   0,   0
zio_data_buf_4194304: 4194304,      0,       0,       0,       0,   0,   0
zio_buf_5242880:     5242880,      0,       0,       0,       0,   0,   0
zio_data_buf_5242880: 5242880,      0,       0,       0,       0,   0,   0
zio_buf_6291456:     6291456,      0,       0,       0,       0,   0,   0
zio_data_buf_6291456: 6291456,      0,       0,       0,       0,   0,   0
zio_buf_7340032:     7340032,      0,       0,       0,       0,   0,   0
zio_data_buf_7340032: 7340032,      0,       0,       0,       0,   0,   0
zio_buf_8388608:     8388608,      0,       0,       0,       0,   0,   0
zio_data_buf_8388608: 8388608,      0,       0,       0,       0,   0,   0
zio_buf_10485760:    10485760,      0,       0,       0,       0,   0,   0
zio_data_buf_10485760: 10485760,      0,       0,       0,       0,   0,   0
zio_buf_12582912:    12582912,      0,       0,       0,       0,   0,   0
zio_data_buf_12582912: 12582912,      0,       0,       0,       0,   0,   0
zio_buf_14680064:    14680064,      0,       0,       0,       0,   0,   0
zio_data_buf_14680064: 14680064,      0,       0,       0,       0,   0,   0
zio_buf_16777216:    16777216,      0,       0,       0,       0,   0,   0
zio_data_buf_16777216: 16777216,      0,       0,       0,       0,   0,   0
lz4_cache:            16384,      0,       0,      39,70003424,   0,   0
abd_chunk:             4096,      0,  590770,     104,119431594,   0,   0
sa_cache:               264,      0,  526074,     696, 7136987,   0,   0
dnode_t:                808,      0,  547787,   19003, 7135541,   0,   0
arc_buf_hdr_t_full:     256,      0,  658020,  488730,6204894735,   0,   0
arc_buf_hdr_t_full_crypt:    320,      0,       0,       0,       0,   0,   0
arc_buf_hdr_t_l2only:     96,      0,       0,       0,       0,   0,   0
arc_buf_t:               64,      0,  575073,    6611,781337311,   0,   0
dmu_buf_impl_t:         296,      0, 1118052,    3432,34222991,   0,   0
zil_lwb_cache:          360,      0,    9497,     623,20108856,   0,   0
zil_zcw_cache:           80,      0,       5,    2145,24216697,   0,   0
sio_cache_0:            136,      0, 3244098,  993382,189500908,   0,   0
sio_cache_1:            152,      0,  541026,  110664,38237956,   0,   0
sio_cache_2:            168,      0,       0,     322,30938619,   0,   0
zfs_znode_cache:        472,      0,  526074,     670, 7136987,   0,   0

@ahrens, @pcd1193182 do you have any further ideas on how to improve this situation? I've already started rewriting the pools with ashift=12, but it takes ages to complete...

Describe how to reproduce the problem

See the above mailing list thread. Basically:

(some disks have already been rewritten on this machine)

# zpool list
NAME     SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
disk0    444G  14.9G   429G        -         -    16%     3%  1.00x  ONLINE  -
disk1    444G  15.0G   429G        -         -    14%     3%  1.00x  ONLINE  -
disk10  3.62T  2.53T  1.09T        -         -    32%    69%  1.00x  ONLINE  -
disk11  3.62T  2.54T  1.09T        -         -    33%    69%  1.00x  ONLINE  -
disk12  3.62T  2.53T  1.09T        -         -    32%    69%  1.00x  ONLINE  -
disk13  3.62T  2.23T  1.39T        -         -    73%    61%  1.00x  ONLINE  -
disk14  3.62T  1.94T  1.68T        -         -    69%    53%  1.00x  ONLINE  -
disk15  3.62T  2.24T  1.39T        -         -    73%    61%  1.00x  ONLINE  -
disk16  3.62T  2.22T  1.40T        -         -    73%    61%  1.00x  ONLINE  -
disk17  3.62T  2.25T  1.38T        -         -    73%    62%  1.00x  ONLINE  -
disk18  3.62T  2.42T  1.20T        -         -     7%    66%  1.00x  ONLINE  -
disk19  3.62T  2.24T  1.39T        -         -    73%    61%  1.00x  ONLINE  -
disk2   3.62T   880G  2.77T        -         -    13%    23%  1.00x  ONLINE  -
disk20  3.62T  2.24T  1.39T        -         -    73%    61%  1.00x  ONLINE  -
disk21  3.62T  2.24T  1.39T        -         -    73%    61%  1.00x  ONLINE  -
disk22  3.62T  2.23T  1.39T        -         -    73%    61%  1.00x  ONLINE  -
disk23  3.62T  2.24T  1.38T        -         -    73%    61%  1.00x  ONLINE  -
disk24  3.62T  2.23T  1.40T        -         -    73%    61%  1.00x  ONLINE  -
disk25  3.62T  2.23T  1.39T        -         -    73%    61%  1.00x  ONLINE  -
disk26  3.62T  1.95T  1.67T        -         -    69%    53%  1.00x  ONLINE  -
disk27  3.62T  2.24T  1.39T        -         -    73%    61%  1.00x  ONLINE  -
disk28  3.62T  2.24T  1.39T        -         -    73%    61%  1.00x  ONLINE  -
disk29  3.62T  2.23T  1.39T        -         -    73%    61%  1.00x  ONLINE  -
disk3   3.62T  2.49T  1.14T        -         -    34%    68%  1.00x  ONLINE  -
disk30  3.62T  2.24T  1.39T        -         -    73%    61%  1.00x  ONLINE  -
disk31  3.62T  1.32T  2.30T        -         -    58%    36%  1.00x  ONLINE  -
disk32  3.62T  2.24T  1.38T        -         -    73%    61%  1.00x  ONLINE  -
disk33  3.62T  2.24T  1.38T        -         -    73%    61%  1.00x  ONLINE  -
disk34  3.62T  2.24T  1.39T        -         -    73%    61%  1.00x  ONLINE  -
disk35  3.62T  2.23T  1.39T        -         -    73%    61%  1.00x  ONLINE  -
disk36  3.62T  2.24T  1.38T        -         -    73%    61%  1.00x  ONLINE  -
disk37  3.62T  2.23T  1.39T        -         -    73%    61%  1.00x  ONLINE  -
disk38  3.62T  2.24T  1.39T        -         -    73%    61%  1.00x  ONLINE  -
disk39  3.62T  2.24T  1.38T        -         -    73%    61%  1.00x  ONLINE  -
disk4   3.62T  2.48T  1.15T        -         -    34%    68%  1.00x  ONLINE  -
disk40  3.62T  2.23T  1.39T        -         -    73%    61%  1.00x  ONLINE  -
disk41  3.62T  2.19T  1.43T        -         -    72%    60%  1.00x  ONLINE  -
disk42  3.62T  2.23T  1.39T        -         -    73%    61%  1.00x  ONLINE  -
disk43  3.62T  2.23T  1.39T        -         -    73%    61%  1.00x  ONLINE  -
disk44  3.62T  2.24T  1.39T        -         -    73%    61%  1.00x  ONLINE  -
disk45  3.62T  2.24T  1.39T        -         -    73%    61%  1.00x  ONLINE  -
disk5   3.62T  2.48T  1.15T        -         -    33%    68%  1.00x  ONLINE  -
disk6   3.62T  2.48T  1.15T        -         -    34%    68%  1.00x  ONLINE  -
disk7   3.62T  2.47T  1.15T        -         -    34%    68%  1.00x  ONLINE  -
disk8   3.62T  2.48T  1.15T        -         -    34%    68%  1.00x  ONLINE  -
disk9   3.62T  2.47T  1.15T        -         -    33%    68%  1.00x  ONLINE  -

ghost commented 4 years ago

Please try with the latest version of the ports (202004150x). We were not initializing the arc free target in earlier versions.

ahrens commented 4 years ago

Your "how to reproduce" steps imply that the problem is with fragmentation causing memory use. The most plausible route for that is loaded metaslabs, which consume from the zfs_btree_leaf_cache. If I'm reading correctly, that is using 7GB of RAM, which is considerable, but much less than the 179GB of "wired" memory that you're trying to account for.
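
As a rough cross-check of that figure, the per-zone footprint can be estimated from the vmstat -z columns as SIZE times USED. The sketch below is not part of the original report. It assumes the comma-separated layout shown in the output above, and the function names (parse_vmstat_z, rank_by_used_bytes) are made up for illustration. SIZE times USED ignores free items and per-slab overhead, so the result is best read as a lower bound. The sample rows are copied from the first vmstat -z output in this issue.

```python
# Sample rows copied from the first `vmstat -z` output in this issue.
SAMPLE = """\
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP

UMA Slabs:               80,      0, 8617831,    1919,259569995,   0,   0
32768:                32768,      0, 4852158,       1, 5099459,   0,   0
zfs_btree_leaf_cache:   4096,      0, 1830288,      23,39570799,   0,   0
zio_buf_16384:        16384,      0,  550032,      69,705910394,   0,   0
abd_chunk:             4096,      0,  590770,     104,119431594,   0,   0
"""


def parse_vmstat_z(text):
    """Return a list of (zone_name, item_size, used_items) tuples."""
    zones = []
    for line in text.splitlines():
        # Zone names may contain spaces, so split on the last colon.
        name, sep, rest = line.rpartition(":")
        if not sep:
            continue  # header or blank line
        fields = [f.strip() for f in rest.split(",")]
        if len(fields) != 7 or not all(f.isdigit() for f in fields):
            continue  # not a data row in the assumed layout
        size, _limit, used = (int(f) for f in fields[:3])
        zones.append((name.strip(), size, used))
    return zones


def rank_by_used_bytes(zones):
    """Sort zones by SIZE * USED, largest first."""
    ranked = [(name, size * used) for name, size, used in zones]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, used_bytes in rank_by_used_bytes(parse_vmstat_z(SAMPLE)):
        print(f"{name:<24} {used_bytes / 2**30:8.2f} GiB")
```

On these sample rows, zfs_btree_leaf_cache comes to roughly 7 GiB, consistent with the estimate above, while the zone named "32768" comes to roughly 148 GiB, which is much closer to the wired total.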

Am I reading correctly that you have 45 separate storage pools? That should work, but it is not a use case that has received much scrutiny. Maybe you're hitting some unknown suboptimal behavior due to having so many storage pools?

bra-fsn commented 4 years ago

> Please try with the latest version of the ports (202004150x). We were not initializing the arc free target in earlier versions.

Sorry, I hadn't noticed that this came out. Trying it now, thanks! I will report back with the findings. BTW, according to the outputs, ARC oversizing doesn't seem to be the problem here, no?

bra-fsn commented 4 years ago

> Your "how to reproduce" steps imply that the problem is with fragmentation causing memory use. The most plausible route for that is loaded metaslabs, which consume from the zfs_btree_leaf_cache. If I'm reading correctly, that is using 7GB of RAM, which is considerable, but much less than the 179GB of "wired" memory that you're trying to account for.

Hm, sorry. After that long thread I made a mental note to watch out for the related work, and I hadn't re-checked the outputs. I'm not even sure the original conclusion still stands. I'm taking a look at openzfs @ 2020041502 and will let you know what happens.

> Am I reading correctly that you have 45 separate storage pools? That should work, but it is not a use case that has received much scrutiny. Maybe you're hitting some unknown suboptimal behavior due to having so many storage pools?

Yes, that's what I have. I have redundancy between the hosts, so I don't need local redundancy, but would like to use ZFS features.

ghost commented 4 years ago

>> Please try with the latest version of the ports (202004150x). We were not initializing the arc free target in earlier versions.
>
> Sorry, I hadn't noticed that this came out. Trying it now, thanks! I will report back with the findings. BTW, according to the outputs, ARC oversizing doesn't seem to be the problem here, no?

No, I suppose I didn't pay enough attention after "old version of port eats all memory" to notice that the ARC is relatively small. Good to update anyway, though! :)
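
The point that the ARC is small relative to wired memory can be checked directly from the top output earlier in this issue. What follows is only a minimal sketch: it assumes top's K/M/G suffixes are powers of 1024 and that each line is a label followed by comma-separated "value name" pairs. parse_size and parse_top_line are hypothetical helper names, and top rounds its figures, so the result is approximate.

```python
# Assumption: top's size suffixes are binary multiples (powers of 1024).
UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}

# Lines copied from the top output in the original report.
MEM_LINE = "Mem: 414M Active, 942M Inact, 2967M Laundry, 179G Wired, 181M Buf, 3550M Free"
ARC_LINE = "ARC: 16G Total, 5911M MFU, 8842M MRU, 65M Anon, 194M Header, 902M Other"


def parse_size(text):
    """Convert a string such as '179G' or '3550M' to bytes."""
    text = text.strip()
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def parse_top_line(line):
    """Turn 'Mem: 414M Active, ...' into {'Active': bytes, ...}."""
    _label, _sep, rest = line.partition(":")
    values = {}
    for item in rest.split(","):
        amount, name = item.split(None, 1)
        values[name.strip()] = parse_size(amount)
    return values


if __name__ == "__main__":
    wired = parse_top_line(MEM_LINE)["Wired"]
    arc = parse_top_line(ARC_LINE)["Total"]
    print(f"ARC share of wired: {arc / wired:.1%}")
    print(f"Wired outside ARC:  {(wired - arc) / 2**30:.0f} GiB")
```

With the figures from the report, the ARC is about 9% of wired memory, which leaves roughly 163G of wired memory that the ARC does not explain.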

ahrens commented 4 years ago

> I have redundancy between the hosts, so I don't need local redundancy, but would like to use ZFS features.

That's fine, but I think you'd be much better off with a single zpool. You can still have no redundancy, as you do now, e.g. zpool create poolname disk1 disk2 disk3 disk4 ...

bra-fsn commented 4 years ago

> That's fine, but I think you'd be much better off with a single zpool. You can still have no redundancy, as you do now, e.g. zpool create poolname disk1 disk2 disk3 disk4 ...

I have 44-60 disks in a machine. Rebuilding 43-59 times the amount needed because one disk died seems somewhat excessive. :)

bra-fsn commented 4 years ago

Upgrading to openzfs version 2020041502 (commit a7929f313) caused no change:

[image]

vmstat output now:

# vmstat -z
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP

UMA Kegs:               248,      0,     261,       9,     261,   0,   0
UMA Zones:             3536,      0,     282,       0,     282,   0,   0
UMA Slabs:               80,      0, 8101740,  377810,675146765,   0,   0
UMA Hash:               256,      0,      48,      27,     112,   0,   0
4 Bucket:                32,      0,     503,    5372,1019633211,   0,   0
6 Bucket:                48,      0,     138,    4925,480159794,   0,   0
8 Bucket:                64,      0,     140,    4448,503770590, 221,   0
12 Bucket:               96,      0,     119,    2628,395902172,4495,   0
16 Bucket:              128,      0,     168,    2839,206574623, 554,   0
32 Bucket:              256,      0,     264,    1056,185083549,21700,   0
64 Bucket:              512,      0,     150,     538,103597133,2904436,   0
128 Bucket:            1024,      0,     155,     249,87529512,4798359,   0
256 Bucket:            2048,      0,     431,     127,123514057,5844746,   0
vmem:                  1856,      0,       5,       1,       5,   0,   0
vmem btag:               56,      0, 6651076,    5884,30001352,48089,   0
VM OBJECT:              256,      0,  392422,   35993,56467611,   0,   0
RADIX NODE:             144,      0, 3196174,   46445,409522007,   0,   0
MAP:                    240,      0,       3,      61,       3,   0,   0
KMAP ENTRY:             120,      0,      17,     181,      28,   0,   0
MAP ENTRY:              120,      0,   11620,   10556,209566687,   0,   0
VMSPACE:               2560,      0,      75,     942, 2331960,   0,   0
fakepg:                 104,      0,       0,     646,  247018,   0,   0
64 pcpu:                  8,      0,    4030,    2882,    4544,   0,   0
mt_stats_zone:           64,      0,     417,     159,     417,   0,   0
mt_zone:                 24,      0,     417,     585,     417,   0,   0
16:                      16,      0,   42943,    6755,200911524,   0,   0
32:                      32,      0,  851722, 1566028,24467147073,   0,   0
64:                      64,      0, 5369950,    8054,3852421230,   0,   0
128:                    128,      0, 5357335,   19150,6431073106,   0,   0
256:                    256,      0,  100563,   51192,3740422405,   0,   0
512:                    512,      0,  542236,   16700,488434275,   0,   0
1024:                  1024,      0,   72280,      96,22375779,   0,   0
2048:                  2048,      0,    6625,     163,2335731055,   0,   0
4096:                  4096,      0,  435820,       5,12551849,   0,   0
8192:                  8192,      0,     492,       0, 3014945,   0,   0
16384:                16384,      0,     210,       0, 4119237,   0,   0
32768:                32768,      0, 4830133,       0, 5227778,   0,   0
65536:                65536,      0,      62,       0,11851168,   0,   0
SLEEPQUEUE:              80,      0,   24779,    1261,   24779,   0,   0
Files:                   80,      0,    2145,    7405,173952002,   0,   0
filedesc0:             1104,      0,     132,      96, 2332371,   0,   0
rangeset pctrie nodes:    144,      0,       0,       0,       0,   0,   0
rl_entry:                40,      0,    1693,    4307,    1693,   0,   0
TURNSTILE:              136,      0,   24779,     581,   24779,   0,   0
umtx pi:                 96,      0,       0,       0,       0,   0,   0
umtx_shm:                88,      0,       0,       0,       0,   0,   0
MAC labels:              40,      0,       0,       0,       0,   0,   0
PROC:                  1328,      0,     133,    1046, 2332370,   0,   0
THREAD:                1824,      0,   24390,     388, 1150782,   0,   0
cpuset:                 104,      0,      51,     848,      51,   0,   0
domainset:               40,      0,       0,       0,       0,   0,   0
audit_record:          1280,      0,       0,       0,       0,   0,   0
mbuf_packet:            256, 3355455,      51,    1002,169426021,   0,   0
mbuf:                   256, 3355455,    6363,   10434,5515089249,   0,   0
mbuf_cluster:          2048, 524288,    7194,    4566,2374801478,   0,   0
mbuf_jumbo_page:       4096, 262144,      23,      10,404551983,   0,   0
mbuf_jumbo_9k:         9216,  77672,       0,       0,       0,   0,   0
mbuf_jumbo_16k:       16384,  43690,       0,       0,       0,   0,   0
epoch_record pcpu:      256,      0,       4,      12,       4,   0,   0
FPU_save_area:          832,      0,       0,       0,       0,   0,   0
DMAR_MAP_ENTRY:         120,      0,       0,       0,       0,   0,   0
ttyinq:                 160,      0,     180,     120,     675,   0,   0
ttyoutq:                256,      0,      95,     115,     353,   0,   0
g_bio:                  376,      0,     112,    1208,4023073807,   0,   0
nvme_request:           128,      0,       0,       0,       0,   0,   0
cryptop:                128,      0,       0,       0,       0,   0,   0
cryptodesc:             120,      0,       0,       0,       0,   0,   0
crypto_session:          24,      0,       0,       0,       0,   0,   0
vtnet_tx_hdr:            24,      0,       0,       0,       0,   0,   0
VNODE:                  480,      0,  528885,     691, 9089184,   0,   0
VNODEPOLL:              120,      0,       0,       0,       0,   0,   0
BUF TRIE:               144,      0,    2751,  103197, 3948232,   0,   0
NAMEI:                 1024,      0,       0,     172,246458867,   0,   0
rentr:                   24,      0,       0,       0,       1,   0,   0
S VFS Cache:            108,      0,  526175,    4320,15518196,   0,   0
STS VFS Cache:          148,      0,    4018,    1650,10504400,   0,   0
L VFS Cache:            328,      0,       0,       0,       0,   0,   0
LTS VFS Cache:          368,      0,       3,      27,    2869,   0,   0
NCLNODE:                592,      0,    2722,     110,    4224,   0,   0
DIRHASH:               1024,      0,       0,       0,    1212,   0,   0
pipe:                   760,      0,     168,      67, 1465442,   0,   0
procdesc:               136,      0,       0,       0,       0,   0,   0
Mountpoints:           2744,      0,      57,       5,      58,   0,   0
AIO:                    208,      0,       0,    1007,     810,   0,   0
AIOP:                    32,      0,       4,    2621,     309,   0,   0
AIOCB:                  752,      0,       0,     255,   12802,   0,   0
AIOLIO:                 280,      0,       0,       0,       0,   0,   0
ksiginfo:               112,      0,    1141,    1904,  621423,   0,   0
itimer:                 352,      0,       0,       0,       0,   0,   0
KNOTE:                  160,      0,    1411,    1339,1322389379,   0,   0
socket:                 872, 6286248,    1352,    3068,31846407,   0,   0
unpcb:                  256, 6286260,      17,      88,  135259,   0,   0
ipq:                     56,  16401,       0,       0,     465,   0,   0
udp_inpcb:              488, 6286248,       5,      27,15721195,   0,   0
udpcb:                   32, 6286250,       5,     495,15721195,   0,   0
tcp_inpcb:              488, 6286248,    1333,     299,15989942,   0,   0
tcpcb:                  976, 6286248,    1329,      55,15989942,   0,   0
tcptw:                   88,  27810,       4,     131, 8049188,   0,   0
syncache:               168,  15364,       0,       0, 7920488,   0,   0
hostcache:               96,  15375,      56,     149,      83,   0,   0
sackhole:                32,      0,       0,       0,      79,   0,   0
tfo:                      4,      0,       0,       0,       0,   0,   0
tfo_ccache_entries:      80,      0,       0,       0,       0,   0,   0
tcpreass:                48,  32785,       0,       0,       0,   0,   0
tcp_log:                400, 1000000,       0,       0,       0,   0,   0
tcp_log_bucket:         144,      0,       0,       0,       0,   0,   0
tcp_log_node:           120,      0,       0,       0,       0,   0,   0
udplite_inpcb:          488, 6286248,       0,       0,       0,   0,   0
ripcb:                  488, 6286248,       0,       0,       0,   0,   0
IPsec SA lft_c:          16,      0,       0,       0,       0,   0,   0
rtentry:                208,      0,      25,     165,      97,   0,   0
selfd:                   64,      0,    1205,    4623,52610095,   0,   0
swpctrie:               144, 24512571,       0,       0,       0,   0,   0
swblk:                  136, 24512569,       0,       0,       0,   0,   0
md0:                    512,      0,   20000,      16,   20000,   0,   0
FFS inode:              160,      0,     656,     544,  528034,   0,   0
FFS1 dinode:            128,      0,       0,       0,       0,   0,   0
FFS2 dinode:            256,      0,     656,     319,  528034,   0,   0
md1:                    512,      0,   70000,       8,   70000,   0,   0
TMPFS dirent:            64,      0,     428,    2424,    4534,   0,   0
TMPFS node:             232,      0,     431,     504,    4537,   0,   0
pf mtags:                48,      0,       0,       0,      51,   0,   0
pf tags:                104,      0,       0,       0,       0,   0,   0
pf states:              296, 100009,    2050,    1031,31001054,   0,   0
pf state keys:           88,      0,    2050,    3485,31001054,   0,   0
pf source nodes:        136,  10005,       0,       0,       0,   0,   0
pf table entries:       216, 200016,      12,      24,      12,   0,   0
pf frags:               112,      0,       0,       0,       0,   0,   0
pf frag entries:         40,   5000,       0,       0,       0,   0,   0
pf state scrubs:         40,      0,       0,       0,       0,   0,   0
taskq_zone:             168,      0,      43,    1557,65580883,   0,   0
zfs_btree_leaf_cache:   4096,      0, 1366405,       3,68300809,   0,   0
ddt_cache:            24840,      0,     598,       0,    1196,   0,   0
ddt_entry_cache:        392,      0,       0,       0,       0,   0,   0
zio_cache:             1208,      0,   43634,    3757,7236136076,   0,   0
zio_link_cache:          48,      0,   42305,   11728,8656029411,   0,   0
zio_buf_512:            512,      0,    1617,    1615,52315494,   0,   0
zio_data_buf_512:       512,      0,      18,      94,154928024,   0,   0
zio_buf_1024:          1024,      0,    9599,    1237,271651537,   0,   0
zio_data_buf_1024:     1024,      0,       0,       0,378565459,   0,   0
zio_buf_1536:          1536,      0,   40026,     662,372855740,   0,   0
zio_data_buf_1536:     1536,      0,      58,     186,537807774,   0,   0
zio_buf_2048:          2048,      0,   99759,     573,342856308,   0,   0
zio_data_buf_2048:     2048,      0,       0,       0,509224951,   0,   0
zio_buf_2560:          2560,      0,  144104,      96,229357036,   0,   0
zio_data_buf_2560:     2560,      0,      10,      24,170718935,   0,   0
zio_buf_3072:          3072,      0,   90987,      57,134959728,   0,   0
zio_data_buf_3072:     3072,      0,       0,       0,183802283,   0,   0
zio_buf_3584:          3584,      0,   10435,       1,15512497,   0,   0
zio_data_buf_3584:     3584,      0,      16,      54,126668908,   0,   0
zio_buf_4096:          4096,      0,   48371,      92,470844984,   0,   0
zio_data_buf_4096:     4096,      0,       2,       0,974103426,   0,   0
zio_buf_5120:          5120,      0,       0,       5, 8162133,   0,   0
zio_data_buf_5120:     5120,      0,       6,       4,124781625,   0,   0
zio_buf_6144:          6144,      0,       0,       2, 7498640,   0,   0
zio_data_buf_6144:     6144,      0,       2,      12,83521651,   0,   0
zio_buf_7168:          7168,      0,       0,       2, 4850788,   0,   0
zio_data_buf_7168:     7168,      0,       4,       0,59555882,   0,   0
zio_buf_8192:          8192,      0,       1,       7,34062687,   0,   0
zio_data_buf_8192:     8192,      0,       2,       4,197486927,   0,   0
zio_buf_10240:        10240,      0,       0,       8,26263912,   0,   0
zio_data_buf_10240:   10240,      0,      10,      12,69289067,   0,   0
zio_buf_12288:        12288,      0,       0,       2,12406608,   0,   0
zio_data_buf_12288:   12288,      0,       0,       0,90183683,   0,   0
zio_buf_14336:        14336,      0,       0,       5, 4568215,   0,   0
zio_data_buf_14336:   14336,      0,       0,       0,34072707,   0,   0
zio_buf_16384:        16384,      0,  535533,      30,1618986804,   0,   0
zio_data_buf_16384:   16384,      0,       0,       0,42274264,   0,   0
zio_buf_20480:        20480,      0,       0,       4, 8798396,   0,   0
zio_data_buf_20480:   20480,      0,       0,       0,28845509,   0,   0
zio_buf_24576:        24576,      0,       0,       2, 8158561,   0,   0
zio_data_buf_24576:   24576,      0,       0,       0,19866928,   0,   0
zio_buf_28672:        28672,      0,       0,       3, 6336088,   0,   0
zio_data_buf_28672:   28672,      0,       0,       0,12098864,   0,   0
zio_buf_32768:        32768,      0,       0,       1, 6134777,   0,   0
zio_data_buf_32768:   32768,      0,       0,       0,11262674,   0,   0
zio_buf_40960:        40960,      0,       0,       1, 9020003,   0,   0
zio_data_buf_40960:   40960,      0,       0,       0,20995892,   0,   0
zio_buf_49152:        49152,      0,       0,       0, 4922695,   0,   0
zio_data_buf_49152:   49152,      0,       0,       0,18782587,   0,   0
zio_buf_57344:        57344,      0,       0,       0, 5209947,   0,   0
zio_data_buf_57344:   57344,      0,       0,       0,17214515,   0,   0
zio_buf_65536:        65536,      0,       0,       0, 3027380,   0,   0
zio_data_buf_65536:   65536,      0,       0,       0,20960916,   0,   0
zio_buf_81920:        81920,      0,       0,       0, 3451479,   0,   0
zio_data_buf_81920:   81920,      0,       0,       0,61630888,   0,   0
zio_buf_98304:        98304,      0,       0,       0, 2330682,   0,   0
zio_data_buf_98304:   98304,      0,      12,       0,227460343,   0,   0
zio_buf_114688:      114688,      0,       0,       0, 2942587,   0,   0
zio_data_buf_114688: 114688,      0,       6,       0,440548673,   0,   0
zio_buf_131072:      131072,      0,   22632,      14,454123779,   0,   0
zio_data_buf_131072: 131072,      0,       0,       0, 3059584,   0,   0
zio_buf_163840:      163840,      0,       0,       0, 2090522,   0,   0
zio_data_buf_163840: 163840,      0,       0,       0,  891980,   0,   0
zio_buf_196608:      196608,      0,       0,       0, 1576412,   0,   0
zio_data_buf_196608: 196608,      0,       0,       0,  751257,   0,   0
zio_buf_229376:      229376,      0,       0,       0, 2553008,   0,   0
zio_data_buf_229376: 229376,      0,       0,       0,  606102,   0,   0
zio_buf_262144:      262144,      0,       0,       0,  683463,   0,   0
zio_data_buf_262144: 262144,      0,       0,       0,  478605,   0,   0
zio_buf_327680:      327680,      0,       0,       0, 1382487,   0,   0
zio_data_buf_327680: 327680,      0,       0,       0,  673700,   0,   0
zio_buf_393216:      393216,      0,       0,       0,  693804,   0,   0
zio_data_buf_393216: 393216,      0,       0,       0,  504009,   0,   0
zio_buf_458752:      458752,      0,       0,       0,  677480,   0,   0
zio_data_buf_458752: 458752,      0,       0,       0,  555147,   0,   0
zio_buf_524288:      524288,      0,       0,       0,  491905,   0,   0
zio_data_buf_524288: 524288,      0,       0,       0,  394321,   0,   0
zio_buf_655360:      655360,      0,       0,       0,  753145,   0,   0
zio_data_buf_655360: 655360,      0,       0,       0,  725171,   0,   0
zio_buf_786432:      786432,      0,       0,       0,  750387,   0,   0
zio_data_buf_786432: 786432,      0,       0,       0, 2715537,   0,   0
zio_buf_917504:      917504,      0,       0,       0,  942802,   0,   0
zio_data_buf_917504: 917504,      0,       0,       0, 3121112,   0,   0
zio_buf_1048576:     1048576,      0,       0,       0,28390417,   0,   0
zio_data_buf_1048576: 1048576,      0,     161,       0,   66789,   0,   0
zio_buf_1310720:     1310720,      0,       0,       0,       0,   0,   0
zio_data_buf_1310720: 1310720,      0,       0,       0,       0,   0,   0
zio_buf_1572864:     1572864,      0,       0,       0,       0,   0,   0
zio_data_buf_1572864: 1572864,      0,       0,       0,       0,   0,   0
zio_buf_1835008:     1835008,      0,       0,       0,       0,   0,   0
zio_data_buf_1835008: 1835008,      0,       0,       0,       0,   0,   0
zio_buf_2097152:     2097152,      0,       0,       0,       0,   0,   0
zio_data_buf_2097152: 2097152,      0,       0,       0,       0,   0,   0
zio_buf_2621440:     2621440,      0,       0,       0,       0,   0,   0
zio_data_buf_2621440: 2621440,      0,       0,       0,       0,   0,   0
zio_buf_3145728:     3145728,      0,       0,       0,       0,   0,   0
zio_data_buf_3145728: 3145728,      0,       0,       0,       0,   0,   0
zio_buf_3670016:     3670016,      0,       0,       0,       0,   0,   0
zio_data_buf_3670016: 3670016,      0,       0,       0,       0,   0,   0
zio_buf_4194304:     4194304,      0,       0,       0,       0,   0,   0
zio_data_buf_4194304: 4194304,      0,       0,       0,       0,   0,   0
zio_buf_5242880:     5242880,      0,       0,       0,       0,   0,   0
zio_data_buf_5242880: 5242880,      0,       0,       0,       0,   0,   0
zio_buf_6291456:     6291456,      0,       0,       0,       0,   0,   0
zio_data_buf_6291456: 6291456,      0,       0,       0,       0,   0,   0
zio_buf_7340032:     7340032,      0,       0,       0,       0,   0,   0
zio_data_buf_7340032: 7340032,      0,       0,       0,       0,   0,   0
zio_buf_8388608:     8388608,      0,       0,       0,       0,   0,   0
zio_data_buf_8388608: 8388608,      0,       0,       0,       0,   0,   0
zio_buf_10485760:    10485760,      0,       0,       0,       0,   0,   0
zio_data_buf_10485760: 10485760,      0,       0,       0,       0,   0,   0
zio_buf_12582912:    12582912,      0,       0,       0,       0,   0,   0
zio_data_buf_12582912: 12582912,      0,       0,       0,       0,   0,   0
zio_buf_14680064:    14680064,      0,       0,       0,       0,   0,   0
zio_data_buf_14680064: 14680064,      0,       0,       0,       0,   0,   0
zio_buf_16777216:    16777216,      0,       0,       0,       0,   0,   0
zio_data_buf_16777216: 16777216,      0,       0,       0,       0,   0,   0
lz4_cache:            16384,      0,       0,      17,136251449,   0,   0
abd_chunk:             4096,      0,  585782,      93,207453815,   0,   0
sa_cache:               264,      0,  524942,    1798, 8552123,   0,   0
dnode_t:                808,      0,  560109,    6846, 8553138,   0,   0
arc_buf_hdr_t_full:     256,      0,  632238,  661872,10747751761,   0,   0
arc_buf_hdr_t_full_crypt:    320,      0,       0,       0,       0,   0,   0
arc_buf_hdr_t_l2only:     96,      0,       0,       0,       0,   0,   0
arc_buf_t:               64,      0,  558987,   15133,1960576502,   0,   0
dmu_buf_impl_t:         296,      0, 1098015,   14057,40482414,   0,   0
zil_lwb_cache:          360,      0,      44,     187,26471268,   0,   0
zil_zcw_cache:           80,      0,       0,       0,30536598,   0,   0
sio_cache_0:            136,      0, 1337577,  832928,1514491581,   0,   0
sio_cache_1:            152,      0,  269448,  115404,289654418,   0,   0
sio_cache_2:            168,      0,       0,      69,114551205,   0,   0
zfs_znode_cache:        472,      0,  524942,    1794, 8552123,   0,   0

Any ideas on what would be useful to debug this?

amotin commented 4 years ago

Upgrading to openzfs version 2020041502 (commit [a7929f3]) caused no change:

I see an important change here -- system no longer goes to swap.

bra-fsn commented 4 years ago

Upgrading to openzfs version 2020041502 (commit [a7929f3]) caused no change:

I see an important change here -- system no longer goes to swap.

Yes, because with FreeBSD's in-tree ZFS I got reboots, and I expected the same with the openzfs port, so I had configured a swap and a dump device. This time, because the problem seems fairly self-explanatory, I omitted that. Sorry if it caused confusion.

amotin commented 4 years ago

OK. After another look at the vmstat -z output, I see that 147GB of your RAM is consumed by the 32KB malloc(9) zone. You should look into the netstat -m output to see what malloc type those allocations belong to. It may or may not be ZFS-related.
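
As a rough cross-check of that figure, each zone's footprint in the vmstat -z output is approximately SIZE × USED. A minimal Python sketch, assuming the comma-separated column order shown in the header (ITEM, SIZE, LIMIT, USED, ...); the helper name zone_bytes is hypothetical and the three rows are copied from the output above:

```python
# Rows copied from the vmstat -z output above
# (ITEM: SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP).
SAMPLE = """\
32768:                32768,      0, 4830133,       0, 5227778,   0,   0
zfs_btree_leaf_cache:   4096,      0, 1366405,       3,68300809,   0,   0
zio_buf_16384:        16384,      0,  535533,      30,1618986804,   0,   0
"""


def zone_bytes(line):
    """Return (zone name, bytes in use) for one vmstat -z row (hypothetical helper)."""
    # The zone name is everything before the last colon; the rest is comma-separated.
    name, _, rest = line.rpartition(":")
    fields = [f.strip() for f in rest.split(",")]
    size, used = int(fields[0]), int(fields[2])
    return name.strip(), size * used


usage = dict(zone_bytes(l) for l in SAMPLE.splitlines() if l.strip())

# Largest consumers first, in GiB.
for name, nbytes in sorted(usage.items(), key=lambda kv: -kv[1]):
    print(f"{name:24s} {nbytes / 2**30:8.1f} GiB")
```

For the 32768 zone this works out to 4830133 × 32768 bytes, or about 147 GiB, ignoring free items and any per-slab overhead.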

bra-fsn commented 4 years ago

OK. After another look at the vmstat -z output, I see that 147GB of your RAM is consumed by the 32KB malloc(9) zone. You should look into the netstat -m output to see what malloc type those allocations belong to. It may or may not be ZFS-related.

I would think it's clearly ZFS-related, and @ahrens' explanation in the linked thread seems to describe the problem. If I rewrite the zpools and fragmentation decreases, this kind of problem disappears.

Here's another memory graph, from a machine that is exactly the same as this one but has had its zpools rewritten: image

This has 69G wired, but out of it 42G is ARC, which is fine.

On the problematic machine, the wired memory grows even while importing the zpools, and it's even worse after a crash (so it fits the explanation around ZIL playback as well).

Anyways, here's the output:

# netstat -m
6378/13227/19605 mbufs in use (current/cache/total)
6190/3088/9278/524288 mbuf clusters in use (current/cache/total/max)
49/1 mbuf+clusters out of packet secondary zone in use (current/cache)
4/0/4/262144 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/77672 9k jumbo clusters in use (current/cache/total/max)
0/0/0/43690 16k jumbo clusters in use (current/cache/total/max)
13990K/9482K/23473K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
1079778 sendfile syscalls
747468 sendfile syscalls completed without I/O request
318323 requests for I/O initiated by sendfile
1435220 pages read by sendfile as part of a request
8303549 pages were valid at time of a sendfile request
77 pages were valid and substituted to bogus page
0 pages were requested for read ahead by applications
2063198 pages were read ahead by sendfile
12807 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed

amotin commented 4 years ago

I'm sorry, I wanted to say vmstat -m to see malloc types. Too many tools with -m. ;)

bra-fsn commented 4 years ago

I'm sorry, I wanted to say vmstat -m to see malloc types. Too many tools with -m. ;)

Oh, I didn't understand netstat, should've corrected...

# vmstat -m
         Type InUse MemUse HighUse Requests  Size(s)
      CAM XPT   450    42K       -  4173986  16,32,64,128,256,512,2048,8192,65536
      entropy     1     1K       -   118019  32,4096
      CAM DEV    51   102K       -      350  2048
      CAM CCB    48    96K       - 2206779846  2048
     CAM path   100     4K       -   681966  32
   CAM periph   100    25K       -      617  16,32,64,128,256
       feeder     7     1K       -        7  32
CAM I/O Scheduler    47     6K       -       47  128
    CAM queue    54  1518K       -     1002  16,32,64,128,256,512,1024,2048,4096,32768
         UART     3     3K       -        3  16,1024
          USB    48    57K       -       65  16,32,128,256,512,1024,4096,8192,32768
       USBdev    35     5K       -       40  32,64,128,256,512
CAM dev queue     3     1K       -        3  64
      scsi_da     0     0K       -     1441  32,64,256
    ciss_data    15    19K       -       17  16,512,1024,4096,8192
        vtbuf    24  1968K       -       46  4096
           vt    11     6K       -       11  512
       DEVFS3   581   146K       -      626  256
       DEVFS1   287   144K       -      310  512
   DEVFS_RULE    55    26K       -       55  64,512
        DEVFS   103     3K       -      106  16,32,128
       DEVFSP     1     1K       -    77662  64
NFSD V4client     1     1K       -        1  256
 NFSD lckfile     1     1K       -        1  256
       NFS fh  2722    86K       - 27962503  32
NFSCL diroffdiroff     6     3K       -        6  512
 NFSD session     1     1K       -        1  1024
newnfsclient_req     0     0K       -       88  128
    newnfsmnt     3     6K       -        4  2048
    pfs_nodes    20    10K       -       20  512
         GEOM   539   114K       -  2136891  16,32,64,128,256,512,1024,2048,4096,8192
    raid_data     0     0K       -      594  32,128,256
       isadev     7     1K       -        7  128
     SCSI ENC   264   292K       -   905276  16,64,128,1024,2048,4096,32768,65536
  ddb_capture     1    64K       -        1  65536
         cdev     4     1K       -        4  256
     filedesc    31   317K       -  1901449  16,32,64,128,256,4096,8192,16384,32768,65536
        sigio     4     1K       -    99254  64
     filecaps     9     1K       -   475281  16,32,64
      kdtrace 23479  5845K       -  6208206  64,256
         kenv   131    13K       -    13442  16,32,64,128,8192
       kqueue   332  1574K       -  5155314  64,128,256,512,2048,4096,8192,16384
    proc-args   251    11K       -  1585261  16,32,64,128,256
  Fail Points     0     0K       -      100  1024
        hhook    13     4K       -       13  256
      ithread   282    52K       -      282  32,128,256
       prison     8     1K       -        8  16,32
       KTRACE   100    13K       -      100  128
       linker   248  1957K       -      316  16,32,64,128,256,512,1024,2048,4096,8192,16384,32768
        lockf    50     6K       -      872  64,128
   loginclass     3     1K       -        3  64
       devbuf  2916  7134K       -     4402  16,32,64,128,256,512,1024,2048,4096,8192,16384,65536
         temp    81    20K       -  6654240  16,32,64,128,256,512,1024,2048,4096,8192,16384,65536
       module   515    65K       -      516  128
     mtx_pool     2    72K       -        2  8192,65536
          osd 22631   355K       -    47273  16,32,64,128,256
     pmchooks     1     1K       -        1  128
          pmc     2     1K       -        2  64
         pgrp    26     4K       -   188085  128
      session    20     3K       -    96209  128
         proc     2   256K       -        2  
      subproc  1315  1130K       -  2502265  512,4096
         cred   193    49K       -  1047375  256
        evdev     3     3K       -        4  1024
       plimit    19     5K       -    34051  256
      uidinfo    12    34K       -    94002  128,32768
       sysctl     0     0K       -    56834  64
    sysctloid 17496   902K       -    18419  16,32,64,128
    sysctltmp     0     0K       -   339558  16,32,64,256,1024
     acpiintr     1     1K       -        1  64
      tidhash     1   256K       -        1  
      callout    25  7304K       -       25  
         umtx 49556  6195K       -    49556  128
     p1003.1b     1     1K       -        1  16
          bus  1707   176K       -    45765  16,32,64,128,256,512,1024,4096
       bus-sc   119  2034K       -    27970  16,32,128,256,512,1024,2048,4096,8192,16384,32768,65536
       acpica  5050   501K       -   190298  16,32,64,128,256,512,1024,2048
      devstat    30    61K       -       30  32,4096
        epoch     4     1K       -        4  128
 eventhandler   152    13K       -      152  64,128
   gtaskqueue   154    47K       -      154  16,32,256,8192
         kobj   341  1364K       -     1210  4096
      Per-cpu     1     1K       -        1  32
     acpitask     1    64K       -        1  65536
         rman   457    47K       -      851  16,32,128
         sbuf     1     1K       -   912725  16,32,64,128,256,512,1024,2048,4096,8192,16384,32768,65536
    toponodes    64     8K       -       64  128
       kbdmux     6    22K       -        7  16,512,1024,2048,16384
        stack     0     0K       -     1002  256
    taskqueue  6339   838K       -    13017  16,32,64,128,256
     terminal    11     3K       -       11  256
       Unitno    48     3K       -  2364317  32,64
         vmem     4  3328K       -       33  4096,8192,16384,32768,65536
     ioctlops     0     0K       -     1462  256,512,1024,2048,4096
       select  1847   231K       -     1847  128
          iov     0     0K       - 108152153  16,32,64,128,256,512
          msg     4    30K       -        4  2048,4096,8192,16384
          sem     4   106K       -        4  2048,4096
          shm     1    32K       -        1  32768
          tty    14    14K       -       22  1024
          pts     1     1K       -        9  256
     mbuf_tag     0     0K       -       22  32
        shmfd     1     8K       -        1  8192
       soname     7     1K       - 51390649  16,32,128
          pcb   664 17457K       - 23847469  16,32,64,1024,2048
     vfscache     4  8385K       -        4  256,65536
   cl_savebuf     0     0K       -      104  64
     vfs_hash     1  4096K       -        1  
       vnodes    21     1K       -       21  32,256
        mount  1506    56K       -     2431  16,32,64,128,256
       statfs     0     0K       -  3041630  4096
  vnodemarker     0     0K       -   493921  512
chacha20random     1     8K       -        1  8192
          BPF    12  1026K       -       20  16,128,512,1024
      ifdescr     1     1K       -     1272  32
        ifnet     9    17K       -        9  128,2048
       ifaddr   184    56K       -      184  16,32,64,128,256,512,2048,4096
  ether_multi    90     8K       -      129  16,32,64,128
        clone    10     2K       -       10  128
        ipsec     3     1K       -        3  256
      lltable   102    44K       -      326  256,512
          tun     3     1K       -        3  32
        iflib   457  3460K       -      481  64,128,1024,8192,16384,32768
     routetbl    67    13K       -      199  32,64,128,256,512
         vnet     1     1K       -        1  64
    vnet_data     1   240K       -        1  
vnet_data_free     1     1K       -        1  32
         igmp     8     1K       -        8  128
     in_multi     1     1K       -        8  256
encap_export_host    12     1K       -       12  32,64
   tfo_ccache     1   128K       -        1  
    hostcache     1    32K       -        1  32768
          LRO    72  1440K       -       72  8192,32768
      tcpfunc     1     1K       -        1  64
     syncache     1    68K       -        1  
    in6_multi    51     7K       -       51  32,256
          mld     7     1K       -        7  128
       ip6ndp    14     3K       -       20  64,256
  inpcbpolicy  1350    43K       - 31735182  32
     secasvar     1     1K       -        1  1024
       sahead     1     1K       -        1  1024
  ipsecpolicy     2     2K       -        2  256,1024
    ipsec-saq     2     2K       -        2  1024
nfsclient_lock     0     0K       -    13204  512
nfsclient_nlminfo    53     2K       -      105  32
       crypto     2     5K       -        2  1024,4096
          rpc    50    26K       - 68846473  64,128,512,1024,4096
audit_evclass   230     8K       -      285  32
  ufs_dirhash     2     1K       -     1214  16,512
    ufs_quota     1  4096K       -        1  
    ufs_mount     6    33K       -        8  512,4096,8192
      UMAHash    64 49083K       -      218  512,1024,2048,4096,8192,16384,32768,65536
      md_disk   181    14K       -      181  32,4096
   md_sectors   179   716K       -      179  4096
          mpr    58  1343K       -      316  16,32,64,128,256,512,1024,4096,32768
      memdesc     1     4K       -        1  4096
     pci_link    16     2K       -       16  64,128
     atkbddev     2     1K       -        2  64
      acpisem    54     7K       -       54  128
      acpidev    55     4K       -       55  64
      CAM SIM     4     1K       -        4  256
       apmdev     1     1K       -        1  128
   madt_table     0     0K       -        2  256,4096
         intr     4   400K       -        4  65536
      io_apic     3     6K       -        3  2048
   local_apic     1    32K       -        1  32768
          MCA    24     3K       -       24  128
         cpus     2     1K       -        2  128
          msi    52     7K       -       52  128
     nexusdev     5     1K       -        5  16
  tmpfs mount     3     1K       -        3  128
   tmpfs name   428    17K       -     4817  16,32,64
      pf_temp     0     0K       -       51  32
      pf_hash     5 11524K       -        5  2048
     pf_ifnet    12     5K       -      133  256,2048
      pf_osfp  1191   123K       -     1191  64,128
      pf_rule   121   121K       -      121  1024
     pf_table     5    10K       -       10  2048
   kstat_data    15    15K       -       15  1024
      solaris 17403061 158226867K       - 40192812653  16,32,64,128,256,512,1024,2048,4096,8192,16384,32768,65536
    sfs_nodes    96    48K       -       96  512

amotin commented 4 years ago

So indeed it looks like ZFS:

solaris 17403061 158226867K - 40192812653 16,32,64,128,256,512,1024,2048,4096,8192,16384,32768,65536

Unfortunately it does not tell what exactly is leaking, but it definitely should not be like that. It is not normal ARC memory usage.
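
To rank malloc types by MemUse instead of scanning the table by eye, the rows can be parsed and sorted. A minimal Python sketch, assuming the vmstat -m column layout shown above and that the K suffix means KiB; parse_malloc_types is a hypothetical helper and the rows are copied from this issue:

```python
import re

# Rows copied from the vmstat -m output above
# (Type, InUse, MemUse, HighUse, Requests, Size(s)).
SAMPLE = """\
      UMAHash    64 49083K       -      218  512,1024,2048,4096,8192,16384,32768,65536
          pcb   664 17457K       - 23847469  16,32,64,1024,2048
CAM I/O Scheduler    47     6K       -       47  128
      solaris 17403061 158226867K       - 40192812653  16,32,64,128,256,512,1024,2048,4096,8192,16384,32768,65536
"""

# Type names may contain spaces, so anchor on the numeric columns
# rather than splitting on whitespace.
ROW = re.compile(
    r"^\s*(?P<type>.+?)\s+(?P<inuse>\d+)\s+(?P<memuse>\d+)K\s+-\s+(?P<requests>\d+)"
)


def parse_malloc_types(text):
    """Return {type: MemUse in KiB} for every row that parses (hypothetical helper)."""
    out = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            out[m.group("type")] = int(m.group("memuse"))
    return out


mem_kib = parse_malloc_types(SAMPLE)

# Largest consumers first, converted from KiB to GiB.
for mtype, kib in sorted(mem_kib.items(), key=lambda kv: -kv[1]):
    print(f"{mtype:20s} {kib / 2**20:8.2f} GiB")
```

With the rows above, solaris sorts first by a wide margin; every other type in the full table is in the tens of megabytes or less.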

bra-fsn commented 4 years ago

No, it's not ARC. It's really like what's been described, with the metaslabs.

ahrens commented 4 years ago

@bra-fsn How did you conclude that it's caused by metaslabs? The most common cause for metaslab memory usage is zfs_btree_leaf_cache, which stores the in-memory version of loaded metaslabs' spacemaps. If I'm reading correctly, that is using 5GB of RAM in your latest comment, which is considerable, but much less than the 158GB of "solaris" or 154GB in the "32768" cache.

I'm not super familiar with the FreeBSD diagnostics here, but it sounds like something is doing a lot of kmem_alloc(32K). It's definitely possible that this is related to ZFS and to metaslabs, but I don't have a guess as to what that would be, specifically. Maybe you could use dtrace to see what stacks are doing these allocations most often? (note, it looks like allocations of size > 16384 and <= 32768 will use this cache).
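
The size range in the note above follows if requests are rounded up to the next power-of-two zone. A minimal Python sketch of that rounding, assuming the zone sizes are exactly the 16 to 65536 entries listed in the vmstat -z output; zone_for is a hypothetical name and this is not the kernel's actual lookup code:

```python
# Zone sizes as listed in the vmstat -z output above: 16, 32, ..., 65536.
ZONE_SIZES = [16 << i for i in range(13)]


def zone_for(size):
    """Return the smallest listed zone that fits `size` bytes, or None if none does."""
    for zone in ZONE_SIZES:
        if size <= zone:
            return zone
    return None


# Requests just above 16384 and up to 32768 all land in the 32768 zone.
for request in (16384, 16385, 24576, 32768, 32769):
    print(request, "->", zone_for(request))
```

Under that assumption, a stack that repeatedly allocates anywhere between 16385 and 32768 bytes would show up entirely in the "32768" row.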

bra-fsn commented 4 years ago

@bra-fsn How did you conclude that it's caused by metaslabs? The most common cause for metaslab memory usage is zfs_btree_leaf_cache, which stores the in-memory version of loaded metaslabs' spacemaps. If I'm reading correctly, that is using 5GB of RAM in your latest comment, which is considerable, but much less than the 158GB of "solaris" or 154GB in the "32768" cache.

It wasn't me, but you :) From the thread, linked in the issue's opening post: https://openzfs.topicbox.com/groups/developer/T10533b84f9e1cfc5-M4fadd72936a441115b96f8f7/using-many-zpools

Of course that was before the AVL->btree change, but it made perfect sense. Everything you wrote there turned out to be correct: the memory usage is proportional to the level of fragmentation (at least, I could successfully and drastically reduce it by rewriting the pools). That was with the in-tree ZFS version, which has the AVL stuff. Sure, this may not be the case with this version, and the current stats seem to support that (or the memory is not accounted right; I'm not familiar with the internals).

I'm not super familiar with the FreeBSD diagnostics here, but it sounds like something is doing a lot of kmem_alloc(32K). It's definitely possible that this is related to ZFS and to metaslabs, but I don't have a guess as to what that would be, specifically. Maybe you could use dtrace to see what stacks are doing these allocations most often? (note, it looks like allocations of size > 16384 and <= 32768 will use this cache).

Could you please help with that?

pcd1193182 commented 4 years ago

The important difference there is that in that case, the space was directly attributable to the range_seg_cache, which was the precursor to the zfs_btree_leaf_cache. In your case, the zfs_btree_leaf_cache is only using 5GB of RAM, so the high memory usage isn't caused by loading the spacemaps into memory.

As for the dtrace script, you want something that triggers on kmem_alloc when the size is > 16384 and <= 32768, and you probably want to do @aggr[stack()] = count(); so you can see which stacks are doing lots of these allocations.

bra-fsn commented 4 years ago

@pcd1193182 Understood, I'm just saying the effect is very similar.

I've restarted the machine and let it run for a while. top shows this ATM:

Mem: 2960M Active, 17G Inact, 79G Wired, 203M Buf, 88G Free
ARC: 13G Total, 4190M MFU, 8373M MRU, 14M Anon, 195M Header, 894M Other
     2617M Compressed, 9958M Uncompressed, 3.81:1 Ratio

The attached file: kmem.zip

has the vmstat -z output and the output of these two commands:

dtrace -n 'fbt::malloc:entry { @[stack()] = quantize(arg0); }' > malloc_quant
dtrace -n 'fbt::zfs_kmem_alloc:entry { @[stack()] = quantize(arg0); }' > zfs_kmem_quant

I'm not sure how useful this will be though.

ahrens commented 4 years ago

I see a bunch of stacks like this:

              kernel`inflateInit2_+0xf6
              openzfs.ko`z_uncompress+0xbb
              openzfs.ko`gzip_decompress+0x24
              openzfs.ko`zio_decompress_data+0x63
              openzfs.ko`arc_buf_fill+0xa1c
              openzfs.ko`arc_read_done+0x242
              openzfs.ko`zio_done+0x887
              openzfs.ko`zio_execute+0x122
              kernel`taskqueue_run_locked+0x175
              kernel`taskqueue_thread_loop+0xa8
              kernel`fork_exit+0x83
              kernel`0xffffffff8103476e

           value  ------------- Distribution ------------- count    
           16384 |                                         0        
           32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 417157   
           65536 |                                         0        

I think the bug is:

static int
zlib_inflateEnd(z_stream *stream)
{
    return (inflateInit(stream));
}

I am guessing that should be calling something like inflateFini() or inflateEnd() which would free the buffer that was allocated by inflateInit().
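To make the suspected leak pattern concrete, here is a minimal sketch. It does not use the real zlib or OpenZFS code: `mock_stream`, `mock_inflate_init()`, `mock_inflate_end()`, `buggy_end()`, `fixed_end()` and `run_cycles()` are all hypothetical names invented for illustration, and the 32 KiB buffer size is only assumed from the histogram above. It shows why an "end" wrapper that calls the init routine again would leave every buffer allocated, while one that calls the matching teardown routine would not.

```c
#include <stdlib.h>

/* Hypothetical stand-in for z_stream; NOT zlib's real structure. */
typedef struct mock_stream {
    void *state;               /* buffer owned between init and end */
} mock_stream;

static long live_buffers = 0;  /* buffers allocated but not yet freed */

/* Mock of an init routine: allocates an (assumed) 32 KiB buffer. */
static int mock_inflate_init(mock_stream *s)
{
    s->state = malloc(32768);
    if (s->state == NULL)
        return -1;
    live_buffers++;
    return 0;
}

/* Mock of the matching teardown routine: frees the buffer. */
static int mock_inflate_end(mock_stream *s)
{
    if (s->state == NULL)
        return -1;
    free(s->state);
    s->state = NULL;
    live_buffers--;
    return 0;
}

/* Buggy wrapper: calls init again, so the old buffer is lost
 * and a second one is allocated on top of it. */
static int buggy_end(mock_stream *s)
{
    return mock_inflate_init(s);
}

/* Fixed wrapper: calls the teardown routine, releasing the buffer. */
static int fixed_end(mock_stream *s)
{
    return mock_inflate_end(s);
}

/* Run n init/end cycles with the given wrapper and return how many
 * buffers are still outstanding afterwards. */
static long run_cycles(int n, int (*end_fn)(mock_stream *))
{
    live_buffers = 0;
    for (int i = 0; i < n; i++) {
        mock_stream s = { NULL };
        if (mock_inflate_init(&s) != 0)
            break;
        end_fn(&s);
    }
    return live_buffers;
}
```

In this sketch, `run_cycles(100, buggy_end)` leaves 200 buffers outstanding (two per cycle), while `run_cycles(100, fixed_end)` leaves none. That would be consistent with memory growing steadily with the number of gzip decompressions, though the real fix is whatever the eventual patch does.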

ahrens commented 4 years ago

(incidentally, I noticed that the no-op functions zlib_workspace_alloc() and zlib_workspace_free() could be removed)

bra-fsn commented 4 years ago

I've also looked at those, but this is so basic and used everywhere that I didn't think it could be the cause. Why is this FreeBSD related?

ghost commented 4 years ago

I've brought up the zlib workspace parts with @mattmacy before. It's code we may want to implement in the future, so he left the stubs in place for now.

I'll have a look at the leaky bits. Thanks for helping troubleshoot this!

bra-fsn commented 4 years ago

Thanks a lot guys, building the new module and trying it out!

bra-fsn commented 4 years ago

Memory usage is constant after the change, I think this is solved with https://github.com/openzfs/zfs/pull/10252: image

Thanks, and sorry for sidetracking the topic with the metaslab-related problem (also good to see it's solved!).

ahrens commented 4 years ago

This bug still exists; it will be closed by PR #10252.