Please try with the latest version of the ports (202004150x). We were not initializing the arc free target in earlier versions.
Your "how to reproduce" steps imply that the problem is with fragmentation causing memory use. The most plausible route for that is loaded metaslabs, which consume from the zfs_btree_leaf_cache
. If I'm reading correctly, that is using 7GB of RAM, which is considerable, but much less than the 179GB of "wired" memory that you're trying to account for.
Am I reading correctly that you have 45 separate storage pools? That should work, but it is not a use case that has received much scrutiny. Maybe you're hitting some unknown suboptimal behavior due to having so many storage pools?
Please try with the latest version of the ports (202004150x). We were not initializing the arc free target in earlier versions.
Sorry, I hadn't noticed this came out. Trying it, thanks! I will report back with the findings. BTW, according to the outputs, ARC oversizing doesn't seem to be the problem here, no?
Your "how to reproduce" steps imply that the problem is with fragmentation causing memory use. The most plausible route for that is loaded metaslabs, which consume from the
zfs_btree_leaf_cache
. If I'm reading correctly, that is using 7GB of RAM, which is considerable, but much less than the 179GB of "wired" memory that you're trying to account for.
Hm, sorry, after that long thread I made a mental note to watch out for the related work and haven't checked the outputs. Well, I'm not even sure the original conclusion is (still) standing. I'm taking a look at openzfs @ 2020041502 and will let you know what happens.
Am I reading correctly that you have 45 separate storage pools? That should work, but it is not a use case that has received much scrutiny. Maybe you're hitting some unknown suboptimal behavior due to having so many storage pools?
Yes, that's what I have. I have redundancy between the hosts, so I don't need local redundancy, but would like to use ZFS features.
Please try with the latest version of the ports (202004150x). We were not initializing the arc free target in earlier versions.
Sorry, I hadn't noticed this came out. Trying it, thanks! I will report back with the findings. BTW, according to the outputs, ARC oversizing doesn't seem to be the problem here, no?
No, I suppose I didn't pay enough attention after "old version of port eats all memory" to notice that the ARC is relatively small. Good to update anyway though! :)
I have redundancy between the hosts, so I don't need local redundancy, but would like to use ZFS features.
That's fine, but I think you'd be much better off with a single zpool. You can still have no redundancy, as you do now. i.e. zpool create poolname disk1 disk2 disk3 disk4 ...
That's fine, but I think you'd be much better off with a single zpool. You can still have no redundancy, as you do now. i.e. zpool create poolname disk1 disk2 disk3 disk4 ...
I have 44-60 disks in a machine. Rebuilding 43-59 times the amount needed because one disk died seems somewhat excessive. :)
Upgrading to openzfs version 2020041502 (commit a7929f313) caused no change:
vmstat -z output now:
# vmstat -z
ITEM SIZE LIMIT USED FREE REQ FAIL SLEEP
UMA Kegs: 248, 0, 261, 9, 261, 0, 0
UMA Zones: 3536, 0, 282, 0, 282, 0, 0
UMA Slabs: 80, 0, 8101740, 377810,675146765, 0, 0
UMA Hash: 256, 0, 48, 27, 112, 0, 0
4 Bucket: 32, 0, 503, 5372,1019633211, 0, 0
6 Bucket: 48, 0, 138, 4925,480159794, 0, 0
8 Bucket: 64, 0, 140, 4448,503770590, 221, 0
12 Bucket: 96, 0, 119, 2628,395902172,4495, 0
16 Bucket: 128, 0, 168, 2839,206574623, 554, 0
32 Bucket: 256, 0, 264, 1056,185083549,21700, 0
64 Bucket: 512, 0, 150, 538,103597133,2904436, 0
128 Bucket: 1024, 0, 155, 249,87529512,4798359, 0
256 Bucket: 2048, 0, 431, 127,123514057,5844746, 0
vmem: 1856, 0, 5, 1, 5, 0, 0
vmem btag: 56, 0, 6651076, 5884,30001352,48089, 0
VM OBJECT: 256, 0, 392422, 35993,56467611, 0, 0
RADIX NODE: 144, 0, 3196174, 46445,409522007, 0, 0
MAP: 240, 0, 3, 61, 3, 0, 0
KMAP ENTRY: 120, 0, 17, 181, 28, 0, 0
MAP ENTRY: 120, 0, 11620, 10556,209566687, 0, 0
VMSPACE: 2560, 0, 75, 942, 2331960, 0, 0
fakepg: 104, 0, 0, 646, 247018, 0, 0
64 pcpu: 8, 0, 4030, 2882, 4544, 0, 0
mt_stats_zone: 64, 0, 417, 159, 417, 0, 0
mt_zone: 24, 0, 417, 585, 417, 0, 0
16: 16, 0, 42943, 6755,200911524, 0, 0
32: 32, 0, 851722, 1566028,24467147073, 0, 0
64: 64, 0, 5369950, 8054,3852421230, 0, 0
128: 128, 0, 5357335, 19150,6431073106, 0, 0
256: 256, 0, 100563, 51192,3740422405, 0, 0
512: 512, 0, 542236, 16700,488434275, 0, 0
1024: 1024, 0, 72280, 96,22375779, 0, 0
2048: 2048, 0, 6625, 163,2335731055, 0, 0
4096: 4096, 0, 435820, 5,12551849, 0, 0
8192: 8192, 0, 492, 0, 3014945, 0, 0
16384: 16384, 0, 210, 0, 4119237, 0, 0
32768: 32768, 0, 4830133, 0, 5227778, 0, 0
65536: 65536, 0, 62, 0,11851168, 0, 0
SLEEPQUEUE: 80, 0, 24779, 1261, 24779, 0, 0
Files: 80, 0, 2145, 7405,173952002, 0, 0
filedesc0: 1104, 0, 132, 96, 2332371, 0, 0
rangeset pctrie nodes: 144, 0, 0, 0, 0, 0, 0
rl_entry: 40, 0, 1693, 4307, 1693, 0, 0
TURNSTILE: 136, 0, 24779, 581, 24779, 0, 0
umtx pi: 96, 0, 0, 0, 0, 0, 0
umtx_shm: 88, 0, 0, 0, 0, 0, 0
MAC labels: 40, 0, 0, 0, 0, 0, 0
PROC: 1328, 0, 133, 1046, 2332370, 0, 0
THREAD: 1824, 0, 24390, 388, 1150782, 0, 0
cpuset: 104, 0, 51, 848, 51, 0, 0
domainset: 40, 0, 0, 0, 0, 0, 0
audit_record: 1280, 0, 0, 0, 0, 0, 0
mbuf_packet: 256, 3355455, 51, 1002,169426021, 0, 0
mbuf: 256, 3355455, 6363, 10434,5515089249, 0, 0
mbuf_cluster: 2048, 524288, 7194, 4566,2374801478, 0, 0
mbuf_jumbo_page: 4096, 262144, 23, 10,404551983, 0, 0
mbuf_jumbo_9k: 9216, 77672, 0, 0, 0, 0, 0
mbuf_jumbo_16k: 16384, 43690, 0, 0, 0, 0, 0
epoch_record pcpu: 256, 0, 4, 12, 4, 0, 0
FPU_save_area: 832, 0, 0, 0, 0, 0, 0
DMAR_MAP_ENTRY: 120, 0, 0, 0, 0, 0, 0
ttyinq: 160, 0, 180, 120, 675, 0, 0
ttyoutq: 256, 0, 95, 115, 353, 0, 0
g_bio: 376, 0, 112, 1208,4023073807, 0, 0
nvme_request: 128, 0, 0, 0, 0, 0, 0
cryptop: 128, 0, 0, 0, 0, 0, 0
cryptodesc: 120, 0, 0, 0, 0, 0, 0
crypto_session: 24, 0, 0, 0, 0, 0, 0
vtnet_tx_hdr: 24, 0, 0, 0, 0, 0, 0
VNODE: 480, 0, 528885, 691, 9089184, 0, 0
VNODEPOLL: 120, 0, 0, 0, 0, 0, 0
BUF TRIE: 144, 0, 2751, 103197, 3948232, 0, 0
NAMEI: 1024, 0, 0, 172,246458867, 0, 0
rentr: 24, 0, 0, 0, 1, 0, 0
S VFS Cache: 108, 0, 526175, 4320,15518196, 0, 0
STS VFS Cache: 148, 0, 4018, 1650,10504400, 0, 0
L VFS Cache: 328, 0, 0, 0, 0, 0, 0
LTS VFS Cache: 368, 0, 3, 27, 2869, 0, 0
NCLNODE: 592, 0, 2722, 110, 4224, 0, 0
DIRHASH: 1024, 0, 0, 0, 1212, 0, 0
pipe: 760, 0, 168, 67, 1465442, 0, 0
procdesc: 136, 0, 0, 0, 0, 0, 0
Mountpoints: 2744, 0, 57, 5, 58, 0, 0
AIO: 208, 0, 0, 1007, 810, 0, 0
AIOP: 32, 0, 4, 2621, 309, 0, 0
AIOCB: 752, 0, 0, 255, 12802, 0, 0
AIOLIO: 280, 0, 0, 0, 0, 0, 0
ksiginfo: 112, 0, 1141, 1904, 621423, 0, 0
itimer: 352, 0, 0, 0, 0, 0, 0
KNOTE: 160, 0, 1411, 1339,1322389379, 0, 0
socket: 872, 6286248, 1352, 3068,31846407, 0, 0
unpcb: 256, 6286260, 17, 88, 135259, 0, 0
ipq: 56, 16401, 0, 0, 465, 0, 0
udp_inpcb: 488, 6286248, 5, 27,15721195, 0, 0
udpcb: 32, 6286250, 5, 495,15721195, 0, 0
tcp_inpcb: 488, 6286248, 1333, 299,15989942, 0, 0
tcpcb: 976, 6286248, 1329, 55,15989942, 0, 0
tcptw: 88, 27810, 4, 131, 8049188, 0, 0
syncache: 168, 15364, 0, 0, 7920488, 0, 0
hostcache: 96, 15375, 56, 149, 83, 0, 0
sackhole: 32, 0, 0, 0, 79, 0, 0
tfo: 4, 0, 0, 0, 0, 0, 0
tfo_ccache_entries: 80, 0, 0, 0, 0, 0, 0
tcpreass: 48, 32785, 0, 0, 0, 0, 0
tcp_log: 400, 1000000, 0, 0, 0, 0, 0
tcp_log_bucket: 144, 0, 0, 0, 0, 0, 0
tcp_log_node: 120, 0, 0, 0, 0, 0, 0
udplite_inpcb: 488, 6286248, 0, 0, 0, 0, 0
ripcb: 488, 6286248, 0, 0, 0, 0, 0
IPsec SA lft_c: 16, 0, 0, 0, 0, 0, 0
rtentry: 208, 0, 25, 165, 97, 0, 0
selfd: 64, 0, 1205, 4623,52610095, 0, 0
swpctrie: 144, 24512571, 0, 0, 0, 0, 0
swblk: 136, 24512569, 0, 0, 0, 0, 0
md0: 512, 0, 20000, 16, 20000, 0, 0
FFS inode: 160, 0, 656, 544, 528034, 0, 0
FFS1 dinode: 128, 0, 0, 0, 0, 0, 0
FFS2 dinode: 256, 0, 656, 319, 528034, 0, 0
md1: 512, 0, 70000, 8, 70000, 0, 0
TMPFS dirent: 64, 0, 428, 2424, 4534, 0, 0
TMPFS node: 232, 0, 431, 504, 4537, 0, 0
pf mtags: 48, 0, 0, 0, 51, 0, 0
pf tags: 104, 0, 0, 0, 0, 0, 0
pf states: 296, 100009, 2050, 1031,31001054, 0, 0
pf state keys: 88, 0, 2050, 3485,31001054, 0, 0
pf source nodes: 136, 10005, 0, 0, 0, 0, 0
pf table entries: 216, 200016, 12, 24, 12, 0, 0
pf frags: 112, 0, 0, 0, 0, 0, 0
pf frag entries: 40, 5000, 0, 0, 0, 0, 0
pf state scrubs: 40, 0, 0, 0, 0, 0, 0
taskq_zone: 168, 0, 43, 1557,65580883, 0, 0
zfs_btree_leaf_cache: 4096, 0, 1366405, 3,68300809, 0, 0
ddt_cache: 24840, 0, 598, 0, 1196, 0, 0
ddt_entry_cache: 392, 0, 0, 0, 0, 0, 0
zio_cache: 1208, 0, 43634, 3757,7236136076, 0, 0
zio_link_cache: 48, 0, 42305, 11728,8656029411, 0, 0
zio_buf_512: 512, 0, 1617, 1615,52315494, 0, 0
zio_data_buf_512: 512, 0, 18, 94,154928024, 0, 0
zio_buf_1024: 1024, 0, 9599, 1237,271651537, 0, 0
zio_data_buf_1024: 1024, 0, 0, 0,378565459, 0, 0
zio_buf_1536: 1536, 0, 40026, 662,372855740, 0, 0
zio_data_buf_1536: 1536, 0, 58, 186,537807774, 0, 0
zio_buf_2048: 2048, 0, 99759, 573,342856308, 0, 0
zio_data_buf_2048: 2048, 0, 0, 0,509224951, 0, 0
zio_buf_2560: 2560, 0, 144104, 96,229357036, 0, 0
zio_data_buf_2560: 2560, 0, 10, 24,170718935, 0, 0
zio_buf_3072: 3072, 0, 90987, 57,134959728, 0, 0
zio_data_buf_3072: 3072, 0, 0, 0,183802283, 0, 0
zio_buf_3584: 3584, 0, 10435, 1,15512497, 0, 0
zio_data_buf_3584: 3584, 0, 16, 54,126668908, 0, 0
zio_buf_4096: 4096, 0, 48371, 92,470844984, 0, 0
zio_data_buf_4096: 4096, 0, 2, 0,974103426, 0, 0
zio_buf_5120: 5120, 0, 0, 5, 8162133, 0, 0
zio_data_buf_5120: 5120, 0, 6, 4,124781625, 0, 0
zio_buf_6144: 6144, 0, 0, 2, 7498640, 0, 0
zio_data_buf_6144: 6144, 0, 2, 12,83521651, 0, 0
zio_buf_7168: 7168, 0, 0, 2, 4850788, 0, 0
zio_data_buf_7168: 7168, 0, 4, 0,59555882, 0, 0
zio_buf_8192: 8192, 0, 1, 7,34062687, 0, 0
zio_data_buf_8192: 8192, 0, 2, 4,197486927, 0, 0
zio_buf_10240: 10240, 0, 0, 8,26263912, 0, 0
zio_data_buf_10240: 10240, 0, 10, 12,69289067, 0, 0
zio_buf_12288: 12288, 0, 0, 2,12406608, 0, 0
zio_data_buf_12288: 12288, 0, 0, 0,90183683, 0, 0
zio_buf_14336: 14336, 0, 0, 5, 4568215, 0, 0
zio_data_buf_14336: 14336, 0, 0, 0,34072707, 0, 0
zio_buf_16384: 16384, 0, 535533, 30,1618986804, 0, 0
zio_data_buf_16384: 16384, 0, 0, 0,42274264, 0, 0
zio_buf_20480: 20480, 0, 0, 4, 8798396, 0, 0
zio_data_buf_20480: 20480, 0, 0, 0,28845509, 0, 0
zio_buf_24576: 24576, 0, 0, 2, 8158561, 0, 0
zio_data_buf_24576: 24576, 0, 0, 0,19866928, 0, 0
zio_buf_28672: 28672, 0, 0, 3, 6336088, 0, 0
zio_data_buf_28672: 28672, 0, 0, 0,12098864, 0, 0
zio_buf_32768: 32768, 0, 0, 1, 6134777, 0, 0
zio_data_buf_32768: 32768, 0, 0, 0,11262674, 0, 0
zio_buf_40960: 40960, 0, 0, 1, 9020003, 0, 0
zio_data_buf_40960: 40960, 0, 0, 0,20995892, 0, 0
zio_buf_49152: 49152, 0, 0, 0, 4922695, 0, 0
zio_data_buf_49152: 49152, 0, 0, 0,18782587, 0, 0
zio_buf_57344: 57344, 0, 0, 0, 5209947, 0, 0
zio_data_buf_57344: 57344, 0, 0, 0,17214515, 0, 0
zio_buf_65536: 65536, 0, 0, 0, 3027380, 0, 0
zio_data_buf_65536: 65536, 0, 0, 0,20960916, 0, 0
zio_buf_81920: 81920, 0, 0, 0, 3451479, 0, 0
zio_data_buf_81920: 81920, 0, 0, 0,61630888, 0, 0
zio_buf_98304: 98304, 0, 0, 0, 2330682, 0, 0
zio_data_buf_98304: 98304, 0, 12, 0,227460343, 0, 0
zio_buf_114688: 114688, 0, 0, 0, 2942587, 0, 0
zio_data_buf_114688: 114688, 0, 6, 0,440548673, 0, 0
zio_buf_131072: 131072, 0, 22632, 14,454123779, 0, 0
zio_data_buf_131072: 131072, 0, 0, 0, 3059584, 0, 0
zio_buf_163840: 163840, 0, 0, 0, 2090522, 0, 0
zio_data_buf_163840: 163840, 0, 0, 0, 891980, 0, 0
zio_buf_196608: 196608, 0, 0, 0, 1576412, 0, 0
zio_data_buf_196608: 196608, 0, 0, 0, 751257, 0, 0
zio_buf_229376: 229376, 0, 0, 0, 2553008, 0, 0
zio_data_buf_229376: 229376, 0, 0, 0, 606102, 0, 0
zio_buf_262144: 262144, 0, 0, 0, 683463, 0, 0
zio_data_buf_262144: 262144, 0, 0, 0, 478605, 0, 0
zio_buf_327680: 327680, 0, 0, 0, 1382487, 0, 0
zio_data_buf_327680: 327680, 0, 0, 0, 673700, 0, 0
zio_buf_393216: 393216, 0, 0, 0, 693804, 0, 0
zio_data_buf_393216: 393216, 0, 0, 0, 504009, 0, 0
zio_buf_458752: 458752, 0, 0, 0, 677480, 0, 0
zio_data_buf_458752: 458752, 0, 0, 0, 555147, 0, 0
zio_buf_524288: 524288, 0, 0, 0, 491905, 0, 0
zio_data_buf_524288: 524288, 0, 0, 0, 394321, 0, 0
zio_buf_655360: 655360, 0, 0, 0, 753145, 0, 0
zio_data_buf_655360: 655360, 0, 0, 0, 725171, 0, 0
zio_buf_786432: 786432, 0, 0, 0, 750387, 0, 0
zio_data_buf_786432: 786432, 0, 0, 0, 2715537, 0, 0
zio_buf_917504: 917504, 0, 0, 0, 942802, 0, 0
zio_data_buf_917504: 917504, 0, 0, 0, 3121112, 0, 0
zio_buf_1048576: 1048576, 0, 0, 0,28390417, 0, 0
zio_data_buf_1048576: 1048576, 0, 161, 0, 66789, 0, 0
zio_buf_1310720: 1310720, 0, 0, 0, 0, 0, 0
zio_data_buf_1310720: 1310720, 0, 0, 0, 0, 0, 0
zio_buf_1572864: 1572864, 0, 0, 0, 0, 0, 0
zio_data_buf_1572864: 1572864, 0, 0, 0, 0, 0, 0
zio_buf_1835008: 1835008, 0, 0, 0, 0, 0, 0
zio_data_buf_1835008: 1835008, 0, 0, 0, 0, 0, 0
zio_buf_2097152: 2097152, 0, 0, 0, 0, 0, 0
zio_data_buf_2097152: 2097152, 0, 0, 0, 0, 0, 0
zio_buf_2621440: 2621440, 0, 0, 0, 0, 0, 0
zio_data_buf_2621440: 2621440, 0, 0, 0, 0, 0, 0
zio_buf_3145728: 3145728, 0, 0, 0, 0, 0, 0
zio_data_buf_3145728: 3145728, 0, 0, 0, 0, 0, 0
zio_buf_3670016: 3670016, 0, 0, 0, 0, 0, 0
zio_data_buf_3670016: 3670016, 0, 0, 0, 0, 0, 0
zio_buf_4194304: 4194304, 0, 0, 0, 0, 0, 0
zio_data_buf_4194304: 4194304, 0, 0, 0, 0, 0, 0
zio_buf_5242880: 5242880, 0, 0, 0, 0, 0, 0
zio_data_buf_5242880: 5242880, 0, 0, 0, 0, 0, 0
zio_buf_6291456: 6291456, 0, 0, 0, 0, 0, 0
zio_data_buf_6291456: 6291456, 0, 0, 0, 0, 0, 0
zio_buf_7340032: 7340032, 0, 0, 0, 0, 0, 0
zio_data_buf_7340032: 7340032, 0, 0, 0, 0, 0, 0
zio_buf_8388608: 8388608, 0, 0, 0, 0, 0, 0
zio_data_buf_8388608: 8388608, 0, 0, 0, 0, 0, 0
zio_buf_10485760: 10485760, 0, 0, 0, 0, 0, 0
zio_data_buf_10485760: 10485760, 0, 0, 0, 0, 0, 0
zio_buf_12582912: 12582912, 0, 0, 0, 0, 0, 0
zio_data_buf_12582912: 12582912, 0, 0, 0, 0, 0, 0
zio_buf_14680064: 14680064, 0, 0, 0, 0, 0, 0
zio_data_buf_14680064: 14680064, 0, 0, 0, 0, 0, 0
zio_buf_16777216: 16777216, 0, 0, 0, 0, 0, 0
zio_data_buf_16777216: 16777216, 0, 0, 0, 0, 0, 0
lz4_cache: 16384, 0, 0, 17,136251449, 0, 0
abd_chunk: 4096, 0, 585782, 93,207453815, 0, 0
sa_cache: 264, 0, 524942, 1798, 8552123, 0, 0
dnode_t: 808, 0, 560109, 6846, 8553138, 0, 0
arc_buf_hdr_t_full: 256, 0, 632238, 661872,10747751761, 0, 0
arc_buf_hdr_t_full_crypt: 320, 0, 0, 0, 0, 0, 0
arc_buf_hdr_t_l2only: 96, 0, 0, 0, 0, 0, 0
arc_buf_t: 64, 0, 558987, 15133,1960576502, 0, 0
dmu_buf_impl_t: 296, 0, 1098015, 14057,40482414, 0, 0
zil_lwb_cache: 360, 0, 44, 187,26471268, 0, 0
zil_zcw_cache: 80, 0, 0, 0,30536598, 0, 0
sio_cache_0: 136, 0, 1337577, 832928,1514491581, 0, 0
sio_cache_1: 152, 0, 269448, 115404,289654418, 0, 0
sio_cache_2: 168, 0, 0, 69,114551205, 0, 0
zfs_znode_cache: 472, 0, 524942, 1794, 8552123, 0, 0
Any ideas on what would be useful to debug this?
Upgrading to openzfs version 2020041502 (commit a7929f3) caused no change:
I see an important change here -- system no longer goes to swap.
Upgrading to openzfs version 2020041502 (commit a7929f3) caused no change:
I see an important change here -- system no longer goes to swap.
Yes, because with FreeBSD's in-tree ZFS I got reboots and I also expected that with the openzfs port, so I configured a swap and a dump device. This time, because the problem here seems to be quite self-explanatory, I omitted that. Sorry if it caused confusion.
OK. After another look at the vmstat -z output I see that 147GB of your RAM is consumed by the 32KB malloc(9) zone. You should look into the netstat -m output to see what malloc type those allocations belong to. It may or may not be ZFS-related.
OK. After another look at the vmstat -z output I see that 147GB of your RAM is consumed by the 32KB malloc(9) zone. You should look into the netstat -m output to see what malloc type those allocations belong to. It may or may not be ZFS-related.
I would think it's clearly ZFS-related, and @ahrens' explanation in the linked thread seems to describe the problem. If I rewrite the zpools and fragmentation decreases, this kind of problem disappears.
Here's another memory graph from a machine which is exactly the same as this one, but has rewritten zpools:
This has 69G wired, but 42G of that is ARC, which is fine.
On the problematic machine, the wired memory grows even while importing the zpools, and it's even worse after a crash (so it fits the explanation around ZIL replay as well).
Anyway, here's the output:
# netstat -m
6378/13227/19605 mbufs in use (current/cache/total)
6190/3088/9278/524288 mbuf clusters in use (current/cache/total/max)
49/1 mbuf+clusters out of packet secondary zone in use (current/cache)
4/0/4/262144 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/77672 9k jumbo clusters in use (current/cache/total/max)
0/0/0/43690 16k jumbo clusters in use (current/cache/total/max)
13990K/9482K/23473K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
1079778 sendfile syscalls
747468 sendfile syscalls completed without I/O request
318323 requests for I/O initiated by sendfile
1435220 pages read by sendfile as part of a request
8303549 pages were valid at time of a sendfile request
77 pages were valid and substituted to bogus page
0 pages were requested for read ahead by applications
2063198 pages were read ahead by sendfile
12807 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed
I'm sorry, I wanted to say vmstat -m to see malloc types. Too many tools with -m. ;)
I'm sorry, I wanted to say vmstat -m to see malloc types. Too many tools with -m. ;)
Oh, I didn't understand the netstat suggestion; I should have asked for a correction...
# vmstat -m
Type InUse MemUse HighUse Requests Size(s)
CAM XPT 450 42K - 4173986 16,32,64,128,256,512,2048,8192,65536
entropy 1 1K - 118019 32,4096
CAM DEV 51 102K - 350 2048
CAM CCB 48 96K - 2206779846 2048
CAM path 100 4K - 681966 32
CAM periph 100 25K - 617 16,32,64,128,256
feeder 7 1K - 7 32
CAM I/O Scheduler 47 6K - 47 128
CAM queue 54 1518K - 1002 16,32,64,128,256,512,1024,2048,4096,32768
UART 3 3K - 3 16,1024
USB 48 57K - 65 16,32,128,256,512,1024,4096,8192,32768
USBdev 35 5K - 40 32,64,128,256,512
CAM dev queue 3 1K - 3 64
scsi_da 0 0K - 1441 32,64,256
ciss_data 15 19K - 17 16,512,1024,4096,8192
vtbuf 24 1968K - 46 4096
vt 11 6K - 11 512
DEVFS3 581 146K - 626 256
DEVFS1 287 144K - 310 512
DEVFS_RULE 55 26K - 55 64,512
DEVFS 103 3K - 106 16,32,128
DEVFSP 1 1K - 77662 64
NFSD V4client 1 1K - 1 256
NFSD lckfile 1 1K - 1 256
NFS fh 2722 86K - 27962503 32
NFSCL diroffdiroff 6 3K - 6 512
NFSD session 1 1K - 1 1024
newnfsclient_req 0 0K - 88 128
newnfsmnt 3 6K - 4 2048
pfs_nodes 20 10K - 20 512
GEOM 539 114K - 2136891 16,32,64,128,256,512,1024,2048,4096,8192
raid_data 0 0K - 594 32,128,256
isadev 7 1K - 7 128
SCSI ENC 264 292K - 905276 16,64,128,1024,2048,4096,32768,65536
ddb_capture 1 64K - 1 65536
cdev 4 1K - 4 256
filedesc 31 317K - 1901449 16,32,64,128,256,4096,8192,16384,32768,65536
sigio 4 1K - 99254 64
filecaps 9 1K - 475281 16,32,64
kdtrace 23479 5845K - 6208206 64,256
kenv 131 13K - 13442 16,32,64,128,8192
kqueue 332 1574K - 5155314 64,128,256,512,2048,4096,8192,16384
proc-args 251 11K - 1585261 16,32,64,128,256
Fail Points 0 0K - 100 1024
hhook 13 4K - 13 256
ithread 282 52K - 282 32,128,256
prison 8 1K - 8 16,32
KTRACE 100 13K - 100 128
linker 248 1957K - 316 16,32,64,128,256,512,1024,2048,4096,8192,16384,32768
lockf 50 6K - 872 64,128
loginclass 3 1K - 3 64
devbuf 2916 7134K - 4402 16,32,64,128,256,512,1024,2048,4096,8192,16384,65536
temp 81 20K - 6654240 16,32,64,128,256,512,1024,2048,4096,8192,16384,65536
module 515 65K - 516 128
mtx_pool 2 72K - 2 8192,65536
osd 22631 355K - 47273 16,32,64,128,256
pmchooks 1 1K - 1 128
pmc 2 1K - 2 64
pgrp 26 4K - 188085 128
session 20 3K - 96209 128
proc 2 256K - 2
subproc 1315 1130K - 2502265 512,4096
cred 193 49K - 1047375 256
evdev 3 3K - 4 1024
plimit 19 5K - 34051 256
uidinfo 12 34K - 94002 128,32768
sysctl 0 0K - 56834 64
sysctloid 17496 902K - 18419 16,32,64,128
sysctltmp 0 0K - 339558 16,32,64,256,1024
acpiintr 1 1K - 1 64
tidhash 1 256K - 1
callout 25 7304K - 25
umtx 49556 6195K - 49556 128
p1003.1b 1 1K - 1 16
bus 1707 176K - 45765 16,32,64,128,256,512,1024,4096
bus-sc 119 2034K - 27970 16,32,128,256,512,1024,2048,4096,8192,16384,32768,65536
acpica 5050 501K - 190298 16,32,64,128,256,512,1024,2048
devstat 30 61K - 30 32,4096
epoch 4 1K - 4 128
eventhandler 152 13K - 152 64,128
gtaskqueue 154 47K - 154 16,32,256,8192
kobj 341 1364K - 1210 4096
Per-cpu 1 1K - 1 32
acpitask 1 64K - 1 65536
rman 457 47K - 851 16,32,128
sbuf 1 1K - 912725 16,32,64,128,256,512,1024,2048,4096,8192,16384,32768,65536
toponodes 64 8K - 64 128
kbdmux 6 22K - 7 16,512,1024,2048,16384
stack 0 0K - 1002 256
taskqueue 6339 838K - 13017 16,32,64,128,256
terminal 11 3K - 11 256
Unitno 48 3K - 2364317 32,64
vmem 4 3328K - 33 4096,8192,16384,32768,65536
ioctlops 0 0K - 1462 256,512,1024,2048,4096
select 1847 231K - 1847 128
iov 0 0K - 108152153 16,32,64,128,256,512
msg 4 30K - 4 2048,4096,8192,16384
sem 4 106K - 4 2048,4096
shm 1 32K - 1 32768
tty 14 14K - 22 1024
pts 1 1K - 9 256
mbuf_tag 0 0K - 22 32
shmfd 1 8K - 1 8192
soname 7 1K - 51390649 16,32,128
pcb 664 17457K - 23847469 16,32,64,1024,2048
vfscache 4 8385K - 4 256,65536
cl_savebuf 0 0K - 104 64
vfs_hash 1 4096K - 1
vnodes 21 1K - 21 32,256
mount 1506 56K - 2431 16,32,64,128,256
statfs 0 0K - 3041630 4096
vnodemarker 0 0K - 493921 512
chacha20random 1 8K - 1 8192
BPF 12 1026K - 20 16,128,512,1024
ifdescr 1 1K - 1272 32
ifnet 9 17K - 9 128,2048
ifaddr 184 56K - 184 16,32,64,128,256,512,2048,4096
ether_multi 90 8K - 129 16,32,64,128
clone 10 2K - 10 128
ipsec 3 1K - 3 256
lltable 102 44K - 326 256,512
tun 3 1K - 3 32
iflib 457 3460K - 481 64,128,1024,8192,16384,32768
routetbl 67 13K - 199 32,64,128,256,512
vnet 1 1K - 1 64
vnet_data 1 240K - 1
vnet_data_free 1 1K - 1 32
igmp 8 1K - 8 128
in_multi 1 1K - 8 256
encap_export_host 12 1K - 12 32,64
tfo_ccache 1 128K - 1
hostcache 1 32K - 1 32768
LRO 72 1440K - 72 8192,32768
tcpfunc 1 1K - 1 64
syncache 1 68K - 1
in6_multi 51 7K - 51 32,256
mld 7 1K - 7 128
ip6ndp 14 3K - 20 64,256
inpcbpolicy 1350 43K - 31735182 32
secasvar 1 1K - 1 1024
sahead 1 1K - 1 1024
ipsecpolicy 2 2K - 2 256,1024
ipsec-saq 2 2K - 2 1024
nfsclient_lock 0 0K - 13204 512
nfsclient_nlminfo 53 2K - 105 32
crypto 2 5K - 2 1024,4096
rpc 50 26K - 68846473 64,128,512,1024,4096
audit_evclass 230 8K - 285 32
ufs_dirhash 2 1K - 1214 16,512
ufs_quota 1 4096K - 1
ufs_mount 6 33K - 8 512,4096,8192
UMAHash 64 49083K - 218 512,1024,2048,4096,8192,16384,32768,65536
md_disk 181 14K - 181 32,4096
md_sectors 179 716K - 179 4096
mpr 58 1343K - 316 16,32,64,128,256,512,1024,4096,32768
memdesc 1 4K - 1 4096
pci_link 16 2K - 16 64,128
atkbddev 2 1K - 2 64
acpisem 54 7K - 54 128
acpidev 55 4K - 55 64
CAM SIM 4 1K - 4 256
apmdev 1 1K - 1 128
madt_table 0 0K - 2 256,4096
intr 4 400K - 4 65536
io_apic 3 6K - 3 2048
local_apic 1 32K - 1 32768
MCA 24 3K - 24 128
cpus 2 1K - 2 128
msi 52 7K - 52 128
nexusdev 5 1K - 5 16
tmpfs mount 3 1K - 3 128
tmpfs name 428 17K - 4817 16,32,64
pf_temp 0 0K - 51 32
pf_hash 5 11524K - 5 2048
pf_ifnet 12 5K - 133 256,2048
pf_osfp 1191 123K - 1191 64,128
pf_rule 121 121K - 121 1024
pf_table 5 10K - 10 2048
kstat_data 15 15K - 15 1024
solaris 17403061 158226867K - 40192812653 16,32,64,128,256,512,1024,2048,4096,8192,16384,32768,65536
sfs_nodes 96 48K - 96 512
So it indeed looks like ZFS:
solaris 17403061 158226867K - 40192812653 16,32,64,128,256,512,1024,2048,4096,8192,16384,32768,65536
Unfortunately it does not tell what exactly is leaking, but it definitely should not be like that. It is not normal ARC memory usage.
No, it's not ARC. It really looks like what's been described with the metaslabs.
@bra-fsn How did you conclude that it's caused by metaslabs? The most common cause for metaslab memory usage is zfs_btree_leaf_cache, which stores the in-memory version of loaded metaslabs' spacemaps. If I'm reading correctly, that is using 5GB of RAM in your latest comment, which is considerable, but much less than the 158GB of "solaris" or 154GB in the "32768" cache.
I'm not super familiar with the FreeBSD diagnostics here, but it sounds like something is doing a lot of kmem_alloc(32K). It's definitely possible that this is related to ZFS and to metaslabs, but I don't have a guess as to what that would be, specifically. Maybe you could use dtrace to see what stacks are doing these allocations most often? (note, it looks like allocations of size > 16384 and <= 32768 will use this cache).
@bra-fsn How did you conclude that it's caused by metaslabs? The most common cause for metaslab memory usage is zfs_btree_leaf_cache, which stores the in-memory version of loaded metaslabs' spacemaps. If I'm reading correctly, that is using 5GB of RAM in your latest comment, which is considerable, but much less than the 158GB of "solaris" or 154GB in the "32768" cache.
It wasn't me, but you. :) From the thread linked in the issue's opening post: https://openzfs.topicbox.com/groups/developer/T10533b84f9e1cfc5-M4fadd72936a441115b96f8f7/using-many-zpools
Of course that was before the AVL->btree change, but it made perfect sense. Everything you wrote there turned out to be correct: the memory usage is proportional to the level of fragmentation (well, at least I could successfully and drastically reduce it by rewriting the pools), at least with the in-tree ZFS version, which still uses the AVL code. Sure, this may not be the case with this version, and according to the current stats that seems to be justified (or the memory is not accounted for correctly; I'm not familiar with the internals).
I'm not super familiar with the FreeBSD diagnostics here, but it sounds like something is doing a lot of kmem_alloc(32K). It's definitely possible that this is related to ZFS and to metaslabs, but I don't have a guess as to what that would be, specifically. Maybe you could use dtrace to see what stacks are doing these allocations most often? (note, it looks like allocations of size > 16384 and <= 32768 will use this cache).
Could you please help with that?
The important difference there is that in that case, the space was directly attributable to the range_seg_cache, which was the precursor to the zfs_btree_leaf_cache. In your case, the zfs_btree_leaf_cache is only using 5GB of RAM, so the high memory usage isn't caused by loading the spacemaps into memory.
As for the dtrace script, you want something that triggers on kmem_alloc when the size is > 16384 and <= 32768, and you probably want to do @aggr[stack()] = count(); so you can see which stacks are doing lots of these allocations.
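A minimal sketch of such a one-liner, assuming arg0 carries the requested size and reusing the fbt::zfs_kmem_alloc:entry probe from the commands shown later in this thread, could be:
dtrace -n 'fbt::zfs_kmem_alloc:entry /arg0 > 16384 && arg0 <= 32768/ { @aggr[stack()] = count(); }'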
@pcd1193182 Understood, I'm just saying the effect is very similar.
I've restarted the machine and let it run for a while.
top shows this ATM:
Mem: 2960M Active, 17G Inact, 79G Wired, 203M Buf, 88G Free
ARC: 13G Total, 4190M MFU, 8373M MRU, 14M Anon, 195M Header, 894M Other
2617M Compressed, 9958M Uncompressed, 3.81:1 Ratio
The attached file kmem.zip has the vmstat -z output and two outputs for these:
dtrace -n 'fbt::malloc:entry { @[stack()] = quantize(arg0); }' > malloc_quant
dtrace -n 'fbt::zfs_kmem_alloc:entry { @[stack()] = quantize(arg0); }' > zfs_kmem_quant
I'm not sure how useful this will be though.
I see a bunch of stacks like this:
kernel`inflateInit2_+0xf6
openzfs.ko`z_uncompress+0xbb
openzfs.ko`gzip_decompress+0x24
openzfs.ko`zio_decompress_data+0x63
openzfs.ko`arc_buf_fill+0xa1c
openzfs.ko`arc_read_done+0x242
openzfs.ko`zio_done+0x887
openzfs.ko`zio_execute+0x122
kernel`taskqueue_run_locked+0x175
kernel`taskqueue_thread_loop+0xa8
kernel`fork_exit+0x83
kernel`0xffffffff8103476e
value ------------- Distribution ------------- count
16384 | 0
32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 417157
65536 | 0
I think the bug is:
static int
zlib_inflateEnd(z_stream *stream)
{
return (inflateInit(stream));
}
I am guessing that should be calling something like inflateFini() or inflateEnd(), which would free the buffer that was allocated by inflateInit().
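For illustration, a minimal sketch of what the suspected fix could look like, assuming the wrapper only needs to call zlib's inflateEnd() so that the state allocated by inflateInit() gets freed (the actual change is in the PR referenced later in this thread):
static int
zlib_inflateEnd(z_stream *stream)
{
	/* Tear the stream down instead of re-initializing it, so zlib
	 * frees the state that inflateInit() allocated. */
	return (inflateEnd(stream));
}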
(incidentally, I noticed that the no-op functions zlib_workspace_alloc() and zlib_workspace_free() could be removed)
I've also looked at those, but this is so basic and used everywhere that I couldn't believe it's the cause. Why is this FreeBSD-related?
I've brought up the zlib workspace parts with @mattmacy before. It's code we may want to implement in the future, so he left the stubs in place for now.
I'll have a look at the leaky bits. Thanks for helping troubleshoot this!
Thanks a lot guys, building the new module and trying it out!
Memory usage is constant after the change; I think this is solved with https://github.com/openzfs/zfs/pull/10252:
Thanks, and sorry for sidetracking the topic with the metaslabs-related problem (also good to see it's solved!).
This bug still exists; it will be closed by PR #10252.
System information
Describe the problem you're observing
Detailed description is here: https://openzfs.topicbox.com/groups/developer/T10533b84f9e1cfc5 I think the optimization work is now done/merged (https://github.com/openzfs/zfs/pull/9181) and my openzfs version contains it. I can confirm that it makes the machine more stable. It can now survive for around 1.5 days with 192G RAM.
Memory usage is like this:
top shows ATM:
vmstat -z output:
@ahrens, @pcd1193182 do you have any further ideas on how to improve this situation? I've already started rewriting the pools with ashift=12, but it takes ages to complete...
Describe how to reproduce the problem
See the above mailing list thread. Basically:
(some disks are rewritten on this machine)