openzfsonosx / zfs

OpenZFS on OS X
https://openzfsonosx.org/

Memory not released under pressure #782

Open mtrower opened 3 years ago

mtrower commented 3 years ago

Hi, I wasn't sure whether to file this against ZFS or the SPL, but the SPL has very few issues filed, so here I am.

After some heavy activity on a pool (intensive file creation and listing), I'm seeing 14.07GB of wired memory, of which 10.5GB appears to be consumed by ZFS:

% sysctl kstat.spl.misc.spl_misc.os_mem_alloc
kstat.spl.misc.spl_misc.os_mem_alloc: 11457265664
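
For readers comparing this counter against the GB figures in the text, here is a minimal sketch that converts the raw byte count to GiB. The helper names `bytes_to_gib` and `kstat_value` are made up for illustration and are not part of ZFS or the SPL; the sample line is the one shown above.

```shell
# Hypothetical helpers, not part of ZFS/SPL.

# Convert a byte count to GiB with two decimals (1 GiB = 1073741824 bytes).
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1073741824 }'
}

# Extract the value from a "name: value" line as printed by sysctl.
kstat_value() {
    printf '%s\n' "$1" | awk -F': ' '{ print $2 }'
}

line='kstat.spl.misc.spl_misc.os_mem_alloc: 11457265664'
bytes_to_gib "$(kstat_value "$line")"   # prints 10.67
```

On a live system the same conversion could be fed from `sysctl -n kstat.spl.misc.spl_misc.os_mem_alloc`, which prints only the value.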

Which is all well and good, but even under pressure it wasn't dropping, so I tried to constrain the ARC:

kstat.zfs.darwin.tunable.zfs_arc_max: 0 -> 4294967296
kstat.zfs.darwin.tunable.zfs_arc_meta_limit: 0 -> 3221225472
kstat.zfs.darwin.tunable.zfs_arc_min: 0 -> 1610612736
kstat.zfs.darwin.tunable.zfs_arc_meta_min: 0 -> 1342177280
kstat.zfs.darwin.tunable.zfs_dirty_data_max: 1717986918 -> 536870912
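
The `old -> new` lines above are the format macOS `sysctl` prints when a value is assigned, so the limits were presumably set with plain `sysctl name=value` calls; that is an assumption, since the commands themselves are not shown. The sketch below only prints such commands (it does not run them), starting from sizes in MiB so the byte values are easier to check. `tunable_cmd` is a hypothetical helper.

```shell
# Hypothetical helper: print (do not run) the sysctl assignment for one
# ARC tunable, given its name and a size in MiB.
tunable_cmd() {
    echo "sysctl kstat.zfs.darwin.tunable.$1=$(( $2 * 1048576 ))"
}

tunable_cmd zfs_arc_max        4096   # 4 GiB
tunable_cmd zfs_arc_meta_limit 3072   # 3 GiB
tunable_cmd zfs_arc_min        1536   # 1.5 GiB
tunable_cmd zfs_arc_meta_min   1280   # 1.25 GiB
tunable_cmd zfs_dirty_data_max 512    # 512 MiB
```

Each printed line would need to be run with `sudo` to take effect.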

ARC now looks like this:

    Time   read   miss  miss%   dmis  dm%  pmis  pm%   mmis  mm%   size  tsize  
13:44:04   122M    23M   18.9   466K  0.7   22M  39.1    23M  18.9  2515M  4294M  
13:44:05      0      0      0      0    0     0    0      0    0  2515M  4294M  
13:44:06      0      0      0      0    0     0    0      0    0  2515M  4294M 

but the SPL isn't dropping with it. No matter; let's apply some pressure and see if it releases.

% sudo memory_pressure -l warn -s 8

App Memory consumption slowly climbs over the next 5-10 minutes. Pressure rises, and Wired Memory does not drop. Eventually, Compressed shoots through the roof (8GB or so), and we finally hit "warn", where I hold it for a while to observe.

We can see that the ARC releases memory at a few points:

    Time   read   miss  miss%   dmis  dm%  pmis  pm%   mmis  mm%   size  tsize  
13:56:46      0      0      0      0    0     0    0      0    0  2535M  4294M  
13:56:47      0      0      0      0    0     0    0      0    0  2535M  4294M  
13:56:48      0      0      0      0    0     0    0      0    0  2534M  2534M  
13:56:49      0      0      0      0    0     0    0      0    0  2534M  2534M 
...
13:57:32      0      0      0      0    0     0    0      0    0  2512M  2512M  
13:57:34      0      0      0      0    0     0    0      0    0  2507M  2507M  
13:57:35      0      0      0      0    0     0    0      0    0  2487M  2487M  
13:57:36      0      0      0      0    0     0    0      0    0  1390M  1610M  
13:57:37      0      0      0      0    0     0    0      0    0  1390M  1610M  
13:57:38      0      0      0      0    0     0    0      0    0  1353M  1610M 
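
The two columns that matter in these tables are the last two, `size` (current ARC size) and `tsize` (ARC target size). As a rough sketch, the filter below pulls just those out of pasted arcstat output. `arc_size_columns` is a hypothetical name, and it assumes the layout shown here, with a leading `HH:MM:SS` timestamp on every data row.

```shell
# Hypothetical helper: read arcstat-style rows on stdin and print
# "time size tsize" for each data row. Header rows and "..." lines
# are skipped because they do not start with an HH:MM:SS timestamp.
arc_size_columns() {
    awk '$1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ { print $1, $(NF-1), $NF }'
}

printf '%s\n' \
  '    Time   read   miss  miss%   dmis  dm%  pmis  pm%   mmis  mm%   size  tsize  ' \
  '13:57:35      0      0      0      0    0     0    0      0    0  2487M  2487M  ' \
  '13:57:36      0      0      0      0    0     0    0      0    0  1390M  1610M  ' \
  | arc_size_columns
```

For the two sample rows this prints `13:57:35 2487M 2487M` and `13:57:36 1390M 1610M`, the point where the target snaps to the 1610M minimum.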

But the SPL remains iron-fisted.

kstat.spl.misc.spl_misc.os_mem_alloc: 11457265664

Alright; let's just export the pool entirely (no pools imported), and check the ARC again:

    Time   read   miss  miss%   dmis  dm%  pmis  pm%   mmis  mm%   size  tsize  
00:44:09   122M    23M   18.9   468K  0.7   22M  39.1    23M  18.9    44M  1610M  
00:44:10      0      0      0      0    0     0    0      0    0    44M  1610M

And the SPL...

kstat.spl.misc.spl_misc.os_mem_alloc: 10613424128

So this time, the SPL dropped by the amount of ARC freed. What the heck is it doing with the rest?

Let's try putting the squeeze on again:

% sudo memory_pressure -l warn

Memory looks like this:

[Screenshot: Screen Shot 2021-01-03 at 00 52 38]

kstat.spl.misc.spl_misc.os_mem_alloc: 3674210304

We've released a lot, but we're still holding on to >3GB with no pools imported?

Let's repeat from the start (sort of; I'm not rebooting, nor relaxing the ARC constraints). Import pool; do some work:

    Time   read   miss  miss%   dmis  dm%  pmis  pm%   mmis  mm%   size  tsize  
01:02:16    58K    13K   23.7    215  0.7   13K  45.8    13K  23.7  4288M  4294M  
01:02:17    59K    13K   22.6    214  0.7   13K  43.7    13K  22.6  4250M  4294M

kstat.spl.misc.spl_misc.os_mem_alloc: 6923747328

Seems reasonable. Apply pressure: takes forever again (>10m to hit WARN at a final pressure of 65%). For most of that time, memory consumption climbed very slowly with pressure holding around ~35%. It's as if memory_pressure is struggling to find pages to allocate, even though pressure is ostensibly low.

    Time   read   miss  miss%   dmis  dm%  pmis  pm%   mmis  mm%   size  tsize  
01:14:35   159M    30M   19.4   616K  0.7   30M  39.6    30M  19.4  4161M  4294M  
01:14:36      0      0      0      0    0     0    0      0    0  4161M  4294M
...
01:24:59      0      0      0      0    0     0    0      0    0  4059M  4059M  
01:25:00      0      0      0      0    0     0    0      0    0  3974M  4059M  
01:25:01      0      0      0      0    0     0    0      0    0  3974M  4059M  
01:25:02      0      0      0      0    0     0    0      0    0  3974M  4059M  
01:25:03      0      0      0      0    0     0    0      0    0  2318M  1610M  
01:25:04      0      0      0      0    0     0    0      0    0  1001M  1610M  
01:25:05      0      0      0      0    0     0    0      0    0  1001M  1610M  
01:25:06      0      0      0      0    0     0    0      0    0  1001M  1610M  
01:25:07      0      0      0      0    0     0    0      0    0  1001M  1610M  
01:25:08      0      0      0      0    0     0    0      0    0  1001M  1610M  
01:25:09      0      0      0      0    0     0    0      0    0  1001M  1610M  
01:25:10      0      0      0      0    0     0    0      0    0  1001M  1610M  
01:25:11      0      0      0      0    0     0    0      0    0   882M  1610M  

And SPL has barely budged:

kstat.spl.misc.spl_misc.os_mem_alloc: 6909067264
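
To put a number on "barely budged", this small sketch takes the `os_mem_alloc` readings before and after this round of pressure and prints the difference in whole MiB. `released_mib` is a hypothetical helper; the two inputs are the values reported above.

```shell
# Hypothetical helper: difference between two byte counts, in whole MiB.
released_mib() {
    echo $(( ($1 - $2) / 1048576 ))
}

# os_mem_alloc before and after applying pressure in this round
released_mib 6923747328 6909067264   # prints 14
```

So the SPL figure fell by 14 MiB over a period in which the arcstat `size` column went from 4161M to 882M.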

Finally, I reversed all of the tuning (thinking maybe the ARC minimums were causing the SPL to hold memory):

kstat.zfs.darwin.tunable.zfs_arc_max: 4294967296 -> 0
kstat.zfs.darwin.tunable.zfs_arc_meta_limit: 3221225472 -> 0
kstat.zfs.darwin.tunable.zfs_arc_min: 1610612736 -> 0
kstat.zfs.darwin.tunable.zfs_arc_meta_min: 1342177280 -> 0
kstat.zfs.darwin.tunable.zfs_dirty_data_max: 536870912 -> 1717986918

exported the pool, and applied pressure, but the SPL is still holding >3GB:

    Time   read   miss  miss%   dmis  dm%  pmis  pm%   mmis  mm%   size  tsize  
01:54:00      0      0      0      0    0     0    0      0    0   144K  1610M

% sysctl kstat.spl.misc.spl_misc.os_mem_alloc
kstat.spl.misc.spl_misc.os_mem_alloc: 3529768960

System info

% sysctl zfs spl
zfs.kext_version: 1.9.4-0
spl.kext_version: 1.9.4-0

% sw_vers
ProductName:    Mac OS X
ProductVersion: 10.14.6
BuildVersion:   18G7016

Next steps?