Closed by alanruttenberg 9 years ago
Alan,
It's almost certainly the ZFS ARC. On a 32 GB machine with default settings it will grow to about 10 GB and remain static from then on.
Thanks for the response, Brendon. My machine is 16 GB - what should I expect? Is the ARC memory shown as wired/kernel memory?
I left my machine for an hour after rebooting (forgot to check right after the reboot) and see that it is already using quite a lot of memory:
8.4 GB wired; Kernel Task: 8 GB
Thanks, Alan
zfs.arc_max: 0
zfs.arc_min: 0
zfs.arc_meta_used: 1499092352
zfs.arc_meta_limit: 3221225472
zfs.l2arc_write_max: 8388608
zfs.l2arc_write_boost: 8388608
zfs.l2arc_headroom: 2
zfs.l2arc_feed_secs: 1
zfs.l2arc_feed_min_ms: 200
zfs.l2arc_noprefetch: 1
zfs.l2arc_feed_again: 1
zfs.l2arc_norw: 0
zfs.anon_size: 40960
zfs.anon_metadata_lsize: 0
zfs.anon_data_lsize: 0
zfs.mru_size: 3368943104
zfs.mru_metadata_lsize: 751133696
zfs.mru_data_lsize: 2504770048
zfs.mru_ghost_size: 937030656
zfs.mru_ghost_metadata_lsize: 1233920
zfs.mru_ghost_data_lsize: 935796736
zfs.mfu_size: 292008960
zfs.mfu_metadata_lsize: 68096
zfs.mfu_data_lsize: 291777024
zfs.mfu_ghost_size: 125116416
zfs.mfu_ghost_metadata_lsize: 0
zfs.mfu_ghost_data_lsize: 125116416
zfs.l2c_only_size: 11672340480
zfs.vnop_osx_debug: 0
zfs.vnop_ignore_negatives: 0
zfs.vnop_ignore_positives: 0
zfs.vnop_create_negatives: 1
zfs.reclaim_list: 0
zfs.vnop_reclaim_throttle: 33280
zfs.vfs_suspend_fs_begin_delay: 2
zfs.vfs_suspend_fs_end_delay: 2
zfs.vnop_force_formd_normalized_output: 0
About 40% of memory, assuming you have not increased the zfs.arc_max sysctl or changed the code in arc.c. And yes, the ARC appears as wired memory; however, not all wired memory is ARC memory.
Perhaps you could post the current values of the spl sysctls?
Also, when using sysctl, don't forget the spl tree; it shows the total memory in use by all of ZFS.
# sysctl spl
spl.kmem_bytes_total: 29010314
spl.num_threads: 103
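For the ARC itself, something like this should show the current size next to the configured cap (a sketch; the exact kstat names can vary a little between builds):

```
sysctl zfs.arc_max zfs.arc_min
sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max
```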
Mine says:
spl.kmem_bytes_total: 4359841962
spl.num_threads: 107
TBH, however, I am suspicious that either this figure is low or there is a memory leak: my kernel wired memory is 8.7 GB, and I seem to remember there being substantially less non-ZFS wired memory than the 4.6 GB that is currently wired exclusive of ZFS.
It looks like this wired memory is up 300 MB in the last couple of days.
-Alan
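(A quick way to watch that wired figure from the command line, as a rough sketch; vm_stat reports it in 4 KiB pages on these machines:)

```
vm_stat | awk '/Pages wired down/ { printf "wired: %.2f GB\n", $4 * 4096 / 1e9 }'
```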
zfs.l2c_only_size: 55 236 383 232
zfs.l2c_only_size: 11 672 340 480
Those are big. What do your "zpool list -v" and "zpool status -vD" say?
Alan,
As I explained above, the allocator in the SPL is not very space efficient. It would not surprise me if the ~4 GB of allocations reported above resulted in ~8 GB of actual wired memory.
The only aspect of this discussion that concerns me is that you are apparently reporting growth in use beyond the expected 40-50% of main memory. This may imply a leak somewhere.
Here are the outputs of those commands:
den:~ alanr$ zpool list -v
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
zfsroot 4.53T 3.50T 1.03T 77% 1.00x ONLINE -
mirror 1.81T 1.27T 554G -
disk8 - - - -
disk3 - - - -
mirror 1.36T 1.10T 265G -
disk6 - - - -
disk5 - - - -
mirror 1.36T 1.13T 235G -
disk7 - - - -
disk4 - - - -
disk1s4 5.31G 288K 5.31G -
cache - - - - - -
disk1s3 54.7G 54.4G 287M -
den:~ alanr$ zpool status -vD
pool: zfsroot
state: ONLINE
scan: scrub repaired 0 in 10h42m with 0 errors on Sat Jul 26 12:10:02 2014
config:
NAME STATE READ WRITE CKSUM
zfsroot ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
disk8 ONLINE 0 0 0
disk3 ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
disk6 ONLINE 0 0 0
disk5 ONLINE 0 0 0
mirror-2 ONLINE 0 0 0
disk7 ONLINE 0 0 0
disk4 ONLINE 0 0 0
logs
disk1s4 ONLINE 0 0 0
cache
disk1s3 ONLINE 0 0 0
errors: No known data errors
dedup: no DDT entries
Attached is memory usage I logged at 1-minute intervals over 4 days. Unfortunately it doesn't start right at boot, but hopefully it is informative. The first spike, at 12 hours, is the only event I made a note of: a copy of 38G to the pool. I'm not sure what caused the other activity, particularly the dip to 1 GB. Maybe that was when I deleted a pool? During that time I also restored a hard disk from a 250G DMG on one of the pools.
Perhaps this can give you some insight into what is going on with memory - whether it is as you would expect/desire.
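For anyone who wants to collect a similar log, a minimal sketch of one way to do it (not necessarily how this one was made), using sysctls that appear in this thread:

```
# Log a few memory-related numbers once a minute.
while true; do
  date
  sysctl zfs.arc_meta_used spl.kmem_bytes_total
  vm_stat | grep "Pages wired down"
  sleep 60
done >> ~/zfs-mem.log
```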
I think your key may be inverted. Having said that, it looks nominal to me.
Yes, key is inverted - oops [fixed]
Under what circumstances does a GC occur? Can it be triggered with some command line incantation?
In any case, I agree with Lundman's comment in Issue 215.
This is an unfortunate amount of wired memory sitting unused, and it would be nice if peak usage didn't go as high as 11.5 GB on a 16 GB machine.
I'm pretty sure that the protracted period of higher memory usage was during the restore from the DMG. In that case I was restoring to FireWire, so the peak write rate was 30 MB/s or so and in many cases under 10 MB/s. The peak read rate in my setup is something like 180 MB/s. I don't think my machine would need that much buffer to keep reads up with writes. Having ZFS recognize that and not use all that space "would be nice". I'll try to get around to uploading a fixed image, because the oops bugs me...
There is no way to manually trigger a GC. The GC behavior is internal to bmalloc and essentially results in all memory released by ZFS being returned to the OS ~120 seconds later, provided there is no further request to allocate that memory.
Alan, in your case I recommend that you simply reduce the size of the ARC. You can do it in code (zfs/module/zfs/arc.c; see https://github.com/openzfsonosx/zfs/commit/91dc3f5df0eb4644c24021f9f2c8e35807e901b6 for an example), or alternatively alter the zfs.arc_max sysctl to taste after the zfs and spl kexts are loaded.
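For example, something along these lines should work (a sketch; the value is in bytes, and 4 GiB is just an illustration, not a recommendation):

```
# Cap the ARC at 4 GiB; 0 means "use the built-in default".
sudo sysctl -w zfs.arc_max=4294967296

# Then watch how things settle:
sysctl zfs.arc_max zfs.arc_meta_used
```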
I do intend revisiting some aspects of bmalloc in the hope of reducing its overhead, but this is not something I will release without extensive testing, as failures in memory handling could easily screw up someone's data.
The inefficiency is certainly not ideal. That said, even if the allocations were more efficient, there is nothing about 11.5 GB out of 16 GB that is in itself concerning. The default on illumos is actually up to 75% of memory:
/* set min cache to 1/32 of all memory, or 64MB, whichever is more */
arc_c_min = MAX(arc_c / 4, 64<<20);
/* set max to 3/4 of all memory, or all but 1GB, whichever is more */
if (arc_c * 8 >= 1<<30)
        arc_c_max = (arc_c * 8) - (1<<30);
else
        arc_c_max = arc_c_min;
arc_c_max = MAX(arc_c * 6, arc_c_max);
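(For context, arc_c starts out at 1/8 of physical memory in the surrounding arc_init() code, so arc_c * 8 is all of memory and arc_c * 6 is 3/4 of it; on a 16 GB machine that formula works out to an arc_c_max of roughly 15 GB, i.e. all memory minus 1 GB.)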
As another datapoint, I typically see ~14-15 GB of memory wired for kernel_task on a system (10.9.5) with 32 GB of memory. On a fresh boot it will be about 1 GB, and then after doing anything substantial with the pool (Arq running a backup job, enabling Spotlight on a dataset, etc.) it quickly escalates.
I'm interested in this because I have had instances where it seems like Arq or ZFS (or both) may get into a race to deplete memory, prompting the "force quit, you're out of memory" message, but I don't have anything beyond anecdotes and observations from the time and haven't been able to chase anything down.
Presently, after enabling Spotlight, my sysctl variables look like:
kstat.spl.misc.bmalloc.apps_allocated: 13113991877
kstat.spl.misc.bmalloc.bmalloc_allocated: 13786657929
kstat.spl.misc.bmalloc.space_efficiency_percent: 95
kstat.spl.misc.bmalloc.active_threads: 110
kstat.spl.misc.bmalloc.pressure_thr_wakes: 1080615
kstat.spl.misc.bmalloc.pressure_pages_wanted: 874
kstat.spl.misc.bmalloc.pressure_pages_released: 903
kstat.spl.misc.bmalloc.gc_wake_count: 12236
kstat.spl.misc.bmalloc.gc_pages_released: 0
&&
hw.physmem = 2147483648
hw.usermem = 3344109568
hw.memsize = 34359738368
kern.memorystatus_vm_pressure_level: 1
kern.memorystatus_purge_on_warning: 2
kern.memorystatus_purge_on_urgent: 5
kern.memorystatus_purge_on_critical: 8
vm.memory_pressure: 0
hw.memsize: 34359738368
machdep.memmap.Conventional: 34319855616
machdep.memmap.RuntimeServices: 0
machdep.memmap.ACPIReclaim: 20480
machdep.memmap.ACPINVS: 2895872
machdep.memmap.PalCode: 0
machdep.memmap.Reserved: 95465472
machdep.memmap.Unusable: 0
machdep.memmap.Other: 0
audit.session.member_set_sflags_mask: 0
audit.session.member_clear_sflags_mask: 16384
kstat.zfs.misc.dmu_tx.dmu_tx_memory_reserve: 0
kstat.zfs.misc.dmu_tx.dmu_tx_memory_reclaim: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
kstat.zfs.misc.arcstats.memory_throttle_count: 0
kstat.zfs.misc.arcstats.memory_direct_count: 0
kstat.zfs.misc.arcstats.memory_indirect_count: 0
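As an aside, the two bmalloc counters above give the allocator overhead directly; a quick one-liner to compute it (assuming those kstat names, as shown in the output above):

```
# overhead = (bmalloc_allocated - apps_allocated) / bmalloc_allocated
sysctl -n kstat.spl.misc.bmalloc.apps_allocated kstat.spl.misc.bmalloc.bmalloc_allocated \
  | paste - - | awk '{ printf "allocator overhead: %.1f%%\n", 100 * ($2 - $1) / $2 }'
```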
Two days of uptime post-Yosemite update, same system with 32GB of memory, and I recently did some shuffling of 400GB of data across this mirror (via mv between filesystems to clear copies=2 status after changing it to copies=1, fwiw) — no zil devices, no cache devices, just a two-disk pool.
kstat.spl.misc.bmalloc.apps_allocated: 14168822968
kstat.spl.misc.bmalloc.bmalloc_allocated: 16501253257
kstat.spl.misc.bmalloc.space_efficiency_percent: 85
This time I opted to spam my kstat.zfs* values into a gist: https://gist.github.com/emory/2e472eba7526dbed9262#file-gistfile1-txt
When the pool heats up with activity it negatively impacts performance for the user at the console: UIs become less responsive and mouse movement is choppy. These are dedicated whole-disk devices on two different controllers (one external via eSATA, one internal via SATA, both identical disks). If I think about it, I'd rather have slower file operations on the pool than have everything else slow down, with windows stuttering as I try to work.
My use case is that my Photography and Home Videos live on that pool. I use Lightroom and keep my catalogs on my Fusion Drive HFS+ volume, which is separate from a dedicated SSD boot volume. I think I'm doing everything right in terms of optimizing use (catalogs and indexes on the HFS+ Fusion Drive for performance, and the referenced images all on my zpool for reliability and integrity), so it isn't my intent to hijack someone else's issue, but I think I'd prefer artificially capping resource utilization if it means the rest of my activities aren't impacted (obviously file operations will be impacted if they use that volume, but I think that's what people expect already?)
In my particular circumstance, is everything working as designed, and should I consider a cache or ZIL device in spite of my somewhat modest needs? I'm using 2x2TB physical volumes without dedupe, but I do use compression.
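(If I do experiment with one, I gather the commands are roughly along these lines; the pool and device names here are hypothetical, and a separate log only helps synchronous writes, so it may not buy me much:)

```
# Add an L2ARC (read cache) device to the pool:
sudo zpool add tank cache disk9
# Add a separate log (SLOG) device for synchronous writes:
sudo zpool add tank log disk10
# Both can be removed again if they don't help:
sudo zpool remove tank disk9 disk10
```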
When was the last time you updated your source trees and upgraded spl and zfs? The memory model was just totally overhauled two days ago, and this issue should probably be closed since it was about the old memory model.
It's been a couple of weeks; I rebuilt binaries last night but haven't tested or activated. I'll be sure to prioritize that!
FWIW I'm using trunk as of today and I'm also experiencing the choppy, delayed mouse and general unresponsiveness, though not always - it seems to come on after some extended use. Memory, at least as reported by iStat, doesn't seem too problematic. I've tried limiting arc_max to between 3 and 6 GB and still have this happen.
Another anomaly is that it takes a long time for my Time Machine ZVOL to mount. It's a 500G quota, 325G full. It got so bad that I ended up cloning to a new ZVOL, and while it is better now, it still has a delay when mounting.
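(For what it's worth, here is roughly how I'd check how full that zvol actually is; the dataset name is hypothetical:)

```
zfs get volsize,used,referenced,compressratio tank/timemachine
```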
I haven't noticed any delays in mounting personally; my openzfsosx workstation only has a 2TB mirror that is at about 70% capacity, but I don't have a TM volume on it — maybe Time Machine is the cause of the delay in your case?
I rebuilt my binaries and installed two days ago. Heavy utilization still results in some chop, but it isn't as severe as it used to be. Memory utilization via kernel_task isn't as exaggerated at all, but holy hell, File Cache seems excessive - or maybe it's just accurately reporting now?
Right now memory_pressure is super low but I'm using 31.46GB of memory and 174MB of swap: 5GB wired, 12.78GB of App Memory, 220MB compressed and uh, 13.7GB File Cache.
What really makes my machine get weird is doing something that traverses several hundred directories and looks for changed files — a common occurrence for Lightroom and checking metadata updates. FWIW my workflow is not in Adobe DNG but regular RAW files with sidecars so each image has an accompanying file with it. In my most usual workflow it skims 1335 directories and checks about 90k files ranging in size from 400KB to 50MB with some outliers that are much larger (exported TIFFs that stay in-library can be a few hundred MB.)
Current code addresses this issue. It is believed that the problem was that the allocator we built into the SPL was driving the OS's underlying allocator in such a way as to cause the stuttering/choking behaviours discussed above.
I'm not sure what the best info would be to assess this - let me know. For now I've included the output of zfs list, the output of sysctl zfs after several days, and the output of zdb. When I reboot I'll add another sysctl zfs file to compare. (Why doesn't github let me attach .txt files???)
```
zfs.arc_max: 0
zfs.arc_min: 0
zfs.arc_meta_used: 1702334208
zfs.arc_meta_limit: 3221225472
zfs.l2arc_write_max: 8388608
zfs.l2arc_write_boost: 8388608
zfs.l2arc_headroom: 2
zfs.l2arc_feed_secs: 1
zfs.l2arc_feed_min_ms: 200
zfs.l2arc_noprefetch: 1
zfs.l2arc_feed_again: 1
zfs.l2arc_norw: 0
zfs.anon_size: 18432
zfs.anon_metadata_lsize: 0
zfs.anon_data_lsize: 0
zfs.mru_size: 1998799872
zfs.mru_metadata_lsize: 266654720
zfs.mru_data_lsize: 1604591616
zfs.mru_ghost_size: 2296164352
zfs.mru_ghost_metadata_lsize: 1076072448
zfs.mru_ghost_data_lsize: 1220091904
zfs.mfu_size: 1158606848
zfs.mfu_metadata_lsize: 155009536
zfs.mfu_data_lsize: 974876160
zfs.mfu_ghost_size: 1888160768
zfs.mfu_ghost_metadata_lsize: 259149824
zfs.mfu_ghost_data_lsize: 1629010944
zfs.l2c_only_size: 55236383232
zfs.vnop_osx_debug: 0
zfs.vnop_ignore_negatives: 0
zfs.vnop_ignore_positives: 0
zfs.vnop_create_negatives: 1
zfs.reclaim_list: 0
zfs.vnop_reclaim_throttle: 33280
zfs.vfs_suspend_fs_begin_delay: 2
zfs.vfs_suspend_fs_end_delay: 2
zfs.vnop_force_formd_normalized_output: 0
```