Are you by any chance rsyncing over NFS? I've had similar problems; NFS seems to use some caches in SLAB and the page cache, which push the ARC out. My workaround was to set up a 5-minute cron with
echo 1 > /proc/sys/vm/drop_caches
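For reference, a minimal sketch of such a cron entry (assuming a root crontab; the schedule and shell wrapper are illustrative, not part of the original suggestion):
# run every 5 minutes; "1" drops only the page cache, not dentries/inodes
*/5 * * * * /bin/sh -c 'echo 1 > /proc/sys/vm/drop_caches'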
no, only via USB 3.0 :(
I had run
echo 3 > /proc/sys/vm/drop_caches
then it would free up SUnreclaim in 20-50 KiB steps, but it partially looked like memory was also still growing (I don't have the time to wait 5+ days for the system to become usable again)
After 1 hour I did a reboot (via the magic SysRq key) - I have the impression that there are still issues with memory pressure despite using all the recent changes and #2129
I've done several rsync transfers of the 2 TB (albeit incremental - so at most 10-30 GB per import and export),
and despite now using preset ARC limits
echo "0x100000000" > /sys/module/zfs/parameters/zfs_arc_max
echo "0x100000000" > /sys/module/zfs/parameters/zfs_arc_min`
memory keeps on growing
Can anybody shed some light on what the problem with transparent hugepages and ZFS on Linux is?
https://groups.google.com/a/zfsonlinux.org/forum/#!msg/zfs-discuss/7a77qQcG4C0/Bpc-VHKSjycJ
The advice to disable it keeps popping up when searching for solutions to an ever-growing ARC or ZFS slabs.
What is causing SPL/ZFS and the ARC to grow continually?
Slab: 17339664 kB
SReclaimable: 1035176 kB
SUnreclaim: 16304488 kB
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
8609728 7768306 90% 0.06K 134527 64 538108K kmalloc-64
5660980 5660980 100% 0.30K 217730 26 1741840K dmu_buf_impl_t
5369568 5367594 99% 0.50K 167799 32 2684784K zio_buf_512
5360112 5358715 99% 0.88K 148892 36 4764544K dnode_t
5314824 5314824 100% 0.11K 147634 36 590536K sa_cache
5282844 5282844 100% 0.19K 251564 21 1006256K dentry
755750 748071 98% 0.31K 30230 25 241840K arc_buf_hdr_t
613504 613504 100% 0.03K 4793 128 19172K kmalloc-32
563262 563262 100% 0.09K 13411 42 53644K kmalloc-96
439110 224047 51% 0.08K 8610 51 34440K uksm_rmap_item
427986 311810 72% 0.10K 10974 39 43896K arc_buf_t
313242 312955 99% 0.04K 3071 102 12284K l2arc_buf_hdr_t
284608 188966 66% 0.06K 4447 64 17788K uksm_tree_node
245358 243018 99% 16.00K 122679 2 3925728K zio_buf_16384
170520 170471 99% 8.00K 42630 4 1364160K kmalloc-8192
140028 110041 78% 0.19K 6668 21 26672K kmalloc-192
122880 122880 100% 0.01K 240 512 960K kmalloc-8
62080 61604 99% 0.12K 1940 32 7760K kmalloc-128
38590 38590 100% 0.12K 1135 34 4540K kernfs_node_cache
34496 34118 98% 0.18K 1568 22 6272K vm_area_struct
34192 22220 64% 2.00K 2137 16 68384K kmalloc-2048
32186 31444 97% 0.18K 1463 22 5852K uksm_vma_slot
30912 29510 95% 0.06K 483 64 1932K anon_vma_chain
18176 18176 100% 0.02K 71 256 284K kmalloc-16
16256 11372 69% 0.06K 254 64 1016K range_seg_cache
16065 15472 96% 0.08K 315 51 1260K anon_vma
14784 14784 100% 0.06K 231 64 924K iommu_iova
13824 13760 99% 0.50K 432 32 6912K kmalloc-512
13224 13224 100% 0.54K 456 29 7296K inode_cache
12448 7993 64% 0.25K 389 32 3112K kmalloc-256
12036 11095 92% 0.04K 118 102 472K uksm_node_vma
11730 10908 92% 0.08K 230 51 920K btrfs_extent_state
11424 9506 83% 0.57K 408 28 6528K radix_tree_node
10725 10028 93% 0.96K 325 33 10400K btrfs_inode
10444 10444 100% 0.14K 373 28 1492K btrfs_extent_map
10112 9599 94% 0.25K 316 32 2528K filp
9600 6754 70% 1.00K 300 32 9600K zio_buf_1024
8832 8283 93% 1.00K 276 32 8832K kmalloc-1024
This is after a short uptime of 12 hours, one zpool scrub of a 2.77 TB pool of data (1.72 TB, partially with ditto blocks),
then 2 rsync transfers (currently on the 2nd incremental run)
I've read that exporting pools is supposed to reset memory consumption,
but how can that be the solution?
Assuming programs and their working state have to be preserved, it's not possible to export and re-import the pool where my /home partition resides (mirrored, backed by a small L2ARC of now 50 GB; 1.72 TB of data).
@behlendorf, @tuxoko, you two are the memory and/or ARC experts with regard to ZFS on Linux: do you have any suggestions for settings or approaches that might improve things?
Meanwhile I keep looking for experience reports and settings that might help in dealing with this problem (besides disabling THP).
Sorry for the bother in any case - I just want to avoid repeating an experience similar to https://github.com/zfsonlinux/zfs/issues/3142
Many thanks in advance!
@kernelOfTruth I've been meaning to give this issue a more serious look but haven't yet had a chance to do so. Looking at the numbers from your initial posting, this sticks out:
other_size 4 4028910448
That's pretty much blowing your 4GiB ARC size limit right there. This value is the sum of a few other things, the sizes of which aren't necessarily readily available. It contains the dnode cache "dnode_t", the dbuf cache "dmu_buf_impl_t" and some of the 512-byte zio buffers "zio_buf_512".
Here are the relevant slab lines from above:
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
dnode_t 2608993 4033260 896 36 8 : tunables 0 0 0 : slabdata 112035 112035 0
dmu_buf_impl_t 2765075 5395338 312 26 2 : tunables 0 0 0 : slabdata 207513 207513 0
zio_buf_512 2612411 4143872 512 32 4 : tunables 0 0 0 : slabdata 129496 129496 0
As you can see, they've all got a ton of items. Also, they're all somewhat sparsely populated, which likely means there's a fair bit of slab fragmentation.
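As a sanity check, the per-cache footprint in those lines can be totalled straight from /proc/slabinfo; a rough sketch (the cache names are the ones quoted above, and the arithmetic is simply object count times object size, ignoring per-slab padding):
# approximate memory held by each of the caches in question, in MiB
awk '/^(dnode_t|dmu_buf_impl_t|zio_buf_512|dentry) / {printf "%-16s %8.1f MiB\n", $1, $3*$4/1048576}' /proc/slabinfo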
The common thread in these related problems seems to be the use of rsync which, of course, traverses directory structures and requires all the inode information for every file and directory. I have a feeling the culprit is this other kernel-related slab cache:
dentry 2535755 2887290 192 21 1 : tunables 0 0 0 : slabdata 137490 137490 0
AFAIK right now, the only way to tame the kernel's dentry cache is to set /proc/sys/vm/vfs_cache_pressure to a value > 100.
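For what it's worth, a sketch of how that would typically be applied (the value is illustrative, not a recommendation):
# raise reclaim pressure on dentries/inodes at runtime
sysctl -w vm.vfs_cache_pressure=1000
# persist across reboots
echo 'vm.vfs_cache_pressure = 1000' >> /etc/sysctl.conf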
At the point the values look like the above, I'm not sure what can be done to lower them and to reduce any fragmentation which might have occurred.
No, it deals with ARC buffers.
@dweeezil, @kpande, @snajpa thanks for taking a look into this!
/proc/sys/vm/vfs_cache_pressure was previously at 1000 (#3142);
for this issue it's at 10000, so OK, I'll raise it again by one step.
Related to vfs_cache_pressure settings, what stuck out in my memory was something Andrew Morton (?) wrote about setting it to >= 100000.
I'll give that a try, thanks!
Wandering around the web with this issue in mind I found the following information:
Questions - for improving memory reclaim and/or limiting growth:
What is the "optimal" setting for spl_kmem_cache_expire in this context? 2 == expire on low memory conditions [it was set to that the whole time], 1 == expire by age
What is the "optimal" setting for spl_kmem_cache_reclaim ? code says:
unsigned int spl_kmem_cache_reclaim = 0 /* KMC_RECLAIM_ONCE */;
so 0 == reclaim once and then no more ?
MODULE_PARM_DESC(spl_kmem_cache_reclaim, "Single reclaim pass (0x1)");
so 1 == reclaim once? This one's confusing :/
It should reclaim; I wouldn't care if latency was a little higher, as long as memory growth doesn't get out of control.
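For reference, a quick way to inspect and flip those SPL knobs at runtime (a sketch; it assumes a stock SPL build where these parameters are runtime-writable):
# show the current settings
grep . /sys/module/spl/parameters/spl_kmem_cache_expire /sys/module/spl/parameters/spl_kmem_cache_reclaim
# example: expire cache objects on low-memory conditions (2) rather than by age (1)
echo 2 > /sys/module/spl/parameters/spl_kmem_cache_expire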
I'm irregularly running memory compaction manually, but it might not address this kind of fragmentation issue; I'll take a look and see what can be tweaked in that regard.
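(The manual compaction referred to here is the usual sysctl trigger; a sketch for completeness - it only compacts movable pages, so SLAB/SLUB fragmentation may well be unaffected:)
# ask the kernel to compact all memory zones once
echo 1 > /proc/sys/vm/compact_memory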
I'll give @snajpa 's suggestion of
echo 1 > /proc/sys/vm/drop_caches
a try and see if that works here
Thanks !
Setting spl_kmem_alloc_max to 65536 per #3041 (default: 2097152).
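Presumably applied like the other SPL tunables; a sketch, assuming the parameter is runtime-writable (the persistent variant shows up in spl.conf further below):
# limit the maximum SPL kmem allocation size (bytes)
echo 65536 > /sys/module/spl/parameters/spl_kmem_alloc_max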
something's fishy:
even though
echo "0x100000000" > /sys/module/zfs/parameters/zfs_arc_max
echo "0x100000000" > /sys/module/zfs/parameters/zfs_arc_min
echo "6442450944" > /sys/module/zfs/parameters/zfs_arc_meta_limit
are set
and it could be observed that the settings seem to apply on a per-zpool basis
(is that true?)
after a scrub of an additional pool, and now after the export of the mentioned pool (only the pool containing /home is currently imported),
the values are now at:
arc_meta_used 4 6448093560
arc_meta_limit 4 12632603136
arc_meta_max 4 12653024936
I should have copied arc_meta_max and arc_meta_limit before,
but I'm sure at least one of the values was significantly lower (arc_meta_max? at 6 GB?).
Are those values constantly rising with each subsequently imported pool, and not reset after export?
Also, SUnreclaim was at around 18-20 GB;
I would understand if it was around 14-15 GB, but that is three times the 6 GB value.
Weird ...
Copying arcstats for good measure.
The values should be for /home + L2ARC, after import & export of one additional pool (2.7 TB, 1.7 TB with ditto blocks), rsync to that additional pool, and after rsync to a Btrfs volume (1.7 TB):
cat /proc/spl/kstat/zfs/arcstats
5 1 0x01 86 4128 20141264769 44155010516587
name type data
hits 4 46142768
misses 4 1620385
demand_data_hits 4 435736
demand_data_misses 4 34844
demand_metadata_hits 4 25703873
demand_metadata_misses 4 1036994
prefetch_data_hits 4 1157
prefetch_data_misses 4 48027
prefetch_metadata_hits 4 20002002
prefetch_metadata_misses 4 500520
mru_hits 4 8253358
mru_ghost_hits 4 221404
mfu_hits 4 24659066
mfu_ghost_hits 4 40243
deleted 4 493513
recycle_miss 4 61459
mutex_miss 4 5
evict_skip 4 243854
evict_l2_cached 4 4067502592
evict_l2_eligible 4 3200269312
evict_l2_ineligible 4 6979838464
hash_elements 4 337622
hash_elements_max 4 830717
hash_collisions 4 109652
hash_chains 4 13556
hash_chain_max 4 5
p 4 9516086784
c 4 11319298288
c_min 4 4194304
c_max 4 4294967296
size 4 11318676224
hdr_size 4 138155184
data_size 4 4255234560
meta_size 4 2702728192
other_size 4 4214504248
anon_size 4 153600
anon_evict_data 4 0
anon_evict_metadata 4 0
mru_size 4 3688179200
mru_evict_data 4 2307723264
mru_evict_metadata 4 467456
mru_ghost_size 4 654553600
mru_ghost_evict_data 4 642440704
mru_ghost_evict_metadata 4 12112896
mfu_size 4 3269629952
mfu_evict_data 4 1947249152
mfu_evict_metadata 4 883870208
mfu_ghost_size 4 1809937408
mfu_ghost_evict_data 4 1809665024
mfu_ghost_evict_metadata 4 272384
l2_hits 4 143907
l2_misses 4 1476094
l2_feeds 4 44103
l2_rw_clash 4 0
l2_read_bytes 4 216845312
l2_write_bytes 4 3365597184
l2_writes_sent 4 3596
l2_writes_done 4 3596
l2_writes_error 4 0
l2_writes_hdr_miss 4 0
l2_evict_lock_retry 4 0
l2_evict_reading 4 0
l2_free_on_write 4 3
l2_cdata_free_on_write 4 17
l2_abort_lowmem 4 12
l2_cksum_bad 4 0
l2_io_error 4 0
l2_size 4 4174839296
l2_asize 4 2982881792
l2_hdr_size 4 8054040
l2_compress_successes 4 128589
l2_compress_zeros 4 0
l2_compress_failures 4 44607
memory_throttle_count 4 0
duplicate_buffers 4 0
duplicate_buffers_size 4 0
duplicate_reads 4 0
memory_direct_count 4 19
memory_indirect_count 4 0
arc_no_grow 4 0
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_prune 4 68
arc_meta_used 4 7063441664
arc_meta_limit 4 12632603136
arc_meta_max 4 12653024936
Exporting all pools seems to reclaim and/or reset memory usage, but that can't be the only solution.
When I get the opportunity I'll try this out without the L2ARC and see if that makes a difference ...
...
and disable transparent hugepages as a last resort
Without addressing a few of the specifics in @kernelOfTruth's last couple of postings, I'd like to summarize the problem: Unlike most (all?) other native Linux filesystems, ZFS carries quite a bit of baggage corresponding to the kernel's dentry cache. As of at least 302f753, ZoL is completely reliant on the kernel's shrinker callback mechanism to shed memory. Due to the nature of Linux's dentry cache (it can grow to a lot of entries very easily) and the fact that ZFS requires a lot of metadata to be associated with each entry, the ARC can easily blow past the administrator-set limit when lots of files are being traversed. A quick peek through the kernel makes me think that vfs_cache_pressure isn't going to be of much help.
In summary, if the kernel's dcache is large, ZFS will consume a correspondingly-large (actually, several times larger) amount of memory which will show up in arcstats as "other_size".
That all said, however, the shrinker pressure mechanism does work... to a point. If I max out the memory on a system by traversing lots of files and causing other_size to get very large, the ARC will shrink if I apply pressure from a normal userland program trying to allocate memory. The manner in which the pressure is applied is dependent on the kernel's overcommit policy and the quantity of swap space. In particular, userland programs may find it difficult to allocate memory in large chunks but the same amount may succeed if the program "nibbles" away at the memory, causing the shrinkers to engage.
I'm not sure of the best solution to this issue at the moment, but it's not unique to ZFS. There are plenty of reports around the Internet of dcache-related memory problems being caused by rsync on ext4-only systems. The difference, however, is that ext4 doesn't add its own extras to the dcache, so with ZFS the effects are a lot more severe. Postings in which people are complaining about this problem usually mention vfs_cache_pressure as a solution and, in the case of ext4, I believe it will help more.
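A simple way to watch the correlation described above, i.e. the kernel's dentry count versus the ARC's other_size (a sketch using only the files already quoted in this thread):
# dentry cache: active objects, total objects, object size
grep '^dentry ' /proc/slabinfo
# ARC metadata riding along with those dentries
grep -E '^(other_size|arc_meta_used)' /proc/spl/kstat/zfs/arcstats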
@kernelOfTruth A bit more testing shows me that you might have better success if you set the module parameters on the modprobe command line, as in modprobe zfs zfs_arc_max=1073741824 .... It seems the changes don't "take" properly if set after the module is loaded.
@dweeezil would you please elaborate on how setting the ARC limit at runtime doesn't take properly, as you say? So far I've only seen that if I limit the ARC to a size smaller than it already is, it may in some cases never shrink (we run into a deadlock sooner than it has a chance to).
Internally, arc_c_max limits arc_c (the target ARC size). The value of arc_c is set to arc_c_max during module initialization, and arc_c_max is set to the value of the tunable zfs_arc_max. If zfs_arc_max is changed once the module is loaded, arc_c_max is updated to the new value (in arc_adapt()), but changes to arc_c_max are "soft": they don't have any immediate effect and only take hold when memory pressure is applied.
There should be something in the documentation about the difference between setting zfs_arc_max (and likely other tunables) at module load time and setting them once the module has been loaded.
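A quick way to check whether a load-time limit actually became the hard cap, rather than a "soft" runtime change (a sketch; the field names are the ones in the arcstats dump above):
# c = current ARC target, c_max = hard cap derived from zfs_arc_max, size = current ARC size
awk '$1 == "c" || $1 == "c_max" || $1 == "size" {print $1, $3}' /proc/spl/kstat/zfs/arcstats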
As mentioned in #3155, it would be nice if we could avoid having to use two caches, dentry & dnode.
Anyway @dweeezil, coincidentally I also observed that at least two settings can't be set dynamically once spl/zfs is already loaded, so I started putting some settings into spl.conf & zfs.conf (applied when the modules are loaded):
spl_kmem_cache_kmem_threads and spl_kmem_cache_magazine_size
Also, zfs_arc_max and zfs_arc_min seemingly can't be set to the same value during load (and/or that error coincided with a different value of spl_kmem_cache_max_size);
otherwise it would lead to lots of segmentation faults of the mount command.
so the testing settings right now are:
zfs.conf
options zfs zfs_arc_max=0x100000000
#options zfs zfs_arc_min=0x100000000
options zfs zfs_arc_meta_limit=6442450944
spl.conf
options spl spl_kmem_cache_kmem_limit=4096
options spl spl_kmem_cache_slab_limit=16384
options spl spl_kmem_cache_magazine_size=64
options spl spl_kmem_cache_kmem_threads=8
options spl spl_kmem_cache_expire=2
#options spl spl_kmem_cache_max_size=8
options spl spl_kmem_cache_reclaim=1
options spl spl_kmem_alloc_max=65536
Currently I also have transparent hugepages disabled via
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
Thanks !
Regarding #3155, I was clearly wrong about other filesystems not hanging onto a lot of stuff in the Linux slab. Here are some slabinfo entries after stat(2)ing about a million files in a 3-level nested set of 1000 directories on an EXT4 filesystem:
ext4_inode_cache active=1002656 num=1002656 size=1889.47MiB
inode_cache active=6635 num=7595 size=7.53MiB
dentry active=1010025 num=1010025 size=308.24MiB
and this is after doing the same on a ZFS file system (with an intervening drop_caches to clean everything up):
dmu_buf_impl_t active=1034979 num=1034979 size=971.24MiB
dnode_t active=1002246 num=1002246 size=2217.49MiB
zio_buf_512 active=1002336 num=1002336 size=489.42MiB
ext4_inode_cache active=1565 num=2224 size=4.19MiB
inode_cache active=6726 num=7595 size=7.53MiB
dentry active=1015115 num=1015125 size=309.79MiB
ZFS is definitely grabbing more *node-related stuff but it's not like EXT4 doesn't add on its own stuff.
@dweeezil dentry and inode (znode) are all handled by the kernel VFS, so there's no reason they would behave differently with a different filesystem. However, dnode and dmu_buf are handled by ZFS, and dnode should be loosely coupled with the inode (znode), so I don't think the inode (znode) would hold on to the dnode.
I wonder why ZFS doesn't reclaim more aggressively. I'd like to investigate this, but currently I'm busy with other stuff...
@tuxoko Right, I mainly wanted to point out that ZFS isn't the only filesystem that uses a lot of inode-related storage. Also, it's not clear to me that the kernel is handling large dentry cache sizes very well. Finally, I wanted to point out that ZFS can behave much better if the arc limit is set during module load rather than after the fact. For my part, I'm not going to be able to look into this much further now, either. I do plan on investigating related issues more closely as part of #3115 (speaking of which, and on a totally unrelated subject, I have a feeling it may be a major pain to merge ABD into that).
Seems like the issue is resolved (reclaim seems to work fine). I'm not really sure which of the modified settings made that change possible, but I guess it's a combination.
Posting the data here for reference in case anyone else encounters an ever-growing ARC.
Keep in mind that this is tailored toward a desktop, home backup and workstation kind of setup.
The kernel is 3.19 with the following notable additional patchsets that give memory allocations a higher chance of success:
swap on ZRam with LZ4 compression
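For completeness, a minimal sketch of how such a ZRAM swap device could be set up (the size and swap priority are illustrative; the exact sysfs interface varies somewhat between kernel versions and this assumes LZ4 support is built into zram):
modprobe zram num_devices=1
echo lz4 > /sys/block/zram0/comp_algorithm   # select LZ4 compression (must be set before disksize)
echo 8G > /sys/block/zram0/disksize          # uncompressed size of the device
mkswap /dev/zram0
swapon -p 100 /dev/zram0                     # prefer it over disk-backed swap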
/etc/modprobe.d/spl.conf
options spl spl_kmem_cache_kmem_limit=4096
options spl spl_kmem_cache_slab_limit=16384
options spl spl_kmem_cache_magazine_size=64
options spl spl_kmem_cache_kmem_threads=8
options spl spl_kmem_cache_expire=2
#options spl spl_kmem_cache_max_size=8
options spl spl_kmem_cache_reclaim=1
options spl spl_kmem_alloc_max=65536
/etc/modprobe.d/zfs.conf
#options spl spl_kmem_cache_kmem_limit=4096
#options spl spl_kmem_cache_slab_limit=16384
#options spl spl_kmem_cache_magazine_size=64
#options spl spl_kmem_cache_kmem_threads=8
#options spl spl_kmem_cache_expire=2
options zfs zfs_arc_max=0x100000000
#options zfs zfs_arc_min=0x100000000
options zfs zfs_arc_meta_limit=6442450944
options zfs zfs_arc_p_dampener_disable=0
<-- several of those parameters, both for the ZFS and SPL kernel modules, have to be specified when the modules are loaded - otherwise they don't seem to be adhered to
slub_nomerge is appended to the kernel command line for safety reasons (buggy drivers; igb had that problem of memory corruption afaik)
intel_iommu=on is appended to the kernel command line per advice from @ryao
CONFIG_PARAVIRT_SPINLOCKS is enabled in the kernel configuration; if I remember correctly there was an issue where @ryao mentioned that with that configuration option certain code paths are removed (the slowpath?) and thus lockups tend to occur less often. https://github.com/zfsonlinux/zfs/issues/3091
cat /proc/sys/vm/vfs_cache_pressure
100000
Disabling THP (transparent hugepages) - though it seems to work fine with the recent tweaks to ZFS -
and regularly running
echo 1 > /proc/sys/vm/compact_memory
might raise stability in certain cases (if I remember correctly it was also mentioned in relation to OpenVZ)
echo "786432" > /proc/sys/vm/min_free_kbytes
echo "65536" > /proc/sys/vm/mmap_min_addr
is also set here as a preventive & stability-enhancing measure (might need adapting; it's tailored to 32 GB of RAM)
Code changes & commits: https://github.com/kernelOfTruth/zfs/commit/8135db50038f9fdfff4c9d54b63433db2b34ff97 from #3181, with the value raised to 12500
https://github.com/kernelOfTruth/zfs/commit/fa8f5cd8e846c65ca203c5b29d43077ee3bc1c40 - higher ZFS_OBJ_MTX_SZ (512; double the value), which leads to the following error messages during mount/import: http://pastebin.com/cWm5Hvn0 but works fine in operation
https://github.com/kernelOfTruth/zfs/commit/086f234d7ee426686b79c5ce9e1a039d840f29ef - arc_evict_iterations to 180, zfs_arc_grow_retry to 20, zfs_arc_shrink_shift to 4 (but I just saw that manually I was still setting it to 5)
So the ARC doesn't grow as aggressively, and more objects are scanned through and recycled or evicted at a time.
It might not address SUnreclaim directly but changes in #3115 should refine ARC's behavior in that regard (arc_evict_iterations replaced with zfs_arc_evict_batch_limit)
Code changes & commits in SPL: https://github.com/zfsonlinux/spl/pull/372, with the change that "breaks" it reverted (Retire spl_module_init()/spl_module_fini(), https://github.com/kernelOfTruth/spl/commit/ee4bd8bbb232eb98c1a6f447f8c591421ce4dfee) until the pull request is updated - https://github.com/kernelOfTruth/spl/commit/06dd9ccb98a57e9b9d8ae6de3b4c9c086bbb29f9
Additional manually set settings: zfs_arc_shrink_shift to 5 (will try between 4 and 5 in the future and see if that raises latency), spl_taskq_thread_bind to 1, zfs_prefetch_disable to 0 (if disabled ("1") it might hurt read performance and lead to way more reads - a lot - though latency is lower with ("1"))
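A sketch of how these could be applied at runtime, assuming the standard module parameter paths and that they are runtime-writable in this build:
echo 5 > /sys/module/zfs/parameters/zfs_arc_shrink_shift   # larger shift = smaller fraction evicted per shrink
echo 1 > /sys/module/spl/parameters/spl_taskq_thread_bind  # bind taskq threads to CPUs
echo 0 > /sys/module/zfs/parameters/zfs_prefetch_disable   # keep prefetch enabled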
Below follows the output of /proc/slabinfo, /proc/meminfo and /proc/spl/kstat/zfs/arcstats during the restore of 1.7 TB from an external USB 3.0 disk (both source and target were ZFS pools).
ZFS ARC stats, 1 TB in, mainly "larger" (hundreds MB to GB): http://pastebin.com/uASLYsqW (close to beginning, a few small files, others several hundred MB, gigabytes and mostly close to 10 MB)
ZFS ARC stats, 1.3 TB, more larger data: http://pastebin.com/Fi5CMc65 data
ZFS ARC stats, 1.6 TB, mixed (large + little data): http://pastebin.com/uDYUuBGY
ZFS ARC stats, 1.7 TB, heavily mixed, close to end of backup: http://pastebin.com/tHHT1cXX
ZFS ARC stats, 1.7 TB, heavily mixed, after rsync: http://pastebin.com/BEmqGFQX
Note how other_size doesn't seem to grow out of proportion anymore; the only swap used during the backup was from ZRAM.
I will post the stats of several imports and exports of pools + Btrfs partitions and small incremental rsync updates later - this was always a problem in the past, where SUnreclaim would grow almost unstoppably ever larger.
So here is the data after:
rsync (1.7 TB) - ZFS /home to ZFS bak (several hundred megabytes transferred)
stage4 (system backup from the Btrfs partition to ZFS; 7z)
updatedb (system [Btrfs] + ZFS /home partition)
2x rsync (1.7 TB) - ZFS /home to ZFS bak (5-10 GB transferred)
other_size is twice the size of data_size; meta_size is close to the size of data_size.
zio_buf_16384 was never blown out of proportion (e.g. to 4 GB) and always stayed around 1-1.3 GB; dnode_t also had a size of around 1 GB.
I will post the data after an updatedb with /home + another additional ZFS pool imported and an rsync job after that - this was usually the worst-case scenario for me in the recent past, where things really seemed to wreak havoc (despite using #2129).
If things don't change I'll re-add the L2ARC device and see how things go over the next days - with it, memory consumption was always greater (improved with #3115?); but even without those changes it should behave far more civilly with an L2ARC.
OK, decided to add the L2ARC:
cache - - - - - -
intelSSD180 1.27G 57.3G 46 5 102K 563K
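For reference, the cache device shown in that zpool iostat excerpt would have been added with something along these lines (the pool name and device path are placeholders, not taken from the post):
# attach an SSD (partition) as L2ARC to the pool holding /home
zpool add <poolname> cache /dev/disk/by-id/<ssd-partition>
# verify it shows up under "cache"
zpool iostat -v <poolname>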
updatedb with additional imported pool, then another rsync: http://pastebin.com/DgxWtjR5
After the export of the additional pool, zio_buf_16384 even went down to 553472K - previously it would only ever grow;
dnode_t is at around 1156928K
SUnreclaim: 3544568 kB
Seems like everything works as intended now :+1:
@kernelOfTruth lucky you, my SUnreclaim still just keeps on growing. But unlike you I'm stuck with RHEL6 kernel and can't move on to anything newer (OpenVZ).
Thanks :)
@snajpa that's unfortunate :/ Does the support contract allow compiling a different kernel from the sources, as long as you're staying on that version? (I'm eyeing the paravirt spinlock stability issues, since you also mentioned lockup problems in #3160 and that RHEL6 kernels aren't compiled with it.)
I also only recently added that support, since I only use VirtualBox for virtualization purposes.
From what I've read there seem to be at least 2 significant landmarks: 3.10 (RHEL7 seems to contain it) and 3.12, where some locking & dentry-handling changes were also introduced (http://permalink.gmane.org/gmane.linux.kernel.commits.head/407250).
Would experimenting with all of the options I summarized above be possible? Like I wrote, I'm not sure exactly which change made it "click" and work properly - but it would surely be nice if it wasn't too kernel-version dependent; 2.6.32, like in some of the mentioned issues, is perhaps too old though.
Anyway: good luck - if it can be made to work here, I'm sure you'll also figure it out. I don't have that much knowledge or expertise in the kernel or code department compared to you guys (doing this as a mere hobby & from experience, as a Gentoo user).
edit:
I remember having read that
echo "bfq" > /sys/module/zfs/parameters/zfs_vdev_scheduler
(default: noop)
and BFQ is also set in /sys/block/sd*/queue/scheduler; where that isn't supported, CFQ could make a difference related to latency or perhaps even stability - experiment between noop, deadline & cfq
perhaps that also might be of help
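A sketch of applying that scheduler choice to every disk (the device glob is illustrative, and it falls back to cfq where bfq isn't built in):
for q in /sys/block/sd*/queue/scheduler; do
    echo bfq > "$q" 2>/dev/null || echo cfq > "$q"   # fall back to cfq if bfq is unavailable
done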
With the recent upstream changes in master (#3202) this doesn't seem to appear anymore,
but it sure still needs a few days (or weeks+) of testing.
It appears to be fixed - therefore closing.
@kernelOfTruth Excellent news, thanks for the update.
Posting the data here before the system goes "boom" (it's getting slower and slower) - I hope it's useful.
Symptoms: opening Chromium, Firefox, Konqueror, etc. takes several seconds to load.
Besides that the system is (still) working fine.
I'm not really convinced that SUnreclaim should be that huge.
Should spl_kmem_cache_reclaim be set to something else?
slub_nomerge is used during bootup.
Below is the output of the system - no suspicious output in dmesg.
I will attempt "echo 3 > /proc/sys/vm/drop_caches" and see how it goes ...