Closed: remingtonc closed this issue 1 year ago
(related tickets moved to description)
More graphs - this one shows the sawtooth trend on the first server this occurred on. The second server demonstrated a very similar memory pattern (it is the one currently detailed).
This second graph shows the current server, where the ARC size was increased.
The issue happened again. At least it is repeatable! Stopping the NFS server results in the ARC shrinking, so this seems like contention somewhere. The write load is much higher than the read load on these machines; is the ARC used for anything other than reads? If NFS is somehow pinning ZFS data in the ARC, it would be great to understand how that interaction works and why it flatlines the ability to write data, and even to read. Can the L2ARC somehow create a stall condition as well?
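If it helps anyone narrow this down, a quick way to see whether metadata eviction pressure tracks NFS activity (a sketch, assuming a Linux OpenZFS 2.1 system with the usual kstats under /proc/spl) is to poll the relevant arcstats counters while stopping and starting the NFS server:
# Watch ARC metadata usage and prune activity once a second (Ctrl-C to stop)
while sleep 1; do grep -E '^(arc_prune|arc_meta_used|arc_meta_limit|arc_meta_max) ' /proc/spl/kstat/zfs/arcstats; echo ---; done
If arc_prune climbs rapidly while arc_meta_used sits above arc_meta_limit, the spinning is more likely metadata eviction than data eviction.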
I've seen this happen too when I drop caches on a loaded NFS server. Only way to recover seems to be a stop/start of nfs-kernel-server.
I wonder if #13231 is related and if increasing zfs_arc_prune_task_threads would help...?
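Assuming your module build actually exposes that tunable, it can be checked and bumped at runtime through the module parameters (the value here is illustrative only):
# Check whether the parameter exists, then raise the prune worker count
cat /sys/module/zfs/parameters/zfs_arc_prune_task_threads
echo 8 > /sys/module/zfs/parameters/zfs_arc_prune_task_threads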
I have the same issue, though I'm not using an NFS server. My basic setup / load (on zvols + XFS) is as follows:
zpool create tank -f -o ashift=12 -o autotrim=off -O relatime=on -O xattr=sa -O dnodesize=auto -O compression=lz4 [56 mirrored pairs of 16TB SAS HDDs] spare [8 spare 16TB SAS HDDs] special [3-way mirror of enterprise NVMe SSDs] log [3-way mirror of Optane NVMe SSDs]
zfs create tank/test -o recordsize=64K
zfs create tank/test/vol -o volblocksize=64K -V 500G -s
mkfs.xfs -m crc=1,reflink=1 -f -K -d su=64k,sw=1 -s size=4k /dev/zvol/tank/test/vol
mount /dev/zvol/tank/test/vol /mnt
cd /mnt
fio --runtime=3600 --direct=1 --bssplit=12k/5:48k/50:132k/20:216k/25 --ioengine=libaio --time_based --name=test --iodepth=96 --rw=randread
I'm on Ubuntu 22.04 with OpenZFS 2.1.2. I've tried to bump zfs_arc_prune_task_threads to 2, 4, and 8, with no improvement.
Super happy to run test cases / debugging if anyone has suggestions!
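One sketch of what I could capture the next time it wedges (thread names assumed to be arc_evict and arc_prune, as reported above; both commands need root):
# Dump the kernel stacks of the spinning ARC threads
for p in $(pgrep 'arc_(evict|prune)'); do echo "== PID $p"; cat /proc/$p/stack; done
# Or sample them for ten seconds with perf
perf record -g -p "$(pgrep -x arc_evict)" -- sleep 10 && perf report --stdio | head -50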
Thought this was kind of interesting - it's possible that I'm misunderstanding arc_summary output, but when I get the arc_evict 100% CPU condition, the numbers don't really add up. Example follows - the ARC is at 65.1 GiB, but MFU + MRU + Meta + Dnode = ~12 GiB.
ARC size (current): 101.7 % 65.1 GiB
Target size (adaptive): 100.0 % 64.0 GiB
Min size (hard limit): 12.5 % 8.0 GiB
Max size (high water): 8:1 64.0 GiB
Most Frequently Used (MFU) cache size: 95.7 % 8.9 GiB
Most Recently Used (MRU) cache size: 4.3 % 407.7 MiB
Metadata cache size (hard limit): 75.0 % 48.0 GiB
Metadata cache size (current): 6.3 % 3.0 GiB
Dnode cache size (hard limit): 10.0 % 4.8 GiB
Dnode cache size (current): < 0.1 % 718.8 KiB
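In case it helps, the raw kstats break the ARC total down further than arc_summary does; this is what I would compare against the summary next time it happens (field names taken from a 2.1-era arcstats file, values printed in MiB):
# Dump the raw ARC size breakdown straight from the kstats
awk '$1 ~ /^(size|anon_size|mru_size|mfu_size|data_size|metadata_size|hdr_size|dbuf_size|dnode_size|bonus_size|arc_meta_used)$/ {printf "%-16s %10.1f MiB\n", $1, $3/1048576}' /proc/spl/kstat/zfs/arcstats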
System information
Distribution: Ubuntu 20.04.3 LTS
Kernel: 5.4.0-96-generic
Architecture: x86_64
OpenZFS version: zfs-2.1.2-1
Describe the problem you're observing
ZFS is flatlined on throughput, with an arc_evict and an arc_prune process each spinning at 100% CPU. The workload is a kernel NFS server (all NFSv4 clients) with ZFS 2.1.2 built from source. It is characterized by high CPU iowait and flatlined throughput.
RAM Graph
It's holding on to RAM pretty hard. This is where I begin to lose debugging expertise, having discovered slabs yesterday. :-)
top
slabtop
zpool
zfs fs
The intention is for the ARC/L2ARC to hold metadata only.
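(The exact properties used aren't shown here; a typical way to pin the caches to metadata only, with a placeholder dataset name, would be:)
# Restrict ARC and L2ARC caching to metadata for a dataset
zfs set primarycache=metadata tank/dataset
zfs set secondarycache=metadata tank/dataset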
initial arcstats
arcstats reported metadata usage above the limit.
initial adjustment
Attempted to remediate by increasing ARC size by half of remaining RAM and increasing the metadata allocation in the ARC.
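(Roughly along these lines; the exact values aren't recorded in this issue, so the numbers below are illustrative only:)
# Illustrative only: raise the ARC ceiling and the metadata share of the ARC
echo 103079215104 > /sys/module/zfs/parameters/zfs_arc_max            # 96 GiB
echo 90 > /sys/module/zfs/parameters/zfs_arc_meta_limit_percent       # allow up to 90% of the ARC for metadata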
current issue
This remediated the issue temporarily, and the prune processes stopped, but we are back! :)
Stopped the NFS server and it seems to be freeing memory, albeit very slowly, with many dp_sync_taskq processes running... But removing the NFS server is far from ideal. Given that both are living in the kernel, it's difficult for me personally to determine which one is eating up the memory.
zed logs
vmstat
buddyinfo
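For anyone else trying to work out whether ZFS or nfsd is holding the memory, a rough starting point (generic commands, nothing specific to this report, and assuming your build exposes /proc/spl/kmem/slab) is to compare the SPL slab caches against the kernel slabs and the ARC total:
# SPL/ZFS slab caches
cat /proc/spl/kmem/slab
# Generic kernel slabs, sorted by cache size
slabtop -o -s c | head -25
# Overall ARC footprint for comparison
awk '$1 == "size" {printf "ARC size: %.1f GiB\n", $3/2^30}' /proc/spl/kstat/zfs/arcstats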
Describe how to reproduce the problem
Uncertain, but it has occurred on two separate servers, so it is likely to happen again.
Include any warning/errors/backtraces from the system logs
Related to:
#6223
#9966
#3157
#7559