openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

arc_prune causes 8K file random read performance to decrease #14826

Open lenghenglong opened 1 year ago

lenghenglong commented 1 year ago

Linux kernel: 4.19.190 (CentOS 8). ZFS version: 2.1.5. Architecture: x86-64.

When I used vdbench to test 8K small-file performance on ZFS with a 100% read workload and zfs_arc_max=0 (which defaults to 128G on this machine), I found that while the ARC size reported by arcstat was still below 128G, ops (100,000 ~ 200,000) grew as the ARC filled and then held at around 150,000. Once the ARC reached 128G, ARC reclaim began, and ops often dropped by 60% or more; whenever performance dropped, zpool iostat showed disk_wait on the data LUN growing from about 600us to 1~3ms, and ops fluctuated between 100,000 and 190,000.

vdbench screenshot: [image]
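For reference, a minimal sketch of how the ARC cap was adjusted and observed during these runs, assuming the standard module parameter path and the arcstat utility (the value is in bytes; the 128G figure is simply this machine's default):

```sh
# Cap the ARC at 128 GiB at runtime; writing 0 restores the built-in default.
echo $((128 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_arc_max

# Watch ARC size and hit/miss rates once per second while vdbench runs.
arcstat 1
```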

When I set zfs_arc_max=192G and restarted the test, arcstat showed the ARC peaking at 171G, ARC reclaim never occurred during the whole run (judging by the arc_prune count not increasing), and ops stayed very stable at around 200,000. So I set zfs_arc_max=171G and ran the test again. Ops were very stable for the whole run and dropped by 50% only twice, and those two moments were exactly the two points at which the arc_prune count increased, so I believe ARC-related reclaim is what causes the ops drop. However, when I set zfs_arc_max=64G, ARC reclaim happened constantly, yet ops stayed very stable with no large drops. Can someone tell me what is causing this?

With zfs_arc_max=64G and 128G I also used the kernel's built-in ftrace to trace arc_evict, and found that a single arc_evict_state call (750,161us) makes many calls to arc_hdr_free_abd (10~30us each). Since enabling tracing seriously degrades performance, this observation is only a reference, but I hope it is useful. If anyone needs more info, I'll do my best to provide it.
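Roughly how the tracing above can be reproduced, assuming ftrace is mounted under /sys/kernel/debug/tracing and the ZFS module's functions are traceable on this kernel (function names may differ between ZFS versions):

```sh
# arc_prune counter before/after a run; an increase means the pruning callback fired.
grep '^arc_prune' /proc/spl/kstat/zfs/arcstats

cd /sys/kernel/debug/tracing
echo arc_evict_state > set_graph_function   # start a call graph at arc_evict_state()
echo function_graph  > current_tracer       # per-call durations, incl. arc_hdr_free_abd()
echo 1 > tracing_on
sleep 10                                    # capture a short window; tracing is expensive
echo 0 > tracing_on
less trace
```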

amotin commented 1 year ago

arc_prune is actually not normal ARC eviction, but eviction of OS vnode caches. It should only be called under significant metadata pressure. It was significantly reworked in the upcoming ZFS 2.2. But normal ARC eviction is also not an easy task, and it is single-threaded now, so if block sizes are very small, like the 8KB mentioned here, it may create a performance bottleneck. Don't use 8KB blocks unless you really have to.
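If the 8K blocks come from the dataset's recordsize rather than a hard application requirement, a hedged example of checking and raising it (the dataset name tank/data is a placeholder; recordsize only affects newly written files):

```sh
# Show the current maximum block size for the dataset.
zfs get recordsize tank/data

# Raise it to the 128K default; existing files keep their block size until rewritten.
zfs set recordsize=128K tank/data
```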

lenghenglong commented 1 year ago

> arc_prune is actually not normal ARC eviction, but eviction of OS vnode caches. It should only be called under significant metadata pressure. It was significantly reworked in the upcoming ZFS 2.2. But normal ARC eviction is also not an easy task, and it is single-threaded now, so if block sizes are very small, like the 8KB mentioned here, it may create a performance bottleneck. Don't use 8KB blocks unless you really have to.

I am very sorry to see your reply so late. Does the ARC change in ZFS 2.2 refer to #14359, or will ZFS 2.2 merge more ARC changes? Also, when ARC eviction does occur, read performance with arc_max=64G is more stable than with arc_max=128G. What do you think causes this? Thanks!

amotin commented 1 year ago

https://github.com/openzfs/zfs/pull/14359 in 2.2 will completely change when ARC calls pruning, assuming it is related to your performance problem. There are other ARC changes in 2.2 also.
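Until 2.2 is available, one way to confirm the pruning/ops correlation is to log the arc_prune counter next to the vdbench interval output; a monitoring sketch, assuming the standard arcstats kstat:

```sh
# Print a timestamp plus the arc_prune and evict_skip counters every second;
# compare the timestamps against dips in the vdbench ops column.
while sleep 1; do
    printf '%s ' "$(date +%T)"
    awk '/^arc_prune |^evict_skip / {printf "%s=%s ", $1, $3}' /proc/spl/kstat/zfs/arcstats
    echo
done
```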

lenghenglong commented 1 year ago

> https://github.com/openzfs/zfs/pull/14359 in 2.2 will completely change when ARC calls pruning, assuming it is related to your performance problem. There are other ARC changes in 2.2 also.

Hope that will fix this problem. Thank you for your help!