openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

l2arc_feed bottlenecked on CPU (and blocking other I/O) #12679

Open TPolzer opened 3 years ago

TPolzer commented 3 years ago

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Debian |
| Distribution Version | 11 (bullseye) |
| Kernel Version | 5.10.0-9 |
| Architecture | amd64 |
| OpenZFS Version | 2.0.3-9 |

Describe the problem you're observing

After recently moving all my datasets to use encryption (done in place; I had ~30% space used), read and write performance is significantly degraded.

I'm not sure whether there's a single root cause, but I can at least slightly narrow down the problem when reading:

Sequential reading of a big (and incompressible) file starts at ~180 MiB/s, but oscillates afterwards, often dropping to less than 10 MiB/s.

Watching a scrub of the pool in `zpool iostat -v 2`, total read throughput of the raidz2 vdev oscillates between 750 and 1100 MiB/s. I think that means the hardware is not at fault.
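The oscillation described above can be logged with a small sequential-read sampler (a sketch of my own, not part of the original report; the file path is a placeholder and page-cache effects are ignored):

```python
import time

def sample_read_throughput(path, block=1 << 20, window=1.0):
    """Read `path` sequentially and return a list of MiB/s values,
    one per `window` seconds, so dips like 180 -> 10 MiB/s show up."""
    rates = []
    with open(path, "rb", buffering=0) as f:
        start = time.monotonic()
        in_window = 0
        while chunk := f.read(block):
            in_window += len(chunk)
            elapsed = time.monotonic() - start
            if elapsed >= window:
                rates.append(in_window / elapsed / (1 << 20))
                start, in_window = time.monotonic(), 0
        # Flush the final partial window so short files still report.
        elapsed = time.monotonic() - start
        if in_window:
            rates.append(in_window / max(elapsed, 1e-9) / (1 << 20))
    return rates
```

Running this against the slow file while watching `zpool iostat` side by side makes it easier to correlate throughput dips with l2arc_feed activity.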

Describe how to reproduce the problem

Not sure; I've been running this pool for a while without issues (previously, sequential read performance was >400 MiB/s), but only recently decided to use ZFS native encryption.

Include any warning/errors/backtraces from the system logs

None.

TPolzer commented 3 years ago

I can confirm that I don't observe this behavior on an unencrypted dataset on the same pool.

AttilaFueloep commented 3 years ago

Most likely a duplicate of #10846

TPolzer commented 3 years ago

That issue sounds very relevant, yes. Though I don't quite understand how slow encryption leads to l2arc_feed throttling reads down to such an abysmal level of throughput.

AttilaFueloep commented 3 years ago

Neither do I (https://github.com/openzfs/zfs/issues/10846#issuecomment-770214427 ff.), but the fact that two people are observing the same thing at least provides some anecdotal evidence.

I've no knowledge of the inner workings of the ARC, but if I had to guess, I'd say it has something to do with the fact that data in the ARC is unencrypted, whereas data in the L2ARC is not. So maybe the l2arc_feed thread is busy with crypto operations, and performance is bottlenecked by a single core. Can you run `perf top` while the l2arc_feed thread is busy? That would shed some light on where the time is spent.

TPolzer commented 3 years ago

It seems your assumption is correct:

```
Samples: 61K of event 'cycles', 4000 Hz, Event count (approx.): 25826080253 lost: 0/0 drop: 0/0
Overhead  Shared Object          Symbol
  29.35%  [kernel]               [k] gcm_pclmulqdq_mul
  16.28%  [kernel]               [k] kfpu_end
  12.60%  [kernel]               [k] kfpu_begin
   7.09%  [kernel]               [k] aes_encrypt_intel
   5.91%  [kernel]               [k] mutex_spin_on_owner
   4.36%  [kernel]               [k] gcm_mul_pclmulqdq
   1.90%  [kernel]               [k] aes_xor_block
   1.50%  [kernel]               [k] clear_page_erms
   1.33%  [kernel]               [k] aes_encrypt_block
   1.30%  [kernel]               [k] gcm_mode_encrypt_contiguous_blocks
   1.27%  [kernel]               [k] osq_lock
   0.98%  [kernel]               [k] gcm_decrypt_final
   0.84%  [kernel]               [k] aes_aesni_encrypt
   0.69%  [kernel]               [k] memcpy_erms
   0.56%  [kernel]               [k] crypto_get_ptrs
   0.52%  [kernel]               [k] fletcher_4_sse2_native
   0.49%  [kernel]               [k] memmove
   0.47%  [kernel]               [k] module_get_kallsym
   0.34%  [kernel]               [k] __x86_indirect_thunk_rax
   0.29%  [kernel]               [k] copy_user_enhanced_fast_string
   0.24%  [kernel]               [k] number
For a higher level overview, try: perf top --sort comm,dso
```
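To quantify how much of that profile is crypto-related, the symbols can be bucketed with a few lines of Python (an editorial sketch; the profile text is copied from the output above, and the symbol-prefix heuristic is my own classification, not an official one). `kfpu_begin`/`kfpu_end` are counted too, since they save and restore FPU state around the SIMD crypto routines:

```python
import re

# Symbol lines copied from the perf top output above.
PERF_OUTPUT = """\
 29.35% [kernel] [k] gcm_pclmulqdq_mul
 16.28% [kernel] [k] kfpu_end
 12.60% [kernel] [k] kfpu_begin
  7.09% [kernel] [k] aes_encrypt_intel
  5.91% [kernel] [k] mutex_spin_on_owner
  4.36% [kernel] [k] gcm_mul_pclmulqdq
  1.90% [kernel] [k] aes_xor_block
  1.50% [kernel] [k] clear_page_erms
  1.33% [kernel] [k] aes_encrypt_block
  1.30% [kernel] [k] gcm_mode_encrypt_contiguous_blocks
  1.27% [kernel] [k] osq_lock
  0.98% [kernel] [k] gcm_decrypt_final
  0.84% [kernel] [k] aes_aesni_encrypt
  0.69% [kernel] [k] memcpy_erms
  0.56% [kernel] [k] crypto_get_ptrs
  0.52% [kernel] [k] fletcher_4_sse2_native
  0.49% [kernel] [k] memmove
  0.47% [kernel] [k] module_get_kallsym
  0.34% [kernel] [k] __x86_indirect_thunk_rax
  0.29% [kernel] [k] copy_user_enhanced_fast_string
  0.24% [kernel] [k] number
"""

LINE = re.compile(r"\s*([\d.]+)%\s+\[kernel\]\s+\[k\]\s+(\S+)")
CRYPTO = re.compile(r"^(gcm_|aes_|kfpu_|crypto_)")  # heuristic bucket

def crypto_share(text):
    """Sum the overhead of symbols that look crypto/FPU related."""
    total = 0.0
    for line in text.splitlines():
        m = LINE.match(line)
        if m and CRYPTO.match(m.group(2)):
            total += float(m.group(1))
    return total

print(f"crypto/FPU share: {crypto_share(PERF_OUTPUT):.2f}%")
# -> crypto/FPU share: 76.59%
```

Roughly three quarters of all sampled cycles land in AES-GCM and FPU save/restore paths, which is consistent with a single-core crypto bottleneck.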

Is it intentional that reads can block on L2ARC writes? That seems like it could be a performance bottleneck in more scenarios than just this one. Also, I had assumed that sequential reads wouldn't actually be written to the L2ARC?

AttilaFueloep commented 3 years ago

Yeah, at first sight it looks like the data is re-encrypted while feeding the L2ARC.

> Is it intentional that reads can block on L2ARC writes? That seems like it could be a performance bottleneck in more scenarios than just this one. Also, I had assumed that sequential reads wouldn't actually be written to the L2ARC?

These are all good questions that I unfortunately can't answer. Maybe @gamanakis can help out here.

gamanakis commented 2 years ago

As far as I remember, the L2ARC stores buffers identically to the on-disk format, i.e. compressed and encrypted. The l2arc_feed thread doesn't do encryption or compression of the buffers on its own.
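On the side question of sequential reads: the `l2arc_noprefetch` module parameter (default 1) keeps prefetched — i.e. sequentially read — buffers out of the L2ARC feed, and `l2arc_write_max`/`l2arc_write_boost` cap how much the feed thread writes per interval. A sketch for inspecting them on Linux (the sysfs path is the usual OpenZFS-on-Linux location; on FreeBSD the equivalents live under `sysctl vfs.zfs.*`, and the fallback string below is just a placeholder of mine):

```python
from pathlib import Path

# Linux module-parameter directory for OpenZFS.
PARAMS = Path("/sys/module/zfs/parameters")

def read_tunable(name, default=None):
    """Return the current value of a ZFS module tunable as a string,
    or `default` when the module (or the parameter) isn't present."""
    try:
        return (PARAMS / name).read_text().strip()
    except OSError:
        return default

for name in ("l2arc_noprefetch", "l2arc_write_max", "l2arc_write_boost"):
    print(name, "=", read_tunable(name, "<zfs module not loaded>"))
```

Lowering `l2arc_write_max` (by writing to the same sysfs file as root) would be one way to test whether throttling the feed thread relieves the read stalls.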

stale[bot] commented 1 year ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

TPolzer commented 1 year ago

I think this is still generally relevant. But given that this was not the only issue I observed with ZFS native encryption (other performance problems, some crashes), I personally reverted to handing LUKS-encrypted volumes to ZFS.