openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

AVX2 not available for RAIDZ or Fletcher algorithms on Ubuntu 22.04 #15223

Open koelmel opened 1 year ago

koelmel commented 1 year ago

System information

Type | Version/Name
Distribution Name | Ubuntu
Distribution Version | 22.04 LTS
Kernel Version | 5.15.0-82-generic and 6.2.0-31-generic
Architecture | x86
OpenZFS Version | 2.1.12-1 (self compiled) and 2.1.9-2ubuntu1.1

Describe the problem you're observing

After booting the new Ubuntu kernel 5.15.0-82-generic, I noticed that AVX2 is no longer available for the RAIDZ or Fletcher algorithms. This happens on a dedicated AMD Epyc Zen3 system (which also received the updated amd64-microcode package version 3.20191218.1ubuntu2.2, changing the microcode version from 0xa001173 to 0xa0011d1) and in a VM hosted on an AMD Epyc Zen3 system (an openSUSE 15.4 host whose kernel and microcode package were not updated). Because AVX2 is not detected, the fastest available algorithms are now "ssse3".

Since there was also a microcode patch for Zen3 systems, I booted the previous kernel 5.15.0-79-generic on the dedicated AMD Epyc Zen3 system with the updated microcode package still installed. With that kernel, AVX2 is available again.

Describe how to reproduce the problem

Boot Ubuntu kernel 5.15.0-82-generic on an AMD Epyc Zen3 system (e.g. an AMD EPYC 7443P CPU, or a VM hosted on such a system)

cat /sys/module/zfs/parameters/zfs_vdev_raidz_impl

output is: "cycle [fastest] original scalar sse2 ssse3" instead of expected "cycle [fastest] original scalar sse2 ssse3 avx2"

cat /sys/module/zcommon/parameters/zfs_fletcher_4_impl

output is: "[fastest] scalar superscalar superscalar4 sse2 ssse3" instead of expected "[fastest] scalar superscalar superscalar4 sse2 ssse3 avx2"
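The two checks above can be automated. The following is a minimal sketch, assuming the sysfs paths from this report exist and are readable. The helper names `impl_list_has` and `impl_file_has` are hypothetical and not part of OpenZFS.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if `name` appears as a whole token in a whitespace-separated
 * implementation list such as "cycle [fastest] original scalar sse2 ssse3".
 * The currently selected entry is wrapped in brackets, so those are stripped
 * before comparing.
 */
bool impl_list_has(const char *list, const char *name)
{
    char buf[512];

    /* strtok() modifies its input, so work on a bounded copy. */
    snprintf(buf, sizeof(buf), "%s", list);

    for (char *tok = strtok(buf, " \t\n"); tok != NULL;
         tok = strtok(NULL, " \t\n")) {
        size_t len = strlen(tok);

        if (len >= 2 && tok[0] == '[' && tok[len - 1] == ']') {
            tok[len - 1] = '\0';
            tok++;
        }
        if (strcmp(tok, name) == 0)
            return true;
    }
    return false;
}

/* Read the first line of a module parameter file and search it. */
bool impl_file_has(const char *path, const char *name)
{
    char line[512];
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return false; /* module not loaded, or the path differs */

    bool found = fgets(line, sizeof(line), fp) != NULL &&
        impl_list_has(line, name);
    fclose(fp);
    return found;
}
```

For example, `impl_file_has("/sys/module/zfs/parameters/zfs_vdev_raidz_impl", "avx2")` would return false on an affected system. Matching whole tokens avoids treating `sse2` or `ssse3` as a hit for a shorter name.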

Include any warning/errors/backtraces from the system logs

koelmel commented 1 year ago

The new Ubuntu 22.04 LTS HWE kernel version 6.2.0-31-generic with the official Ubuntu zfs version 2.1.9-2ubuntu1.1 shows the problem as well. Other kernel modules (like raid6) still use AVX2, and the CPU flags still include avx and avx2.

rom4nik commented 1 year ago

I think I'm seeing the same (or a very similar) issue on Debian 11 and 12 using zfs-dkms packages from the contrib repos. What's interesting to me is that the current and previous (2.1.12, 2.1.11) zfs-dkms AUR packages on Arch Linux work well and list avx2 in the raidz/fletcher4 impls.

On Debian I'm seeing very high (near 100%) multicore CPU usage during reads (writes too) from an encrypted dataset on RAID-Z2 pool. htop with hiding kernel threads disabled shows multiple z_rd_int_0 rows reaching 100% each. perf top produces results like below:

Overhead  Shared Object                                                   Symbol
  16.73%  [kernel]                                                        [k] gcm_pclmulqdq_mul
  11.64%  [kernel]                                                        [k] kfpu_end
   4.46%  [kernel]                                                        [k] kfpu_begin

/sys/module/zfs/parameters/zfs_vdev_raidz_impl and /sys/module/zcommon/parameters/zfs_fletcher_4_impl don't list avx2 as available, same in /proc/spl/kstat/zfs/{vdev_raidz,fletcher_4}_bench.

CPUs: Ryzen 5600G (baremetal), 5700X (baremetal) and i7-8650U (in VMs using host-passthrough as CPU model). In all cases /proc/cpuinfo contains avx2 and aes.

Distros tested:

(maybe relevant for troubleshooting ideas: #9215)

rincebrain commented 1 year ago

My suspicion, based on another report someone gave me once, was that on some systems, it wasn't correctly detecting certain newer architecture features in the compile-time checks, and so compiling them out entirely, leading to FPU functions that are using much less efficient implementations.

I'm kind of tempted to either refactor the existing Linux kfpu_begin/end to include which things it thinks are supported or expose it in /proc or something to make it easier to catch that, assuming of course it is the issue at hand...

I'll be at home after tomorrow and in a position to test these theories.

rincebrain commented 1 year ago

The problem appears to be that boot_cpu_has(X86_FEATURE_OSXSAVE) is returning 0, and the avx checks are the ones that depend on that indirectly with __ymm_enabled...

e: I suspect that this is the issue that we're seeing, so when that lands, it should go away. But that doesn't help people now, now does it...

e2: I think the above link, when that patch lands, will fix it, but if we want this to work in the interim, I don't see a good option other than parsing the feature bits ourselves or just doing what they do and unconditionally make the check pass and assume everyone wanting to check that also has more checks that would break if this wasn't actually true?

So something like

#if 0
/**
 * We can't have nice things on Linux.
 * See #15223 for why we can't use this.
 */
#if defined(X86_FEATURE_OSXSAVE)
        has_osxsave = !!this_cpu_has(X86_FEATURE_OSXSAVE);
#else
        has_osxsave = B_FALSE;
#endif
        if (!has_osxsave) {
                return (B_FALSE);
        }
#endif

in __simd_state_enabled instead of the current contents around OSXSAVE checking.

I don't think we can use this_cpu_has because that still fails on cpu0...
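The dependency described above, where the AVX paths are gated on OSXSAVE, can be illustrated outside the kernel. The sketch below is user-space C and assumes GCC or Clang on x86. The names `ymm_usable` and `ymm_usable_on_this_cpu` are hypothetical, and this is not the OpenZFS implementation. It only shows why a feature bit that wrongly reads as 0 disables every AVX-based implementation, even when the hardware and XCR0 are fine.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_OSXSAVE (1u << 27)  /* OS has enabled XSAVE/XGETBV */
#define CPUID1_ECX_AVX     (1u << 28)  /* CPU implements AVX */
#define XCR0_SSE           (1ull << 1) /* OS saves XMM state */
#define XCR0_AVX           (1ull << 2) /* OS saves upper YMM state */

/*
 * Pure decision logic: YMM registers are only safe to use when OSXSAVE is
 * set, the CPU has AVX, and XCR0 enables both SSE and AVX state.
 */
bool ymm_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
        return false; /* without OSXSAVE, XCR0 cannot be trusted or read */
    if (!(cpuid1_ecx & CPUID1_ECX_AVX))
        return false;
    return (xcr0 & (XCR0_SSE | XCR0_AVX)) == (XCR0_SSE | XCR0_AVX);
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Gather the real inputs from CPUID leaf 1 and XGETBV(0). */
bool ymm_usable_on_this_cpu(void)
{
    unsigned int eax, ebx, ecx, edx;
    uint64_t xcr0 = 0;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;

    if (ecx & CPUID1_ECX_OSXSAVE) {
        uint32_t lo, hi;

        /* XGETBV faults if OSXSAVE is clear, hence the guard. */
        __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
        xcr0 = ((uint64_t)hi << 32) | lo;
    }
    return ymm_usable(ecx, xcr0);
}
#endif
```

Passing an ECX value with the AVX bit set but the OSXSAVE bit clear makes `ymm_usable` return false regardless of XCR0, which mirrors the symptom in this issue. User-space CPUID reads the bit from the hardware directly, so it may report OSXSAVE as set even while the kernel's cached feature flag reads 0.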

koelmel commented 1 year ago

With an Ubuntu 22.04 LTS test VM and a CPU passthrough configuration, I have seen the problem only when running on host systems with AMD CPUs (Zen3), not with an Intel CPU (checked with a Haswell E5-1630 v3). But @rom4nik seems to have the problem on an Intel i7-8650U as well. It's good that the problem will hopefully be solved in the "midterm". It might be interesting to determine what changed in the Ubuntu kernel (from 5.15.0-79-generic to 5.15.0-82-generic) to break the existing zfs AVX check.

rincebrain commented 1 year ago

I literally already linked the patch discussing the bug and the previous patch breaking it.

lsylipei commented 1 year ago

This also happens on my system. I'm running Ubuntu 22.04 with the 5.15.0-83-generic kernel and ZFS 2.1.5. My CPU is a Xeon Gold 6154. There is no avx2 for fletcher and raidz. I also don't have zfs_fletcher_4_impl in /sys/module/zfs/parameters.

rom4nik commented 1 year ago

It seems that on kernel 6.1.52-1 (6.1.0-12-amd64 on Debian 12) AVX2 works again, checked on 5600G and i7-8650U.

The patch mentioned earlier has landed in stable tree at 6.1.50: https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.50
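To check whether a given 6.1.y kernel is at or past that stable release, a version comparison like the one below may help. It is a rough sketch with a hypothetical helper name, `kernel_at_least`, and it has two limits. It only makes sense within the same stable series, since distributions such as Ubuntu backport fixes to older versions. It also needs the upstream version string (for example 6.1.52-1), not the Debian ABI name (6.1.0-12-amd64) that `uname -r` prints.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Parse the leading "major.minor[.patch]" of a version string and compare
 * it against a minimum version. Anything after the numbers (such as "-1"
 * or "-generic") is ignored. Returns false if the string cannot be parsed.
 */
bool kernel_at_least(const char *version, int major, int minor, int patch)
{
    int a = 0, b = 0, c = 0;

    if (sscanf(version, "%d.%d.%d", &a, &b, &c) < 2)
        return false;

    if (a != major)
        return a > major;
    if (b != minor)
        return b > minor;
    return c >= patch;
}
```

With the values from this thread, `kernel_at_least("6.1.52-1", 6, 1, 50)` is true, while the ABI name `"6.1.0-12-amd64"` compares as older, which is why the upstream version string should be used.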

rincebrain commented 1 year ago

FWIW, Ubuntu still hasn't pulled https://github.com/torvalds/linux/commit/2c66ca3949dc701da7f4c9407f2140ae425683a5, though they pulled https://github.com/torvalds/linux/commit/b81fac906a8f in 6.2.0-30.30.

Fabian-Gruenbichler commented 1 year ago

FWIW, the next version of Proxmox kernels (6.2.16-14) will contain the cherry-picked fix (already confirmed to fix the regression, but currently still in internal testing):

https://git.proxmox.com/?p=pve-kernel.git;a=commit;h=9ba0dde971e6153a12f94e9c7a7337355ab3d0ed

also already reported on the Ubuntu side, so should be fixed there at some point in the near future as well: https://bugs.launchpad.net/bugs/2034745

lowjoel commented 1 year ago

(Un)interestingly, this actually causes owners of CPUs with AVX2 to run into #10846.

In my case, encryption+sha512 checksums+raidz2: I had a workload where a VM was downloading a Steam game, and I saw all txg syncing grind to a halt. perf top shows that there's a lot of time spent in gcm_pclmulqdq_mul and mutex_spin_on_owner. Some VMs weren't happy that all I/O started timing out and promptly crashed 😢

lsylipei commented 1 year ago

The problem is fixed with the release of kernel 5.15.0-88.98 on Ubuntu 22.04.

Alyssumi commented 2 months ago

> It seems that on kernel 6.1.52-1 (6.1.0-12-amd64 on Debian 12) AVX2 works again, checked on 5600G and i7-8650U.
>
> The patch mentioned earlier has landed in stable tree at 6.1.50: https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.50

I just upgraded to Debian 12, but I'm still getting the same. My CPU is G4400. Anything I'm missing?

❯ uname -r
6.1.0-23-amd64
❯ cat -p /sys/module/zfs/parameters/zfs_fletcher_4_impl
[fastest] scalar superscalar superscalar4 sse2 ssse3
❯ cat -p /sys/module/zfs/parameters/zfs_vdev_raidz_impl
cycle [fastest] original scalar sse2 ssse3

rom4nik commented 2 months ago

> G4400

Intel ARK doesn't mention AVX2 support for this CPU: https://ark.intel.com/content/www/us/en/ark/compare.html?productIds=88179,124968
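Whether the hardware itself reports AVX2 can also be checked directly, independent of ZFS or vendor spec pages. The sketch below assumes GCC or Clang on x86 and uses hypothetical helper names. AVX2 is reported in CPUID leaf 7, subleaf 0, EBX bit 5.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID7_EBX_AVX2 (1u << 5) /* CPUID.(EAX=7,ECX=0):EBX bit 5 */

/* Pure helper: does this leaf-7 EBX value have the AVX2 bit set? */
bool cpuid7_reports_avx2(uint32_t ebx)
{
    return (ebx & CPUID7_EBX_AVX2) != 0;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Query the running CPU; returns false if leaf 7 is not supported. */
bool this_cpu_reports_avx2(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return cpuid7_reports_avx2(ebx);
}
#endif
```

If `this_cpu_reports_avx2()` returns false, the missing avx2 entries in the ZFS implementation lists are expected behavior and unrelated to the kernel regression discussed in this issue.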