Closed cuviper closed 7 months ago
FWIW, aarch64 also fails:
trap at Instance { def: Item(DefId(2:48675 ~ core[5761]::core_arch::aarch64::neon::generated::vfmaq_laneq_f32)), args: [0_i32] } (_ZN4core9core_arch7aarch644neon9generated15vfmaq_laneq_f3217h0dd8a28605cc03d6E): llvm.fma.v4f32
trap at Instance { def: Item(DefId(2:48683 ~ core[5761]::core_arch::aarch64::neon::generated::vfmaq_laneq_f64)), args: [0_i32] } (_ZN4core9core_arch7aarch644neon9generated15vfmaq_laneq_f6417ha68b5ef2dcd31872E): llvm.fma.v2f64
Directly compiling matrixmultiply shows warnings about these intrinsics, but at least there are no others.
Aarch64:
warning: unsupported llvm intrinsic llvm.fma.v4f32; replacing with trap
warning: unsupported llvm intrinsic llvm.fma.v2f64; replacing with trap
x86_64:
warning: unsupported x86 llvm intrinsic llvm.x86.avx.vperm2f128.pd.256; replacing with trap
warning: unsupported x86 llvm intrinsic llvm.x86.avx.vperm2f128.ps.256; replacing with trap
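For readers unfamiliar with the intrinsic named in the AArch64 warnings: llvm.fma.v4f32 is a lane-wise fused multiply-add over four f32 lanes. The following is only a scalar reference sketch of that behaviour. The function name fma_v4f32_model is made up for illustration and is not how cg_clif implements the intrinsic.

```rust
/// Scalar model of `llvm.fma.v4f32`: computes `a * b + c` per lane with a
/// single rounding step. `fma_v4f32_model` is a hypothetical name used only
/// for this illustration.
pub fn fma_v4f32_model(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for i in 0..4 {
        // `f32::mul_add` is the standard library's fused multiply-add.
        out[i] = a[i].mul_add(b[i], c[i]);
    }
    out
}
```

A backend that replaces the intrinsic with a trap aborts at the call site instead of producing these values, which matches the traps reported above.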
Implemented llvm.fma.v* in https://github.com/rust-lang/rustc_codegen_cranelift/commit/48ca2d9703742149aa33b3f84ae933d063213d19. On AArch64 with this fix the only remaining ndarray test failures are insert_axis, insert_axis_f and test_multislice_intersecting. Based on the panic message for those remaining test failures I think there is a miscompilation of those tests though.
Edit: Seems those are actually tests that use catch_unwind, which doesn't work because of panic=abort.
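To illustrate why such tests cannot pass under panic=abort, here is a minimal sketch of the pattern. The function name panic_was_caught is hypothetical and is not taken from the ndarray test suite.

```rust
use std::panic;

/// Runs a closure that panics and reports whether the panic was caught.
/// With the default `panic=unwind` strategy this returns `true`. With
/// `panic=abort` the process is terminated at the `panic!` and this
/// function never returns.
pub fn panic_was_caught() -> bool {
    let result = panic::catch_unwind(|| {
        panic!("deliberate panic for illustration");
    });
    result.is_err()
}
```

Any test that asserts on the Err value returned by catch_unwind therefore depends on unwinding being available.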
I wrote an entire comment about how I couldn't reproduce any crash on x86 and then I tried using the rustup version instead of the version built from this repo, which did indeed crash with this error message. I'm currently investigating what the difference between the two is that could have caused this.
Ah, yes I'm using the rustup component, as of:
$ rustc +nightly -Vv
rustc 1.75.0-nightly (31bc7e2c4 2023-10-30)
binary: rustc
commit-hash: 31bc7e2c47e82798a392c770611975a6883132c8
commit-date: 2023-10-30
host: x86_64-unknown-linux-gnu
release: 1.75.0-nightly
LLVM version: 17.0.3
It seems like is_x86_feature_detected!() is broken when using a cg_clif compiled libstd, causing matrixmultiply to disable some tests because it thinks AVX and FMA are not supported.
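A quick way to see what runtime detection reports is a small probe like the one below. This is a sketch, not matrixmultiply's actual dispatch code, and detected_simd_features is a made-up helper name.

```rust
/// Returns the subset of {"avx", "fma"} that runtime detection reports.
/// On non-x86 targets the list is always empty. `detected_simd_features`
/// is a hypothetical helper for illustration.
pub fn detected_simd_features() -> Vec<&'static str> {
    let mut found = Vec::new();
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx") {
            found.push("avx");
        }
        if std::is_x86_feature_detected!("fma") {
            found.push("fma");
        }
    }
    found
}
```

Running the same probe against an LLVM-built libstd and a cg_clif-built libstd on one machine should give the same list; a difference points at the detection code rather than the hardware.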
I think I know the issue. std_detect::detect::os::x86::detect_features depends on _xgetbv() to see if the OS supports AVX. _xgetbv is implemented using the llvm.x86.xgetbv LLVM intrinsic rather than an asm!() block. Because it isn't supported natively by Cranelift, I implemented it using a dummy value of 1.
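To show why a dummy value of 1 makes AVX look unsupported, here is a simplified model of the OS-support check. It assumes the usual XCR0 layout (bit 1 for SSE state, bit 2 for AVX state) and is not the actual std_detect source; os_supports_avx and the constant names are illustrative.

```rust
/// XCR0 bit 1: the OS saves and restores SSE (XMM) state.
const XCR0_SSE_STATE: u64 = 1 << 1;
/// XCR0 bit 2: the OS saves and restores AVX (upper YMM) state.
const XCR0_AVX_STATE: u64 = 1 << 2;

/// Simplified model: AVX is usable only when the OS has enabled both the
/// SSE and AVX state components in XCR0, the value that `_xgetbv(0)` reads.
/// `os_supports_avx` is a hypothetical name for this sketch.
pub fn os_supports_avx(xcr0: u64) -> bool {
    let required = XCR0_SSE_STATE | XCR0_AVX_STATE;
    (xcr0 & required) == required
}
```

Under this model, os_supports_avx(1) is false because only the x87 bit is set, so the feature detection would report no AVX (and FMA would presumably be disabled along with it) even on hardware that has both.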
Just a quick update. I have _xgetbv correctly implemented now. I've been working on implementing _mm256_permute2f128_ps and _mm256_permute2f128_pd and currently have a miscompilation that I need to fix.
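For reference, the lane selection that _mm256_permute2f128_ps performs can be modelled in scalar code as below. This is only a sketch of the documented semantics, handy for comparing against a backend's output. The function name permute2f128_ps_model is made up, and this is not the cg_clif implementation.

```rust
/// Scalar model of `_mm256_permute2f128_ps(a, b, imm8)`.
/// Each nibble of `imm8` controls one 128-bit half of the result:
/// bits 1:0 pick a.low (0), a.high (1), b.low (2) or b.high (3),
/// and bit 3 zeroes that half instead.
pub fn permute2f128_ps_model(a: [f32; 8], b: [f32; 8], imm8: u8) -> [f32; 8] {
    fn select(a: &[f32; 8], b: &[f32; 8], control: u8) -> [f32; 4] {
        if (control & 0b1000) != 0 {
            return [0.0; 4];
        }
        let (src, offset) = match control & 0b11 {
            0 => (a, 0),
            1 => (a, 4),
            2 => (b, 0),
            _ => (b, 4),
        };
        [src[offset], src[offset + 1], src[offset + 2], src[offset + 3]]
    }

    let low = select(&a, &b, imm8 & 0x0f); // low nibble -> result lanes 0..4
    let high = select(&a, &b, imm8 >> 4); // high nibble -> result lanes 4..8
    let mut out = [0.0; 8];
    out[..4].copy_from_slice(&low);
    out[4..].copy_from_slice(&high);
    out
}
```

The _pd variant follows the same selection rule with two f64 lanes per 128-bit half.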
Got matrixmultiply working correctly in the implement_xgetbv branch. You can download a precompiled version from https://github.com/rust-lang/rustc_codegen_cranelift/actions/runs/6763047493 once it is done. I will probably work on implementing the rest of the reported missing intrinsics from other issues before opening a PR.
Should be fixed in the latest nightly.
Confirmed, thanks!
I have some code using ndarray dot products, which in turn calls matrixmultiply::sgemm or dgemm, and these trap when built with cranelift. Here's a reproducer: