rust-lang / rustc_codegen_cranelift

Cranelift based backend for rustc
Apache License 2.0

SIMD trap at avx2::_mm256_sad_epu8 #1412

Closed by zesterer 7 months ago

zesterer commented 8 months ago

Hello,

I made an attempt at building Veloren with the cranelift backend!

I'll preface this by saying that I did not expect this to work, and the fact that the thing even compiled at all is miraculous to me. Veloren is an enormous codebase nowadays that pulls in a terrifying number of dependencies, and these do all sorts of weird and unusual things that are likely a headache for a codegen backend: JIT, dynamic linking, horrible multi-threading things, linking to several C and C++ codebases, a lot of SIMD (both explicit and implicit), atomics all over the place, etc.

When running the executable, I get:

trap at Instance { def: Item(DefId(2:14641 ~ core[b9f2]::core_arch::x86::avx2::_mm256_sad_epu8)), args: [] } (_ZN4core9core_arch3x864avx215_mm256_sad_epu817h663d79696ba92f42E): llvm.x86.avx2.psad.bw

(This happens after wgpu selects a graphics adapter and the internal server boots up, so the fact that it got this far is impressive!)

That said, there was no warning about this intrinsic (or any warnings at all, for that matter) reported during the build process, despite this post implying that there should be.

Hopefully this is useful information!

bjorn3 commented 8 months ago

> That said, there was no warning about this intrinsic (or any warnings at all, for that matter) reported during the build process, despite this post implying that there should be.

I think something somewhere is suppressing those warnings. Maybe it is cargo, maybe it is rustc when --cap-lints allow is passed? Haven't investigated yet.

> When running the executable, I get:
>
> trap at Instance { def: Item(DefId(2:14641 ~ core[b9f2]::core_arch::x86::avx2::_mm256_sad_epu8)), args: [] } (_ZN4core9core_arch3x864avx215_mm256_sad_epu817h663d79696ba92f42E): llvm.x86.avx2.psad.bw

Should be fixed on the implement_xgetbv branch now. I'm currently doing a local build of veloren to check if everything works.
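
For reference, here is a rough scalar sketch of what `_mm256_sad_epu8` computes: for each group of 8 unsigned bytes, it sums the absolute differences and places the sum in the corresponding 64-bit lane. The function name `sad_epu8_256` and the array-based signature are made up for illustration; this is not the code used in the backend.

```rust
/// Scalar model of `_mm256_sad_epu8` (illustrative only, hypothetical name).
/// Each 64-bit output lane holds the sum of absolute differences of the
/// 8 unsigned bytes in the matching 8-byte group of `a` and `b`.
fn sad_epu8_256(a: [u8; 32], b: [u8; 32]) -> [u64; 4] {
    let mut out = [0u64; 4];
    for lane in 0..4 {
        for i in 0..8 {
            let idx = lane * 8 + i;
            // The per-lane sum is at most 8 * 255 = 2040, so it fits in 16 bits;
            // the upper bits of the 64-bit lane stay zero.
            out[lane] += u64::from(a[idx].abs_diff(b[idx]));
        }
    }
    out
}
```

A model like this can be compared against the real intrinsic on a machine with AVX2 when testing a crate in isolation.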

bjorn3 commented 8 months ago

Looks like there is another image decoding issue:

trap at Instance { def: Item(DefId(2:13926 ~ core[90bc]::core_arch::x86::ssse3::_mm_mulhrs_epi16)), args: [] } (_ZN4core9core_arch3x865ssse316_mm_mulhrs_epi1617hcd9ec8a636ca4408E): llvm.x86.ssse3.pmul.hr.sw.128

zesterer commented 8 months ago

Did you want me to look further into this when I get time?

bjorn3 commented 8 months ago

That is not necessary. I know what the issue is (another unimplemented intrinsic), and I am working on implementing it.
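
As a rough illustration of the intrinsic in question, `_mm_mulhrs_epi16` multiplies eight pairs of signed 16-bit values, then scales and rounds each 32-bit product back to 16 bits. The scalar sketch below follows that description; the name `mulhrs_epi16` and the array signature are hypothetical and not taken from the backend.

```rust
/// Scalar model of `_mm_mulhrs_epi16` (illustrative only, hypothetical name).
/// Per lane: widen to 32 bits, multiply, shift right by 14, add 1 for
/// rounding, shift right by 1, then keep the low 16 bits.
fn mulhrs_epi16(a: [i16; 8], b: [i16; 8]) -> [i16; 8] {
    let mut out = [0i16; 8];
    for i in 0..8 {
        let product = i32::from(a[i]) * i32::from(b[i]);
        let rounded = ((product >> 14) + 1) >> 1;
        // `as i16` truncates to the low 16 bits, so i16::MIN * i16::MIN
        // wraps to i16::MIN instead of saturating.
        out[i] = rounded as i16;
    }
    out
}
```

Treating the inputs as Q15 fixed-point numbers, 16384 stands for 0.5, so multiplying it by itself gives 8192, which stands for 0.25.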

bjorn3 commented 7 months ago

Progress update: I got to the login screen, and after I tried to log in it crashed with:

trap at Instance { def: Item(DefId(2:14310 ~ core[90bc]::core_arch::x86::avx::_mm256_lddqu_si256)), args: [] } (_ZN4core9core_arch3x863avx18_mm256_lddqu_si25617hbbd9f58f2d58f5fdE): llvm.x86.avx.ldu.dq.256

in httparse (a dependency of hyper). Going to fix that next.
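
For context, `_mm256_lddqu_si256` is a 256-bit load with no alignment requirement on the source address. A minimal scalar sketch of that behaviour, with the hypothetical name `lddqu_256` and a byte-slice signature chosen just for illustration:

```rust
/// Scalar model of `_mm256_lddqu_si256` (illustrative only, hypothetical name).
/// Copies 32 bytes starting at `offset`; the offset does not need to be a
/// multiple of 32. Panics if fewer than 32 bytes are available.
fn lddqu_256(bytes: &[u8], offset: usize) -> [u8; 32] {
    let mut out = [0u8; 32];
    out.copy_from_slice(&bytes[offset..offset + 32]);
    out
}
```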

zesterer commented 7 months ago

Oooh, lots of progress! Definitely keep me updated and let me know if I can lend a hand. I'd love to see this as a viable alternative to opt-level = 0 for us.

bjorn3 commented 7 months ago

If you could quickly get rid of shaderc and spirv_cross that would be great :) Implementing new intrinsics is reasonably quick, as I can test the respective crate in isolation, but recompiling the entirety of veloren once I have implemented some intrinsics takes 15min on my 2 core + HT Intel Core i3 laptop (can't use the dev-desktop-eu-2.infra.rust-lang.org due to veloren depending on vulkan). Like half of that is spent compiling shaderc and spirv_cross.

(Yes, I'm fully aware that you can't do this quickly, but maybe for the long term switching to wgsl using naga would be possible? Naga is already a dependency of veloren through wgpu.)

I will keep you updated!

zesterer commented 7 months ago

IIRC you can disable the shaderc-from-source feature in voxygen, it shouldn't be necessary to recompile shaderc. It's enabled by default just because rebuilds are buggy on some specific setups, but that's unlikely to be your case.

bjorn3 commented 7 months ago

[Image: screenshot_1699352450832]

bjorn3 commented 7 months ago

Works with the latest nightly now.