rust-lang / rustc_codegen_cranelift

Cranelift based backend for rustc
Apache License 2.0

trap at runtime for `core[7eee]::core_arch::x86::sse::_mm_add_ss`: `llvm.x86.sse.add.ss` #1463

Closed futile closed 6 months ago

futile commented 6 months ago

After compiling my bevy project with the new cranelift backend, using the nightly rustc from 2024-03-01, I get the following crash due to a trap when running it:

$ RUSTFLAGS="-Zcodegen-backend=cranelift" rustup run nightly-2024-03-01-x86_64-unknown-linux-gnu cargo run
    Finished `dev` profile [optimized + debuginfo] target(s) in 0.19s
     Running `target/debug/ultra-game`
2024-03-05T23:50:26.379503Z  INFO bevy_winit::system: Creating new window "App" (0v0)
2024-03-05T23:50:26.380222Z  INFO log: Guessed window scale factor: 1
DRM kernel driver 'nvidia-drm' in use. NVK requires nouveau.
2024-03-05T23:50:26.585528Z  INFO bevy_render::renderer: AdapterInfo { name: "NVIDIA GeForce GTX 1060 6GB", vendor: 4318, device: 7171, device_type: DiscreteGpu, driver: "NVIDIA", driver_info: "550.54.14", backend: Vulkan }
trap at Instance { def: Item(DefId(2:13803 ~ core[7eee]::core_arch::x86::sse::_mm_add_ss)), args: [] } (_ZN4core9core_arch3x863sse10_mm_add_ss17h911d741fe58f9e47E): llvm.x86.sse.add.ss

Without RUSTFLAGS="-Zcodegen-backend=cranelift" it runs fine (i.e., the "DRM kernel driver ..." message isn't critical).

To reproduce: Run the failing command with this repo & commit: https://github.com/futile/ultra-game/tree/831eeb43b56c1d9dbc9422d130095cd14da8e145


This cranelift backend is a great project: compile time for my project (for a full debug-build) went from ~4min to ~1min when enabling cranelift! Really cool, thanks a lot for your work! :)

bjorn3 commented 6 months ago

The _mm_add_ss intrinsic isn't implemented yet. It is used by the glam linear algebra crate, which bevy uses internally. Looks like there are a fair number of unimplemented intrinsics used by glam. I'm going to work on implementing them.
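
For readers unfamiliar with the intrinsic, here is a minimal scalar sketch of what `_mm_add_ss` computes: it adds only the lowest `f32` lane of its two operands and copies the upper three lanes from the first. The function name `add_ss_model` and the plain `[f32; 4]` representation are assumptions made for illustration; this is not how cg_clif implements the intrinsic.

```rust
/// Scalar model of `_mm_add_ss(a, b)` (hypothetical helper, for illustration only).
/// Lane 0 of the result is `a[0] + b[0]`; lanes 1, 2 and 3 are copied from `a`.
fn add_ss_model(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1], a[2], a[3]]
}
```

For example, `add_ss_model([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])` yields `[11.0, 2.0, 3.0, 4.0]`: only the first lane is summed.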

compile time for my project (for a full debug-build) went from ~4min to ~1min when enabling cranelift!

How much of a difference is it if you remove https://github.com/futile/ultra-game/blob/831eeb43b56c1d9dbc9422d130095cd14da8e145/Cargo.toml#L13-L15? Anything opt-level > 0 is kind of equivalent to opt-level = 1 with LLVM in terms of optimizations. (I haven't actually measured the runtime performance difference, but Cranelift doesn't have a lot of optimizations it supports, so it is far from as fast in terms of runtime perf as opt-level = 3 with LLVM.) If you want dependencies to be fully optimized you would have to build them with LLVM and then build just your own code with cg_clif. For Bevy this currently doesn't work due to an ABI incompatibility though: https://github.com/rust-lang/rustc_codegen_cranelift/issues/1449
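
One way the "dependencies with LLVM, own code with cg_clif" split might be expressed is through Cargo's unstable `codegen-backend` profile setting. The fragment below is a sketch under assumptions: it needs a nightly toolchain, the `opt-level` value is illustrative, and it does not reproduce the contents of the linked Cargo.toml. As noted above, this setup currently does not work for Bevy because of the ABI incompatibility in #1449.

```toml
# Hypothetical Cargo.toml fragment (nightly only; unstable Cargo feature).
cargo-features = ["codegen-backend"]

[profile.dev]
# Workspace crates: compile quickly with the Cranelift backend.
codegen-backend = "cranelift"

[profile.dev.package."*"]
# All dependencies: build with LLVM so they can be fully optimized.
codegen-backend = "llvm"
opt-level = 3
```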

bjorn3 commented 6 months ago

Turns out there were only three intrinsics missing.

futile commented 6 months ago

Turns out there were only three intrinsics missing.

Oh wow, that was super fast, thanks a lot! :) Can I somehow test this/should I just test the next 1-2 nightly rustc versions?

bjorn3 commented 6 months ago

You can download a precompiled version from https://github.com/rust-lang/rustc_codegen_cranelift/releases/tag/dev. Unpack it anywhere you like and use the cargo-clif executable inside it in place of cargo.

It may take a couple of days before I'm able to update the version distributed with rustup.

futile commented 6 months ago

How much of a difference is it if you remove https://github.com/futile/ultra-game/blob/831eeb43b56c1d9dbc9422d130095cd14da8e145/Cargo.toml#L13-L15? Anything opt-level > 0 is kind of equivalent to opt-level = 1 with LLVM in terms of optimizations. (I haven't actually measured the runtime performance difference, but Cranelift doesn't have a lot of optimizations it supports, so it is far from as fast in terms of runtime perf as opt-level = 3 with LLVM.)

Ah good point! Yeah, running without opt-level > 0 (i.e., commenting out what you mentioned, and also opt-level = 1 for dev) changes the times to 56s with cranelift, and 65s with LLVM, so pretty much equal. Well, still ~10% faster, but much less of a difference than before :sweat:

If you want dependencies to be fully optimized you would have to build them with LLVM and then build just your own code with cg_clif. For Bevy this currently doesn't work due to an ABI incompatibility though: #1449

Cool, thanks for the tip, will keep it in mind & subscribed! :)

bjorn3 commented 6 months ago

Can I somehow test this/should I just test the next 1-2 nightly rustc versions?

The fix will be available on the next nightly.