Closed pnevyk closed 6 months ago
Thanks for the report! If you want to look into this, you can pass --emit llvm-ir to rustc when compiling. Despite what the name suggests, this will emit clif ir. You can find it in the <crate_name>.clif directory inside the output directory. For every function there are three files, with .unopt.clif, .opt.clif and .vcode as extensions. These are the clif ir before optimizations, the clif ir after optimizations, and the vcode, respectively. The vcode is very close to the final assembly, but jump threading happens during final emission of object code and some instructions in vcode expand to multiple real instructions.
The codegen for the simd_scatter intrinsic can be found at https://github.com/rust-lang/rustc_codegen_cranelift/blob/289a27403645d95922c353e0953954ba00ec70cf/src/intrinsics/simd.rs#L1090-L1117
I can take a look at this too tomorrow.
I made some progress in investigation. I reduced the reproducer to this:
#![feature(portable_simd)]

use std::simd::{Mask, Simd};

fn main() {
    let mut vec = [0; 1];
    let vals = Simd::from_array([1]);
    unsafe {
        vals.scatter_select_ptr(Simd::from_array([vec.as_mut_ptr()]), Mask::splat(true));
    }
    assert_eq!(vec, [1]);
}
Here is the clif for scatter_select_ptr:
If I didn't make a mistake in my clif analysis, the problem is that the generated code confuses the enable mask and vals arguments (see my annotations):
; (pnevyk): vals, dest, enable
function u0:21(i64, i64, i64) system_v {
; ...
; stack _4 std::simd::Simd<isize, 1_usize> 8b 8, 8 storage=ss0
; stack _5 core::core_simd::masks::mask_impl::Mask<isize, 1_usize> 8b 8, 8 storage=ss1
ss0 = explicit_slot 16
ss1 = explicit_slot 16
block0(v0: i64, v1: i64, v2: i64):
; ...
; _0 = core::core_simd::intrinsics::simd_scatter::<std::simd::Simd<T, N>, std::simd::Simd<*mut T, N>, std::simd::Simd<isize, N>>(move _1, move _2, move _4)
; (pnevyk): v8 = enable[0]
v8 = stack_load.i64 ss0
; (pnevyk): v9 = dest[0]
v9 = load.i64 notrap v1
; (pnevyk): v10 = vals[0]
v10 = load.i32 notrap v0
; (pnevyk): if vals[0] then jump block3 else jump block4
brif v10, block3, block4
block3:
; (pnevyk): store enable[0] to dest[0]
store.i64 notrap aligned v8, v9
jump block4
This is in agreement with the behavior of storing -1 (i.e., true in Mask) instead of the value from vals. If I initialize vals with [0], then the brif condition is false and the array is untouched.
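The mix-up described above can be modeled per lane in plain Rust, without any SIMD types. This is only a sketch under stated assumptions: `scatter_lane_correct` and `scatter_lane_swapped` are hypothetical helper names (not functions from rustc_codegen_cranelift), the mask lane is represented as an `i64` holding -1 for true and 0 for false, and the store width is ignored by truncating to `i32`.

```rust
/// Intended per-lane semantics: store `val` when the mask lane is set.
/// (Hypothetical helper, for illustration only.)
fn scatter_lane_correct(val: i32, dest: &mut i32, enable: i64) {
    if enable != 0 {
        *dest = val;
    }
}

/// Model of the miscompiled code: the roles of `val` and `enable` are swapped,
/// so `val` acts as the branch condition and the mask is what gets stored.
/// (Hypothetical helper, for illustration only.)
fn scatter_lane_swapped(val: i32, dest: &mut i32, enable: i64) {
    if val != 0 {
        // Truncation keeps the model simple; the real store is 64 bits wide.
        *dest = enable as i32;
    }
}

fn main() {
    // Expected behavior: the value 1 lands in the destination.
    let mut slot = 0i32;
    scatter_lane_correct(1, &mut slot, -1);
    assert_eq!(slot, 1);

    // Swapped arguments: the mask value -1 is stored instead of 1.
    let mut slot = 0i32;
    scatter_lane_swapped(1, &mut slot, -1);
    assert_eq!(slot, -1);

    // With vals = [0] the branch is not taken and nothing is written.
    let mut slot = 0i32;
    scatter_lane_swapped(0, &mut slot, -1);
    assert_eq!(slot, 0);
}
```

The three cases mirror the observations in this comment: the correct result, the stray -1, and the untouched array when vals is [0].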
However, the implementation of the simd_scatter intrinsic looks correct to me.
For testing I am still using the version of cranelift distributed via rustup, rustc 1.77.0-nightly (d6d7a9386 2023-12-22). I will try to build the cranelift backend on my machine and debug further.
rustc_codegen_cranelift built from source works correctly. The generated clif is also correct:
; _0 = core::core_simd::intrinsics::simd_scatter::<std::simd::Simd<T, N>, std::simd::Simd<*mut T, N>, std::simd::Simd<isize, N>>(move _1, move _2, move _4)
; (pnevyk): v8 = vals[0]
v8 = load.i32 notrap v0
; (pnevyk): v9 = dest[0]
v9 = load.i64 notrap v1
; (pnevyk): v10 = enable[0]
v10 = stack_load.i64 ss0
brif v10, block3, block4
block3:
; (pnevyk): store vals[0] to dest[0]
store.i32 notrap aligned v8, v9
jump block4
https://github.com/rust-lang/rustc_codegen_cranelift/commit/8ab225df8b2713324acb16e91b9c80b63c5ba411 flipped these arguments, but in https://github.com/rust-lang/rustc_codegen_cranelift/commit/ace694cf834972035ce7269a078a275863fc8f9f I believe I had to flip them back because of test failures [^1]. Both commits should be included in nightly-2023-12-22 due to a subtree sync on the 19th, but somehow only the first commit shows up in the git log of the rust repo.
> but somehow only the first commit shows up in the git log of the rust repo.
This may have the same root cause as https://github.com/rust-lang/rustc_codegen_cranelift/issues/1385.
https://github.com/rust-lang/rust/pull/119278 should fix this.
Should be fixed in the next nightly.
@pnevyk Can you confirm that the issue is fixed with the latest nightly?
> Can you confirm that the issue is fixed with the latest nightly?
Yes, it works :tada:
The example for Simd::scatter results in
Sort of a minimal reproducer is the following code:
which results in
(Interestingly, the second example does not cause the free(): invalid pointer error.)
Running the examples with the standard LLVM backend works as expected.
If I can provide more information, I will happily do so. I tried to get the generated assembly using cargo-show-asm, but I got
Looking at the actual outputs, it seems that the problem manifests itself by scatter putting value -1 at index i and i + 1 for every (valid) i from the idxs vector.
I would be interested in trying to fix this bug if I get some pointers to where to start. I have some basic knowledge of compilers in general and cranelift in particular.
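One possible explanation for -1 showing up at both index i and i + 1 is the `store.i64` seen in the clif dump from the nightly build: an 8-byte store of the mask value into an array of 4-byte elements would cover two adjacent elements. The sketch below simulates that safely on a slice. `store_i64_at` is a hypothetical helper, and little-endian byte order (as on x86_64) is assumed.

```rust
/// Simulates an 8-byte little-endian store of `value` starting at element
/// `idx` of an `i32` buffer, which covers elements `idx` and `idx + 1`.
/// (Hypothetical helper; panics if `idx + 1` is out of bounds.)
fn store_i64_at(buf: &mut [i32], idx: usize, value: i64) {
    let bytes = value.to_le_bytes();
    // Low four bytes land in element `idx`, high four bytes in `idx + 1`.
    let lo = i32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let hi = i32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    buf[idx] = lo;
    buf[idx + 1] = hi;
}

fn main() {
    let mut buf = [0i32; 6];
    // Scatter the mask value -1 (all bits set) at indices 1 and 4.
    for &i in &[1usize, 4] {
        store_i64_at(&mut buf, i, -1);
    }
    // Each store touched index i and index i + 1.
    assert_eq!(buf, [0, -1, -1, 0, -1, -1]);
}
```

In this model every scattered index i ends up with -1 at i and i + 1, which matches the pattern described above.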