Open workingjubilee opened 3 years ago
The loads and stores generated by LLVM are pretty inefficient:
#![feature(platform_intrinsics)]
#![feature(repr_simd)]
#[derive(Copy, Clone)]
#[repr(simd)]
pub struct Foo(u8, u8, u8);
extern "platform-intrinsic" {
    fn simd_add<T>(a: T, b: T) -> T;
}
pub fn add_foo(a: Foo, b: Foo) -> Foo {
    unsafe { simd_add(a, b) }
}
playground::add_foo: # @playground::add_foo
# %bb.0:
movq %rdi, %rax
movd (%rsi), %xmm0 # xmm0 = mem[0],zero,zero,zero
movd (%rdx), %xmm1 # xmm1 = mem[0],zero,zero,zero
paddb %xmm0, %xmm1
movdqa %xmm1, -24(%rsp)
movb -22(%rsp), %cl
movb %cl, 2(%rdi)
movd %xmm1, %ecx
movw %cx, (%rdi)
retq
# -- End function
If Cranelift won't support them, cg_clif will need to load and store for each SIMD operation, even with maximal inlining, because cg_clif only keeps a type in registers when it is representable as one or two Cranelift values. Everything else is forced onto the stack.
Yeah, honestly, I went pretty cross-eyed a few times trying to follow all the extra work being done in LLVM's generated SIMD assembly. So while I filed this request, I also very much think it's important that whatever is done does not unduly pessimize the "ordinary" NEON/SSE cases that use e.g. f32x4s, and I appreciate the engineering challenge this poses.
The silicon that supports these more or less directly: GPUs handle Vec3s (typically f32x3) all the time already. Arm SVE supports 384-bit width vector registers and is available Soon™. RISC-V's V extension will eventually exist and support arbitrary-width vectors, somewhere, over the rainbow :rainbow: someday :musical_note:...
LLVM's approach for handling these when only fixed-width vector registers are available to compile to is, as far as I could tell and as described by the author of the vek crate, similar to the one GPUs use: use 128-bit registers as usual, but politely ignore the unspecified lanes when the "Vec3" types are loaded and stored.
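To make the "ignore the unspecified lanes" idea concrete, here is a minimal sketch in stable Rust. It only models the semantics and is not LLVM's actual lowering: the names `U8x3`, `widen`, `narrow`, and `add_u8x3` are hypothetical, a plain array stands in for a vector register, and a scalar loop stands in for the SIMD add.

```rust
/// Hypothetical 3-lane byte vector, standing in for `Foo(u8, u8, u8)`.
#[derive(Copy, Clone, Debug, PartialEq)]
pub struct U8x3(pub [u8; 3]);

/// "Load": place the three real lanes into a 4-lane value.
/// The fourth lane is unspecified; zero is an arbitrary choice here.
fn widen(v: U8x3) -> [u8; 4] {
    [v.0[0], v.0[1], v.0[2], 0]
}

/// "Store": write back only the three real lanes and drop the fourth.
fn narrow(w: [u8; 4]) -> U8x3 {
    U8x3([w[0], w[1], w[2]])
}

/// Lane-wise wrapping add performed at the padded width.
/// Whatever ends up in lane 3 never reaches the result.
pub fn add_u8x3(a: U8x3, b: U8x3) -> U8x3 {
    let (wa, wb) = (widen(a), widen(b));
    let mut out = [0u8; 4];
    for i in 0..4 {
        out[i] = wa[i].wrapping_add(wb[i]);
    }
    narrow(out)
}
```

Calling `add_u8x3(U8x3([1, 2, 3]), U8x3([10, 20, 30]))` yields `U8x3([11, 22, 33])`. The point of the sketch is that the arithmetic itself costs nothing extra at the padded width; the inefficiency discussed above comes from how the three lanes get in and out of the register.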
Also, the https://github.com/WebAssembly/flexible-vectors/ proposal exists, though it is currently in a fairly nascent state. Still, it is another point in favor of this being desirable in the long term, even if it's not immediately needed.