rust-lang / portable-simd

The testing ground for the future of portable SIMD in Rust
Apache License 2.0

Support for `_mm_maddubs_epi16` and similar? #366

Open samuelcolvin opened 1 year ago

samuelcolvin commented 1 year ago

Hi, I'm looking for a way to implement these instructions with portable SIMD, but I can't find any pointers.

Is this possible? If so, how? Otherwise, is there any willingness to add support?

For more context on what I'm trying to do, see here: basically, integer parsing by progressively collapsing SIMD arrays.
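As a rough illustration of what "progressively collapsing" can mean, here is a hypothetical sketch that parses exactly 16 ASCII digits by repeatedly merging adjacent lanes with weights (10, 100, 10 000, then 10^8). It is not samuelcolvin's actual code: the name `parse_16_digits` and the weights are assumptions, and plain arrays stand in for SIMD registers so it runs on stable Rust. Each step corresponds to one "multiply and add adjacent pairs" operation, which is exactly the role `_mm_maddubs_epi16` plays in the first step on x86.

```rust
/// Hypothetical lane-by-lane model of parsing exactly 16 ASCII digits by
/// repeatedly combining adjacent lanes. Plain arrays stand in for SIMD
/// registers; the input is assumed to contain only b'0'..=b'9'.
pub fn parse_16_digits(s: &[u8; 16]) -> u64 {
    // Step 0: ASCII -> digit values, 16 lanes of 0..=9.
    let mut d = [0u8; 16];
    for (lane, &c) in d.iter_mut().zip(s.iter()) {
        *lane = c - b'0';
    }

    // Step 1 (the role `_mm_maddubs_epi16` can play with weights [10, 1]):
    // 16 digit lanes -> 8 lanes of 0..=99.
    let mut p1 = [0u16; 8];
    for i in 0..8 {
        p1[i] = d[2 * i] as u16 * 10 + d[2 * i + 1] as u16;
    }

    // Step 2 (weights [100, 1]): 8 lanes -> 4 lanes of 0..=9_999.
    let mut p2 = [0u32; 4];
    for i in 0..4 {
        p2[i] = p1[2 * i] as u32 * 100 + p1[2 * i + 1] as u32;
    }

    // Step 3 (weights [10_000, 1]): 4 lanes -> 2 lanes of 0..=99_999_999.
    let mut p3 = [0u64; 2];
    for i in 0..2 {
        p3[i] = p2[2 * i] as u64 * 10_000 + p2[2 * i + 1] as u64;
    }

    // Step 4: final horizontal combine of the two 8-digit halves.
    p3[0] * 100_000_000 + p3[1]
}
```

For example, `parse_16_digits(b"1234567890123456")` passes through the intermediate lanes `[12, 34, 56, 78, 90, 12, 34, 56]`, then `[1234, 5678, 9012, 3456]`, then `[12345678, 90123456]`, and finally returns `1_234_567_890_123_456`. Each halving of the lane count doubles the lane width, which is why a widening pairwise multiply-add is such a natural fit.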

calebzulawski commented 1 year ago

This is certainly a specialty function, only really supported by x86-64. As far as I can tell from reading the documentation, you can mimic the function with something like:

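```rust
#![feature(portable_simd)]
use std::simd::prelude::*;
use std::simd::simd_swizzle;
pub fn maddubs(a: u8x16, b: i8x16) -> i16x8 {
    // Zero-extend the unsigned bytes and sign-extend the signed bytes to i16.
    let a: i16x16 = a.cast();
    let b: i16x16 = b.cast();
    // Every u8 * i8 product fits in an i16, so this multiply cannot overflow.
    let m: i16x16 = a * b;
    // Add adjacent pairs (even lanes + odd lanes) with signed saturation.
    simd_swizzle!(m, [0, 2, 4, 6, 8, 10, 12, 14])
        .saturating_add(simd_swizzle!(m, [1, 3, 5, 7, 9, 11, 13, 15]))
}
```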
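When experimenting with alternative formulations, it helps to have a plain scalar model of the instruction to test against. The following is a minimal sketch written from Intel's description of `pmaddubsw`: each output lane is `a[2i]*b[2i] + a[2i+1]*b[2i+1]`, with `a` read as unsigned bytes, `b` as signed bytes, and the pairwise sum saturated to the `i16` range. The name `maddubs_ref` is hypothetical, and it uses arrays so it compiles on stable Rust.

```rust
/// Hypothetical scalar reference for `_mm_maddubs_epi16` semantics.
/// `a` holds unsigned bytes, `b` holds signed bytes; each of the 8 output
/// lanes is the saturating sum of two adjacent u8 * i8 products.
pub fn maddubs_ref(a: [u8; 16], b: [i8; 16]) -> [i16; 8] {
    let mut out = [0i16; 8];
    for i in 0..8 {
        // Widen before multiplying: u8 zero-extends, i8 sign-extends.
        // Each product lies in -32_640..=32_385, so it fits in an i16.
        let lo = a[2 * i] as i16 * b[2 * i] as i16;
        let hi = a[2 * i + 1] as i16 * b[2 * i + 1] as i16;
        // Only the pairwise sum can overflow, so saturate there.
        out[i] = lo.saturating_add(hi);
    }
    out
}
```

A SIMD version can then be checked lane by lane, e.g. `assert_eq!(maddubs(u8x16::from_array(a), i8x16::from_array(b)).to_array(), maddubs_ref(a, b))`. Inputs such as `a = [255; 16]` with `b = [127; 16]` (or `[-128; 16]`) are worth including because they are the only way to hit the saturation path.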

Unfortunately this does not produce great codegen, because LLVM doesn't seem to recognize it as `pmaddubsw`. It's possible some other formulation would result in better codegen. Unless there's a matching instruction on other architectures, I doubt this will ever be supported by `std::simd`, as it's not particularly portable, but it could be possible to improve LLVM to recognize this pattern as a single instruction.
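Until LLVM recognizes the pattern, one common workaround is to step outside `std::simd` and call the x86 intrinsic directly from `std::arch`, guarded by runtime feature detection, with a portable fallback elsewhere. This is a sketch under assumptions: the names `maddubs_dispatch`, `maddubs_ssse3` and `maddubs_scalar` are hypothetical, arrays are used at the boundary so it builds on stable Rust and on non-x86 targets, and the scalar fallback follows Intel's documented semantics.

```rust
/// Hypothetical wrapper: use the real `pmaddubsw` instruction when SSSE3 is
/// available at runtime, otherwise fall back to a scalar loop.
pub fn maddubs_dispatch(a: [u8; 16], b: [i8; 16]) -> [i16; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("ssse3") {
            // SAFETY: the runtime check above guarantees SSSE3 is present.
            return unsafe { maddubs_ssse3(a, b) };
        }
    }
    maddubs_scalar(a, b)
}

/// Calls `_mm_maddubs_epi16` on unaligned loads of the two arrays.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "ssse3")]
unsafe fn maddubs_ssse3(a: [u8; 16], b: [i8; 16]) -> [i16; 8] {
    use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_maddubs_epi16, _mm_storeu_si128};
    let mut out = [0i16; 8];
    unsafe {
        // First operand is treated as unsigned bytes, second as signed bytes.
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        let r = _mm_maddubs_epi16(va, vb);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
    }
    out
}

/// Portable fallback: saturating sum of adjacent u8 * i8 products.
pub fn maddubs_scalar(a: [u8; 16], b: [i8; 16]) -> [i16; 8] {
    let mut out = [0i16; 8];
    for i in 0..8 {
        let lo = a[2 * i] as i16 * b[2 * i] as i16;
        let hi = a[2 * i + 1] as i16 * b[2 * i + 1] as i16;
        out[i] = lo.saturating_add(hi);
    }
    out
}
```

The trade-off is that each architecture needs its own path (and its own testing), which is essentially why this kind of instruction is a poor fit for `std::simd` itself.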

samuelcolvin commented 1 year ago

Thanks. I do have an implementation of the same logic for aarch64; I'll try to find it on my laptop. To be honest, though, it's more a case of "do the same calculation on a different architecture" than an exactly equivalent method.