Support for `_mm_maddubs_epi16` and similar?

samuelcolvin commented 1 year ago

Hi, I'm looking for a way to implement these instructions with portable SIMD, but can't find any pointers.

Is this possible, and if so, how? If not, would you be open to adding support?

For more context on what I'm trying to do, see here: basically, integer parsing by progressively collapsing SIMD vectors.
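To make "progressively collapsing" concrete, here is a minimal scalar model of the idea for exactly 8 ASCII digits. It is written in stable Rust, with plain arrays standing in for SIMD lanes. `collapse` and `parse_8_digits` are hypothetical names, and this is not the linked implementation. Each step combines adjacent lanes as `high * scale + low`, halving the lane count. On x86, the first step (scale 10) is what `_mm_maddubs_epi16` computes when given alternating weights 10 and 1.

```rust
/// Hypothetical helper: combines adjacent lanes as `hi * scale + lo`,
/// halving the number of lanes (one "collapse" step).
fn collapse(lanes: &[u32], scale: u32) -> Vec<u32> {
    lanes
        .chunks_exact(2)
        .map(|pair| pair[0] * scale + pair[1])
        .collect()
}

/// Hypothetical helper: parses exactly 8 ASCII digits, most significant first.
/// No validation: assumes every byte is in b'0'..=b'9'.
fn parse_8_digits(digits: &[u8; 8]) -> u32 {
    // ASCII -> digit values 0..=9, one per lane.
    let d: Vec<u32> = digits.iter().map(|&c| u32::from(c - b'0')).collect();
    // 8 lanes of 1 digit -> 4 lanes of 2 digits.
    let two = collapse(&d, 10);
    // 4 lanes of 2 digits -> 2 lanes of 4 digits.
    let four = collapse(&two, 100);
    // 2 lanes of 4 digits -> 1 lane of 8 digits.
    let eight = collapse(&four, 10_000);
    eight[0]
}
```

For example, `parse_8_digits(b"12345678")` goes from 8 single-digit lanes to `[12, 34, 56, 78]`, then `[1234, 5678]`, then `12345678`.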

calebzulawski commented 1 year ago

This is certainly a specialized instruction, only really available on x86 (`pmaddubsw`, introduced with SSSE3). As far as I can tell from the documentation, you can emulate it with something like:

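#![feature(portable_simd)]
use std::simd::prelude::*;

/// Emulates x86 `pmaddubsw` (`_mm_maddubs_epi16`): multiply unsigned bytes of `a`
/// by signed bytes of `b`, then add adjacent pairs with signed saturation.
pub fn maddubs(a: u8x16, b: i8x16) -> i16x8 {
    // Widen to i16: `a` is zero-extended, `b` sign-extended; each product fits in i16.
    let a: i16x16 = a.cast();
    let b: i16x16 = b.cast();
    let m: i16x16 = a * b;
    // Add even and odd lanes pairwise, saturating like the hardware instruction.
    simd_swizzle!(m, [0, 2, 4, 6, 8, 10, 12, 14])
        .saturating_add(simd_swizzle!(m, [1, 3, 5, 7, 9, 11, 13, 15]))
}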

Unfortunately this doesn't produce great codegen, because LLVM doesn't seem to recognize the pattern as `pmaddubsw`; some other formulation might do better. Unless a matching instruction exists on other architectures, I doubt this will ever be supported by `std::simd`, since it isn't particularly portable. It might, however, be possible to improve LLVM so that it recognizes this pattern as a single instruction.
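When trying other formulations for better codegen, a plain scalar model of the instruction's documented semantics can serve as a lane-by-lane test oracle. This is a minimal sketch in stable Rust. `maddubs_ref` is a hypothetical name and is not part of `std::simd` or `core::arch`.

```rust
/// Hypothetical scalar reference for `_mm_maddubs_epi16` / `pmaddubsw`:
/// out[i] = saturate_i16(a[2i] * b[2i] + a[2i + 1] * b[2i + 1]),
/// where `a` holds unsigned bytes and `b` holds signed bytes.
fn maddubs_ref(a: [u8; 16], b: [i8; 16]) -> [i16; 8] {
    let mut out = [0i16; 8];
    for i in 0..8 {
        // Each product fits in i16, but the sum of two may not,
        // so both are computed in i32 before adding.
        let lo = i32::from(a[2 * i]) * i32::from(b[2 * i]);
        let hi = i32::from(a[2 * i + 1]) * i32::from(b[2 * i + 1]);
        // Signed saturation to the i16 range.
        out[i] = (lo + hi).clamp(i32::from(i16::MIN), i32::from(i16::MAX)) as i16;
    }
    out
}
```

With `a` holding digit values and `b` alternating 10 and 1, each output lane is a two-digit number. That is the first collapse step of the integer-parsing approach. The saturating case (e.g. all `a` = 255, all `b` = 127) is the one a non-saturating emulation would get wrong.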

samuelcolvin commented 1 year ago

Thanks. I do have an implementation of the same logic for aarch64, and I'll try to find it on my laptop. To be honest, though, it's more a case of "do the same calculation on a different architecture" than an exactly equivalent method.

rust-lang / portable-simd #366