Open samuelcolvin opened 1 year ago
This is certainly a specialty function, only really supported by x86-64. As far as I can tell, from reading the documentation, you can mimic the function with something like:
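    // Needs a nightly toolchain with #![feature(portable_simd)] at the crate root.
    use std::simd::{prelude::*, simd_swizzle};

    pub fn maddubs(a: u8x16, b: i8x16) -> i16x8 {
        // Widen to 16 bits so each u8 * i8 product fits without overflow.
        let a: i16x16 = a.cast();
        let b: i16x16 = b.cast();
        let m: i16x16 = a * b;
        // Add the even-indexed and odd-indexed products, saturating like pmaddubsw.
        simd_swizzle!(m, [0, 2, 4, 6, 8, 10, 12, 14])
            .saturating_add(simd_swizzle!(m, [1, 3, 5, 7, 9, 11, 13, 15]))
    }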
Unfortunately this does not produce great codegen, because LLVM doesn't seem to recognize it as pmaddubsw. It's possible some other formulation would result in better codegen. Unless there's a matching instruction on other architectures, I doubt this will ever be supported by std::simd as it's not particularly portable, but it could be possible to improve LLVM to recognize this pattern as a single instruction.
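For checking any reformulation against the intended behaviour, a plain scalar model can help. The following is only a sketch on stable Rust; the name `maddubs_scalar` is made up for illustration and is not part of std::simd or std::arch. It follows the documented pmaddubsw semantics: unsigned bytes from the first operand times signed bytes from the second, with adjacent products added using signed saturation.

```rust
/// Scalar reference model of x86 `pmaddubsw` (hypothetical helper name).
/// Multiplies each unsigned byte of `a` by the matching signed byte of `b`,
/// then adds adjacent pairs of products with signed 16-bit saturation.
pub fn maddubs_scalar(a: [u8; 16], b: [i8; 16]) -> [i16; 8] {
    let mut out = [0i16; 8];
    for i in 0..8 {
        // Each product fits in i16: the extremes are
        // 255 * -128 = -32640 and 255 * 127 = 32385.
        let lo = a[2 * i] as i16 * b[2 * i] as i16;
        let hi = a[2 * i + 1] as i16 * b[2 * i + 1] as i16;
        // The sum of two products can exceed the i16 range, so saturate.
        out[i] = lo.saturating_add(hi);
    }
    out
}
```

Comparing a SIMD version against this on random inputs, including the saturating extremes, is one way to confirm that an alternative formulation still computes the same thing.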
Thanks, I do have an implementation of the same logic for aarch64. I'll try to find it on my laptop, but to be honest it's more like "do the same calculation on a different architecture" than exactly equivalent methods.
Hi, I'm looking for a way to implement these instructions with portable simd, but can't find any pointers.
Is this possible, if so how? Otherwise is there any willingness to add support?
For more context on what I'm trying to do, see here - basically int parsing by progressively collapsing SIMD arrays.
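To make "progressively collapsing" concrete, here is a scalar sketch of the idea as I understand it. It is an assumption about the approach, not the linked code, and `parse_16_digits` is a hypothetical name. Each step halves the number of lanes by combining neighbours with weights such as [10, 1]; the first step is the multiply-and-add-adjacent operation that maddubs would perform in one instruction.

```rust
/// Hypothetical scalar model: parse exactly 16 ASCII digits by repeatedly
/// merging adjacent lanes, mirroring what the SIMD version would do per step.
pub fn parse_16_digits(s: &[u8; 16]) -> Option<u64> {
    if !s.iter().all(u8::is_ascii_digit) {
        return None;
    }
    // Step 1: 16 digits -> 8 two-digit values, weights [10, 1].
    // This is the step a maddubs-style instruction covers.
    let mut pairs = [0u64; 8];
    for i in 0..8 {
        pairs[i] = (s[2 * i] - b'0') as u64 * 10 + (s[2 * i + 1] - b'0') as u64;
    }
    // Step 2: 8 values -> 4 four-digit values, weights [100, 1].
    let mut quads = [0u64; 4];
    for i in 0..4 {
        quads[i] = pairs[2 * i] * 100 + pairs[2 * i + 1];
    }
    // Step 3: 4 values -> 2 eight-digit values, weights [10_000, 1].
    let mut halves = [0u64; 2];
    for i in 0..2 {
        halves[i] = quads[2 * i] * 10_000 + quads[2 * i + 1];
    }
    // Step 4: 2 values -> the final number, weights [100_000_000, 1].
    Some(halves[0] * 100_000_000 + halves[1])
}
```

In the SIMD version the later steps need wider lanes than the first one, which is why the first collapse (bytes to 16-bit values) is where maddubs comes in.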