Closed Nilstrieb closed 6 months ago
I'm not sure where tests for this are supposed to go. stdarch tests?
The duplicated code for these packs does make me worry a bit. After going through the intrinsics guide, I also found some packs that weren't implemented yet. I think I'm going to restructure the code here so that the packs are neatly packed together, with all of _mm{,256}_pack{s,us}_epi{16,32} implemented.
> I'm not sure where tests for this are supposed to go.
I've been copying stdarch tests into example/std_example.rs several times.
> There is a lot of copy pasting here, so I'm not surprised it's buggy ^^'.
Yeah, this code is horrible. I hope to some day generate it directly from the instruction manual or something like that. Or create a DSL that allows writing this kind of stuff with less code duplication (and maybe also allows it to be reused by miri and other tools).
Thanks for the fix! Please ignore the test failure. That is https://github.com/rust-random/rand/issues/1355.
What's currently implemented (✅) vs what exists:

| | sse 16 | avx 16 | sse 32 | avx 32 |
|---|---|---|---|---|
| unsigned | _mm_packus_epi16 / llvm.x86.sse2.packuswb.128 ✅ | _mm256_packus_epi16 / llvm.x86.avx2.packuswb ✅ | _mm_packus_epi32 / llvm.x86.sse41.packusdw ✅ | _mm256_packus_epi32 / llvm.x86.avx2.packusdw |
| signed | _mm_packs_epi16 / llvm.x86.sse2.packsswb.128 | _mm256_packs_epi16 / llvm.x86.avx2.packsswb | _mm_packs_epi32 / llvm.x86.sse2.packssdw.128 ✅ | _mm256_packs_epi32 / llvm.x86.avx2.packssdw ✅ |
I'll clean it up a bit and implement all of those based on that, should be fairly little code. llvm.x86.sse41.packusdw is also pretty suspicious as it currently uses smin, while the other unsigned ones use umin.
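To make the table above concrete, here is a minimal scalar sketch of what the signed 16-bit packs compute, assuming the lane layout described in the Intel intrinsics guide: the 128-bit form packs all of `a` and then all of `b`, and the 256-bit form applies that same operation to each 128-bit half independently. The function names are hypothetical and this is not the cg_clif implementation, only a reference model that a shared helper could be checked against.

```rust
/// Hypothetical helper: saturate one signed 16-bit lane to a signed 8-bit lane.
fn sat_i16_to_i8(x: i16) -> i8 {
    x.clamp(i8::MIN as i16, i8::MAX as i16) as i8
}

/// Scalar model of a 128-bit signed pack (like _mm_packs_epi16):
/// the saturated lanes of `a` come first, followed by those of `b`.
fn packs_epi16_128(a: [i16; 8], b: [i16; 8]) -> [i8; 16] {
    let mut out = [0i8; 16];
    for i in 0..8 {
        out[i] = sat_i16_to_i8(a[i]);
        out[i + 8] = sat_i16_to_i8(b[i]);
    }
    out
}

/// Scalar model of a 256-bit signed pack (like _mm256_packs_epi16):
/// the 128-bit operation is applied to the low halves of `a` and `b`,
/// then to the high halves, so the inputs end up interleaved per half.
fn packs_epi16_256(a: [i16; 16], b: [i16; 16]) -> [i8; 32] {
    let mut out = [0i8; 32];
    for half in 0..2 {
        let mut a_half = [0i16; 8];
        let mut b_half = [0i16; 8];
        a_half.copy_from_slice(&a[half * 8..half * 8 + 8]);
        b_half.copy_from_slice(&b[half * 8..half * 8 + 8]);
        let packed = packs_epi16_128(a_half, b_half);
        out[half * 16..half * 16 + 16].copy_from_slice(&packed);
    }
    out
}
```

Writing the 256-bit variant in terms of the 128-bit one is one way to keep the duplication down; the unsigned and 32-bit variants would differ only in the lane types and the saturation bounds.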
> llvm.x86.sse41.packusdw is also pretty suspicious as it currently uses smin, while the other unsigned ones use umin.
smin is correct here afaict. The input is a signed 32-bit integer and we need to check that it fits in an unsigned 16-bit integer. Using umin on its own would cause the input to be interpreted as an unsigned 32-bit integer. Although because of the smax before it, I think it does not actually matter whether umin or smin is used: after the smax the value is non-negative, so the signed and unsigned comparisons agree.
In any case, a helper function for doing the saturating equivalent of ireduce, as is done here, would be nice to have. It can probably go in num.rs or cast.rs.
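The smin/umin argument can be checked with a small scalar model of one packusdw lane. This is only a sketch in plain Rust with hypothetical function names, not the Cranelift IR helper proposed above: it clamps a signed 32-bit value into the unsigned 16-bit range, once with a signed upper clamp and once with an unsigned one, and also shows what happens when the unsigned clamp is used without the preceding smax.

```rust
/// Hypothetical scalar model: smax(x, 0) followed by smin(.., 0xFFFF),
/// then truncation to 16 bits (the "saturating ireduce").
fn saturating_reduce_smin(x: i32) -> u16 {
    let clamped_low = x.max(0); // signed max, removes negative values
    let clamped = clamped_low.min(0xFFFF); // signed min
    clamped as u16
}

/// Same lower clamp, but the upper clamp compares as unsigned (umin).
/// After the signed max the value is non-negative, so this agrees
/// with the signed version for every input.
fn saturating_reduce_umin(x: i32) -> u16 {
    let clamped_low = x.max(0) as u32;
    let clamped = clamped_low.min(0xFFFF); // unsigned min
    clamped as u16
}

/// Unsigned clamp without the signed max first: negative inputs are
/// reinterpreted as large unsigned values and saturate to 0xFFFF
/// instead of 0, which is the wrong result for packusdw.
fn umin_without_smax(x: i32) -> u16 {
    (x as u32).min(0xFFFF) as u16
}
```

For example, `saturating_reduce_smin(-1)` and `saturating_reduce_umin(-1)` both give 0, while `umin_without_smax(-1)` gives 65535.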
I created #1443 to restructure all the packed code.
closing in favor of #1443
fast_image_resize yielded broken images. A little bit of println bisecting revealed the SIMD instruction that was at fault, and a bit of staring at the cg_clif impl and the Intel manual then revealed the place of the bug. There is a lot of copy pasting here, so I'm not surprised it's buggy ^^'.