TDecking closed this 3 months ago.
r? @Amanieu
rustbot has assigned @Amanieu. They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.
Use r? to explicitly pick a reviewer.
@sayantn Do you mind if I open this? There is a bit of overlap between this PR and yours, but some things in this PR are not present in your work.
So the only overlap is the FMA intrinsics and masked loads? I have no problem with you implementing the FMA intrinsics (honestly, I didn't know about the simd_fma intrinsic). But I think you should not do the masked loads/stores right now. simd_masked_load only assumes the alignment of a single element, so it will never generate the aligned load instructions. See rust-lang/rust#126919. Also, in stdarch it is typically preferred to link with LLVM and use the simd intrinsics with the core::simd types instead of __m128i etc. I will remove the FMA enhancements from my PR; it will remain a draft for some time.
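To make the alignment argument concrete, here is a scalar sketch of masked-load semantics. The function and type names are hypothetical (this is not the stdarch or LLVM implementation). Because a masked load may only rely on element alignment, a pointer that is 4-byte but not 16-byte aligned is still a valid argument, so the compiler cannot lower it to an aligned vector load.

```rust
/// Hypothetical scalar reference model of a 4-lane masked load; not the
/// stdarch or LLVM implementation. Active lanes are read from `src`,
/// inactive lanes keep the value from `fallback` and `src` is not read there.
fn masked_load_f32x4(src: &[f32], mask: [bool; 4], fallback: [f32; 4]) -> [f32; 4] {
    let mut out = fallback;
    for lane in 0..4 {
        if mask[lane] {
            out[lane] = src[lane];
        }
    }
    out
}

/// Buffer forced to 16-byte alignment so the example is deterministic.
#[repr(C, align(16))]
struct Aligned16([f32; 8]);

fn main() {
    let data = Aligned16([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]);
    // Starting at element 1 keeps 4-byte (f32) alignment but breaks the
    // 16-byte alignment an aligned 128-bit load would require.
    let src = &data.0[1..];
    let addr = src.as_ptr() as usize;
    assert_eq!(addr % std::mem::align_of::<f32>(), 0);
    assert_ne!(addr % 16, 0);

    let v = masked_load_f32x4(src, [true, false, true, false], [0.0; 4]);
    println!("{:?}", v); // [2.0, 0.0, 4.0, 0.0]
}
```

Inactive lanes are never read, which is also why a masked load can safely run past the end of valid data as long as those lanes are masked off.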
@sayantn I've removed the masked load changes on my end. Our PRs should now be orthogonal to each other.
Yes, I will modify my PR in a while. I will also implement the missing reduce-max (and similar) intrinsics and fix the _mm_cvtt intrinsics (they currently generate vcvt instructions instead of vcvtt).
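The cvt/cvtt distinction is about rounding: the cvt forms (for example _mm_cvtps_epi32) round using the current MXCSR rounding mode, while the cvtt forms (for example _mm_cvttps_epi32) always truncate toward zero. A scalar sketch, assuming the default round-to-nearest-even mode; the helper names are hypothetical:

```rust
/// Converts an already-rounded value, returning the x86 "integer indefinite"
/// value 0x8000_0000 (i32::MIN) for NaN or out-of-range input.
fn to_i32_or_indefinite(r: f32) -> i32 {
    if r >= -2_147_483_648.0 && r < 2_147_483_648.0 {
        r as i32
    } else {
        i32::MIN
    }
}

/// Model of a cvt conversion, assuming MXCSR is in round-to-nearest-even.
fn cvt_model(x: f32) -> i32 {
    to_i32_or_indefinite(x.round_ties_even())
}

/// Model of a cvtt conversion: always truncates toward zero.
fn cvtt_model(x: f32) -> i32 {
    to_i32_or_indefinite(x.trunc())
}

fn main() {
    for x in [1.5_f32, 2.5, -1.7, 3.99, f32::NAN] {
        println!("{x:>5}: cvt = {:>11}, cvtt = {:>11}", cvt_model(x), cvtt_model(x));
    }
}
```

For 1.5 the two forms give 2 and 1, which is exactly the kind of difference that appears if a cvtt intrinsic is lowered to a vcvt instruction.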
Can you also please implement the floating-point abs intrinsics using simd_fabs?
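simd_fabs applies the IEEE 754 absolute value to every lane, which amounts to clearing each lane's sign bit: -0.0 becomes +0.0 and NaNs stay NaN. A portable scalar sketch over a hypothetical 4-lane array (the function name is illustrative only):

```rust
/// Hypothetical lane-wise float abs: clears the IEEE 754 sign bit of each lane.
fn fabs_lanes(v: [f32; 4]) -> [f32; 4] {
    v.map(|x| f32::from_bits(x.to_bits() & 0x7fff_ffff))
}

fn main() {
    let out = fabs_lanes([-1.5, 2.0, -0.0, f32::NEG_INFINITY]);
    println!("{:?}", out); // [1.5, 2.0, 0.0, inf]
}
```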
@sayantn done.
Thanks.
I have already done the remaining gather/scatter intrinsics in avx512f. Can you complete avx512bw (the reduce intrinsics and some mask operations)? Then I will start on the remaining IFMA and BF16 intrinsics, and after that begin implementing the new VEX variants.
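A masked reduction can be modelled as "replace inactive lanes with the identity element, then reduce". The sketch below assumes that convention for a 16-bit reduce-max over the 32 lanes of a 512-bit vector; the function name is hypothetical and this is not the stdarch implementation.

```rust
/// Hypothetical scalar model of a masked 16-bit reduce-max over 32 lanes.
/// Assumption: inactive lanes (mask bit 0) are replaced by i16::MIN, the
/// identity of max, so an all-zero mask yields i16::MIN.
fn mask_reduce_max_epi16(mask: u32, a: [i16; 32]) -> i16 {
    a.iter()
        .enumerate()
        .map(|(i, &x)| if mask & (1u32 << i) != 0 { x } else { i16::MIN })
        .fold(i16::MIN, i16::max)
}

fn main() {
    // Lanes hold -16, -15, ..., 15.
    let a: [i16; 32] = std::array::from_fn(|i| i as i16 - 16);
    // 0x5555_5555 activates only the even lanes; the largest is lane 30 (value 14).
    println!("{}", mask_reduce_max_epi16(0x5555_5555, a));
    // All lanes active: the maximum is lane 31 (value 15).
    println!("{}", mask_reduce_max_epi16(u32::MAX, a));
}
```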
_MM_FROUND_CUR_DIRECTION.