rust-lang / stdarch

Rust's standard library vendor-specific APIs and run-time feature detection

https://doc.rust-lang.org/stable/core/arch/

Apache License 2.0

601 stars, 267 forks

Refactor avx512f #1597

Closed TDecking closed 3 months ago

TDecking commented 3 months ago

Fused multiply-add functions have been reworked and can now be used by miri.
Square root functions have been reworked and can now be used by miri.
The definitions of some functions with explicit rounding have been simplified.
Some functions now correctly use _MM_FROUND_CUR_DIRECTION.
Some integer functions have been reworked and can now be used by miri.
Some missing intrinsics were added.
Masked integer comparisons now properly use the mask registers.
Some documentation issues were fixed.
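
To illustrate what "fused" means in the first item, here is a minimal scalar sketch. It does not reproduce the stdarch code from this PR; it only uses the stable `f32::mul_add` to show that a fused multiply-add rounds once, while a separate multiply and add rounds twice. The function names and the input values are chosen for illustration.

```rust
/// Unfused: the product is rounded to f32 before the addition.
fn unfused(a: f32, b: f32, c: f32) -> f32 {
    a * b + c
}

/// Fused: `f32::mul_add` computes a * b + c with a single rounding.
fn fused(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c)
}

fn main() {
    // x = 1 + 2^-12, so the exact square is 1 + 2^-11 + 2^-24.
    let x = 1.0_f32 + 1.0 / 4096.0;
    // c = -(1 + 2^-11) cancels everything except the 2^-24 term.
    let c = -(1.0_f32 + 1.0 / 2048.0);

    println!("unfused = {:e}", unfused(x, x, c));
    println!("fused   = {:e}", fused(x, x, c));
}
```

With these inputs the unfused version loses the 2^-24 term when the product is rounded and returns 0, while the fused version returns 2^-24.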

rustbot commented 3 months ago

r? @Amanieu

rustbot has assigned @Amanieu. They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

TDecking commented 3 months ago

@sayantn Do you mind if I open this? There is a bit of an overlap between this PR and yours, but some things in this PR are not present in your work.

sayantn commented 3 months ago

So the only overlap is the fma intrinsics and masked loads? I have no problem with you implementing the fma (honestly, I didn't know about the simd_fma intrinsic). But I think right now you should not do the masked loads/stores. simd_masked_load assumes only the element's alignment, so it will never generate the aligned load instructions. See rust-lang/rust#126919. Also, in stdarch it is typically preferred to link with llvm and use the simd intrinsics with the core::simd types instead of __m128i etc. I will remove the fma enhancements from my PR; it will remain a draft for some time.
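
The alignment point can be sketched without any AVX-512 hardware. The example below is an illustration only: `Lanes512`, `element_ptr` and `is_aligned_to` are hypothetical names, not stdarch items. It shows that a pointer which is aligned for a single `i32` element is not necessarily aligned to the 64 bytes that an aligned 512-bit load requires, which is why a load that only assumes element alignment cannot be lowered to the aligned instruction.

```rust
use std::mem::align_of;

/// Hypothetical stand-in for a 512-bit vector: sixteen i32 lanes, 64-byte aligned.
#[repr(C, align(64))]
struct Lanes512([i32; 16]);

/// Pointer to lane `i` of the vector.
fn element_ptr(v: &Lanes512, i: usize) -> *const i32 {
    v.0[i..].as_ptr()
}

/// True if `ptr` is a multiple of `align` bytes.
fn is_aligned_to(ptr: *const i32, align: usize) -> bool {
    (ptr as usize) % align == 0
}

fn main() {
    let v = Lanes512([0; 16]);
    let first = element_ptr(&v, 0);
    let second = element_ptr(&v, 1);

    println!("element alignment: {}", align_of::<i32>());
    println!("vector alignment:  {}", align_of::<Lanes512>());
    println!("lane 0 is 64-byte aligned: {}", is_aligned_to(first, 64));
    println!("lane 1 is 64-byte aligned: {}", is_aligned_to(second, 64));
}
```

Lane 1 is still valid for an element-aligned load, but it is not a valid address for a load that demands 64-byte alignment.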

TDecking commented 3 months ago

@sayantn I've removed the masked load changes on my end. Our PRs should now be orthogonal to each other.

sayantn commented 3 months ago

Yes, I will modify my PR in a while. I will also implement the missing reduce-max etc. intrinsics and fix the _mm_cvtt intrinsics (they currently generate vcvt instructions, not cvtt).
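
The difference between the two instruction families matters because they round differently: the `cvtt` forms truncate toward zero, while the plain `cvt` forms round according to the current rounding mode (round to nearest, ties to even, by default). A scalar sketch of that difference, using hypothetical helper names and plain Rust rather than the intrinsics themselves:

```rust
/// Truncating conversion, the behaviour expected from the `cvtt` family.
/// Rust's `as` cast rounds toward zero for in-range values.
fn convert_truncating(x: f32) -> i32 {
    x as i32
}

/// Rounding conversion under the default rounding mode
/// (round to nearest, ties to even), as the plain `cvt` family does.
fn convert_rounding(x: f32) -> i32 {
    x.round_ties_even() as i32
}

fn main() {
    // Only in-range inputs are compared; out-of-range and NaN handling
    // differs between `as` casts and the hardware instructions.
    for x in [2.7_f32, -2.7, 2.5, 3.5] {
        println!(
            "{x}: truncating = {}, rounding = {}",
            convert_truncating(x),
            convert_rounding(x)
        );
    }
}
```

An intrinsic documented as truncating but lowered to the rounding instruction would return 3 instead of 2 for an input of 2.7.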

sayantn commented 3 months ago

Can you also please do the floating-point abs using simd_fabs?
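
For context, floating-point absolute value only clears the sign bit, so it can be written either as a bitwise AND with a mask or as a dedicated abs operation. The thread does not say how the earlier version was written; the scalar sketch below, with the hypothetical helper `abs_via_mask`, just shows that the two formulations agree bit for bit.

```rust
/// Absolute value by clearing the sign bit with a mask.
fn abs_via_mask(x: f32) -> f32 {
    f32::from_bits(x.to_bits() & 0x7fff_ffff)
}

/// Absolute value through the dedicated operation.
fn abs_direct(x: f32) -> f32 {
    x.abs()
}

fn main() {
    for x in [-1.5_f32, 1.5, -0.0, 0.0, f32::NEG_INFINITY] {
        // Compare bit patterns so that -0.0 and 0.0 are distinguished.
        let same = abs_via_mask(x).to_bits() == abs_direct(x).to_bits();
        println!("{x:?}: mask = {:?}, direct = {:?}, identical bits = {same}",
                 abs_via_mask(x), abs_direct(x));
    }
}
```

Expressing the operation as an abs rather than as integer masking lets the compiler and tools such as miri see the floating-point intent directly.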

TDecking commented 3 months ago

@sayantn done.

sayantn commented 3 months ago

Thanks.

sayantn commented 3 months ago

I have already done the remaining gather-scatter in avx512f. Can you complete avx512bw - the reduce intrinsics and some mask operations? Then I will start on the remaining IFMA and BF16, then start implementing the new VEX variants.