rust-lang / stdarch

Rust's standard library vendor-specific APIs and run-time feature detection
https://doc.rust-lang.org/stable/core/arch/
Apache License 2.0

Refactor avx512f #1597

Closed TDecking closed 3 months ago

TDecking commented 3 months ago
rustbot commented 3 months ago

r? @Amanieu

rustbot has assigned @Amanieu. They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

TDecking commented 3 months ago

@sayantn Do you mind if I open this? There is a bit of overlap between this PR and yours, but some things in this PR are not present in your work.

sayantn commented 3 months ago

So the only overlap is the fma intrinsics and the masked loads? I have no problem with you implementing the fma intrinsics (honestly, I didn't know about the simd_fma intrinsic). But I think you should not do the masked loads/stores right now. simd_masked_load assumes only the element's alignment, so it will never generate the aligned load instructions. See rust-lang/rust#126919. Also, in stdarch it is typically preferred to link with LLVM and use the simd intrinsics with the core::simd types instead of __m128i etc. I will remove the fma enhancements from my PR; it will remain a draft for some time.
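
To illustrate the alignment point above, here is a minimal scalar sketch of what a masked load does per lane. The helper name `masked_load_model`, the 4-lane width, and the bitmask layout are assumptions made for illustration; this is not the stdarch or compiler implementation. The point is that a lane-by-lane load needs only element alignment, while the aligned AVX-512 load instructions expect the whole vector to be aligned.

```rust
/// Scalar model of a masked load over 4 `f32` lanes.
/// Hypothetical helper for illustration only, not a stdarch API.
/// Lane `i` is read from `src` when bit `i` of `mask` is set;
/// otherwise the corresponding lane of `passthrough` is kept.
fn masked_load_model(src: &[f32; 4], mask: u8, passthrough: [f32; 4]) -> [f32; 4] {
    let mut out = passthrough;
    for i in 0..4 {
        if (mask >> i) & 1 == 1 {
            out[i] = src[i];
        }
    }
    out
}

fn main() {
    let data = [1.0, 2.0, 3.0, 4.0];
    // Load lanes 0 and 2, keep zeros in lanes 1 and 3.
    let loaded = masked_load_model(&data, 0b0101, [0.0; 4]);
    println!("{:?}", loaded); // prints [1.0, 0.0, 3.0, 0.0]

    // An array of f32 only guarantees the alignment of a single f32,
    // which is all a per-element load can rely on.
    println!(
        "align of [f32; 4] = {}, align of f32 = {}",
        std::mem::align_of::<[f32; 4]>(),
        std::mem::align_of::<f32>()
    );
}
```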

TDecking commented 3 months ago

@sayantn I've removed the masked load changes on my end. Our PRs should now be orthogonal to each other.

sayantn commented 3 months ago

Yes, I will modify my PR in a while. I will also implement the missing reduce-max etc. intrinsics and fix the _mm_cvtt intrinsics (they currently generate vcvt instructions, not cvtt).
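
For readers unfamiliar with the cvt/cvtt distinction mentioned above, a rough scalar sketch follows. The function names `cvt_model` and `cvtt_model` are hypothetical. The sketch assumes the default rounding mode (round to nearest, ties to even) for the non-truncating conversion and ignores out-of-range inputs, where the hardware result differs from Rust's saturating `as` cast.

```rust
/// Model of a non-truncating conversion (cvt), assuming the default
/// rounding mode: round to nearest, ties to even.
/// Hypothetical helper for illustration only.
fn cvt_model(x: f32) -> i32 {
    x.round_ties_even() as i32
}

/// Model of a truncating conversion (cvtt): always rounds toward zero.
/// Hypothetical helper for illustration only.
fn cvtt_model(x: f32) -> i32 {
    x.trunc() as i32
}

fn main() {
    for x in [2.5f32, 2.7, -2.7, 3.5] {
        // The two conversions disagree whenever the fractional part
        // would round away from zero.
        println!("x = {x}: cvt = {}, cvtt = {}", cvt_model(x), cvtt_model(x));
    }
}
```

With these inputs the two models agree only for 2.5 (both give 2), which is why emitting the wrong instruction is an observable bug and not just a performance detail.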

sayantn commented 3 months ago

Can you also please do the floating-point abs using simd_fabs?
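
As a sketch of why this is a safe substitution: a floating-point absolute value is equivalent to clearing the sign bit, so an implementation based on a bitwise AND and one based on a dedicated fabs operation should agree lane by lane. The helpers `abs_by_mask` and `abs_lanes` are hypothetical scalar stand-ins, and the assumption that the earlier code used a sign-bit mask is mine, not something stated in this thread.

```rust
/// Absolute value by clearing the sign bit (bit 31 of an f32).
/// Hypothetical helper for illustration only.
fn abs_by_mask(x: f32) -> f32 {
    f32::from_bits(x.to_bits() & 0x7fff_ffff)
}

/// Lane-wise absolute value using the standard library's `f32::abs`,
/// standing in for a vector fabs operation.
fn abs_lanes(v: [f32; 4]) -> [f32; 4] {
    v.map(f32::abs)
}

fn main() {
    let v = [-1.5f32, 0.0, -0.0, 2.0];
    let masked = v.map(abs_by_mask);
    let fabs = abs_lanes(v);
    // Compare bit patterns so that -0.0 and 0.0 are distinguished.
    assert_eq!(masked.map(f32::to_bits), fabs.map(f32::to_bits));
    println!("{:?}", fabs); // prints [1.5, 0.0, 0.0, 2.0]
}
```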

TDecking commented 3 months ago

@sayantn done.

sayantn commented 3 months ago

Thanks.

sayantn commented 3 months ago

I have already done the remaining gather-scatter in avx512f. Can you complete avx512bw (the reduce intrinsics and some mask operations)? Then I will start on the remaining IFMA and BF16, and then start implementing the new VEX variants.