rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Tracking Issue for Missing BMI1, AVX2, SSE2, SSE4.1, SSE4a and TBM intrinsics #126936

Closed sayantn closed 1 month ago

sayantn commented 2 months ago

The feature gate is #![feature(simd_x86_updates)].

The public API consists of 13 new intrinsics (probably overlooked in the simd_x86 feature). See rust-lang/stdarch#1178.

Steps

Implementation History

We cannot add _mm_malloc and _mm_free, as they need access to the OS allocator, but core_arch is a no_std environment.
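For readers who need aligned memory in a crate that does link std, the following is a minimal sketch of an aligned allocate/free round trip using std::alloc. It is not a replacement for _mm_malloc or _mm_free and is not part of this feature; the helper name aligned_alloc_roundtrip is hypothetical and exists only for illustration.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// Hypothetical helper (not part of stdarch): allocates `size` bytes aligned to
/// `align`, fills them with a byte pattern, frees them, and reports whether the
/// pointer was correctly aligned and the pattern was readable.
pub fn aligned_alloc_roundtrip(size: usize, align: usize) -> bool {
    // `align` must be a non-zero power of two, otherwise this returns an error.
    let layout = Layout::from_size_align(size, align).expect("invalid size/alignment");
    // The global allocator must not be asked for zero bytes.
    assert!(layout.size() > 0, "zero-sized allocations are not supported here");

    // SAFETY: `layout` has non-zero size, the pointer is checked for null before
    // use, all accesses stay within `size` bytes, and the block is freed with
    // the same layout it was allocated with.
    unsafe {
        let ptr = alloc(layout);
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        let aligned = (ptr as usize) % align == 0;
        ptr.write_bytes(0xAB, size);
        let pattern_ok = *ptr == 0xAB && *ptr.add(size - 1) == 0xAB;
        dealloc(ptr, layout);
        aligned && pattern_ok
    }
}
```

A 32-byte alignment, as in aligned_alloc_roundtrip(64, 32), matches what aligned AVX loads and stores expect.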

Amanieu commented 2 months ago

These intrinsics were supposed to be part of an already-stabilized set, but were previously overlooked.

@rfcbot fcp merge

rfcbot commented 2 months ago

Team member @Amanieu has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot commented 2 months ago

:bell: This is now entering its final comment period, as per the review above. :bell:

RalfJung commented 1 month ago

Oh no, more non-temporal operations... quoting from a recent x86 memory model paper:

In addition to several non-temporal store instructions, Intel-x86 architectures provide a single non-temporal load instruction. However, as our private correspondence with the lead architect of the Intel instruction set system architecture has revealed, the non-temporal load instruction has been a source of implementation issues, it has not been implemented consistently, and there has been ambiguity regarding its semantics.

Are we sure we can just pretend that those are regular loads, for the purpose of language semantics?

Amanieu commented 1 month ago

Non-temporal loads are not allowed to violate normal memory ordering rules, at least when accessing normal (i.e. write-back cacheable) memory. x86 of course allows some regions of memory to be marked as write-combining, at which point the normal memory ordering rules go out the window, but this only happens for memory-mapped I/O, not normal memory. The problem with non-temporal stores on x86 is that they violate normal memory ordering rules even when used on normal (write-back) memory.

See this answer on SO for more details.
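To make the store side of that distinction concrete, here is a minimal sketch (not taken from the stdarch sources) that uses the long-stable SSE2 intrinsic _mm_stream_si128 and follows it with _mm_sfence before the data is used further. The names Aligned16 and fill_non_temporal are hypothetical, and the non-x86_64 fallback exists only so the snippet compiles on other targets.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m128i, _mm_set1_epi32, _mm_sfence, _mm_stream_si128};

/// Hypothetical wrapper: a 16-byte aligned buffer, because `_mm_stream_si128`
/// requires an aligned destination.
#[repr(align(16))]
pub struct Aligned16(pub [i32; 4]);

/// Hypothetical helper: fills four lanes with `value` using a non-temporal store.
#[cfg(target_arch = "x86_64")]
pub fn fill_non_temporal(value: i32) -> [i32; 4] {
    let mut buf = Aligned16([0; 4]);
    // SAFETY: SSE2 is part of the x86_64 baseline, and `buf` is 16-byte aligned
    // and valid for a 16-byte write.
    unsafe {
        let v = _mm_set1_epi32(value);
        _mm_stream_si128(buf.0.as_mut_ptr() as *mut __m128i, v);
        // Non-temporal stores are weakly ordered even on write-back memory, so
        // fence before the data is handed to any other code.
        _mm_sfence();
    }
    buf.0
}

/// Fallback for other targets: a plain store, no special ordering involved.
#[cfg(not(target_arch = "x86_64"))]
pub fn fill_non_temporal(value: i32) -> [i32; 4] {
    [value; 4]
}
```

Under the reading above, a non-temporal load from write-back memory would not need the fence; it is the store that needs one.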

RalfJung commented 1 month ago

Thanks; I will get in touch with the authors of the paper to clarify whether the architect they spoke with was referring to non-temporal loads behaving in odd ways only for "non-standard" memory regions or also for write-back memory.

Meanwhile, would it be worth warning people about using these intrinsics on non-write-back memory? Though that warning is probably better placed at whatever operation creates such memory. It's not really well-defined to access such memory with Rust operations (i.e., outside of inline assembly) anyway...

rfcbot commented 1 month ago

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.