rust-lang / lang-team

Home of the Rust lang team
http://lang-team.rust-lang.org/
Apache License 2.0

Portable SIMD project group #29

Closed hsivonen closed 4 years ago

hsivonen commented 4 years ago

WARNING

The Major Change Process was proposed in RFC 2936 and is not yet in full operation. This template is meant to show how it could work.

Proposal

Create a project group for considering what portable SIMD in the standard library should look like.

Motivation, use-cases, and solution sketches

While Rust presently exposes ALU features of the underlying ISAs in a portable way, it doesn't expose SIMD capabilities in a portable way except for autovectorization.

A wide variety of computation tasks can be accomplished faster using SIMD than using the ALU capabilities. Relying on autovectorization to go from ALU-oriented source code to SIMD-using object code is not a proper programming model. It is brittle and depends on the programmer being able to guess correctly what the compiler back end will do. Requiring godbolting for every step is not good for programmer productivity.

Using ISA-specific instructions results in ISA-specific code. An operation like "perform lane-wise addition of these two vectors of 16 u8 lanes" should be a portable operation for the same reason that "add these two u8 scalars" is a portable operation that does not require the programmer to write ISA-specific code.
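To make the analogy with scalar arithmetic concrete, here is a sketch that models only the *semantics* of such a portable operation. The `U8x16` type below is purely hypothetical (it is not the proposed API): it represents a vector as a plain array and implements lane-wise addition as a scalar loop, whereas a real portable SIMD type would compile to a single vector instruction where the target supports it.

```rust
// Hypothetical semantic model of a portable 16-lane u8 vector.
// A real implementation would lower to one SIMD instruction, not a loop.
#[derive(Clone, Copy, PartialEq, Debug)]
struct U8x16([u8; 16]);

impl U8x16 {
    // Broadcast one scalar into every lane.
    fn splat(v: u8) -> Self {
        U8x16([v; 16])
    }

    // Lane-wise wrapping addition: lane i of the result is a[i] + b[i],
    // with exactly the same wrapping behavior as scalar u8 arithmetic.
    fn wrapping_add(self, other: Self) -> Self {
        let mut out = [0u8; 16];
        for i in 0..16 {
            out[i] = self.0[i].wrapping_add(other.0[i]);
        }
        U8x16(out)
    }
}

fn main() {
    let a = U8x16::splat(200);
    let b = U8x16::splat(100);
    // 200 + 100 wraps to 44 in every lane, just like scalar u8 arithmetic.
    assert_eq!(a.wrapping_add(b), U8x16::splat(44));
    println!("lane-wise add matches scalar semantics");
}
```

The point of the sketch is that the operation is fully defined in ISA-independent terms; which instruction it lowers to is a backend concern.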

Typical use cases for SIMD include text encoding conversion and graphics operations on bitmaps. Firefox already relies on the Rust packed_simd crate for text encoding conversion.

Compiler back ends in general and LLVM in particular provide a notion of portable SIMD where the types are lane-aware and of particular size and the operations are ISA-independent and lower to ISA-specific instructions later. To avoid a massive task of replicating the capabilities of LLVM's optimizer and back ends, it makes sense to leverage this existing capability.

However, to avoid exposing the potentially subject-to-change LLVM intrinsics, it makes sense to expose an API that is conceptually close to, and maps rather directly onto, the LLVM concepts while making sense for Rust and being stable for Rust applications. This means introducing lane-aware types of typical vector sizes, such as u8x16, i16x8, and f32x4, and providing lane-wise operations on these types that are broadly supported across ISAs: basic lane-wise arithmetic and comparisons.

Additionally, it is essential to provide shuffles where the lane mapping is known at compile time. Also, unlike the LLVM layer, it makes sense to provide distinct boolean/mask vector types for the outputs of lane-wise comparisons, because encoding the invariant that the bits of a lane are either all ones or all zeros allows operations like "are all lanes true" or "is at least one lane true" to be implemented more efficiently, especially on x86/x86_64.
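As a hedged illustration of why the mask invariant helps, the hypothetical `M8x16` type below keeps every lane as either all ones or all zeros. The names and layout are made up for this sketch; the reductions are written as scalar loops, whereas a real SIMD implementation could, for example, extract one bit per lane (PMOVMSKB on x86) and do a single integer compare.

```rust
// Hypothetical mask vector: by invariant, each lane is 0x00 or 0xFF.
#[derive(Clone, Copy)]
struct M8x16([u8; 16]);

// Lane-wise "greater than" producing a mask, modeled with a scalar loop.
fn simd_gt(a: [u8; 16], b: [u8; 16]) -> M8x16 {
    let mut m = [0u8; 16];
    for i in 0..16 {
        m[i] = if a[i] > b[i] { 0xFF } else { 0x00 };
    }
    M8x16(m)
}

impl M8x16 {
    // "Are all lanes true?" -- thanks to the all-ones/all-zeros invariant,
    // a real implementation can reduce this to a movemask plus one compare.
    fn all(self) -> bool {
        self.0.iter().all(|&lane| lane == 0xFF)
    }

    // "Is at least one lane true?"
    fn any(self) -> bool {
        self.0.iter().any(|&lane| lane != 0)
    }
}

fn main() {
    let gt = simd_gt([2; 16], [1; 16]);
    assert!(gt.all() && gt.any());
    let none = simd_gt([1; 16], [1; 16]);
    assert!(!none.any());
    println!("mask reductions behave as expected");
}
```

Without the invariant (as at the raw LLVM layer), each reduction would need to normalize lanes first, which is exactly the cost a dedicated mask type avoids.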

When the target doesn't support SIMD, LLVM provides ALU-based emulation, which might not be a performance win compared to manual ALU code, but at least keeps the code portable.

When the target does support SIMD, the portable types must be zero-cost transmutable to the types that vendor intrinsics accept, so that specific things can be optimized with ISA-specific alternative code paths.
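A sketch of what that zero-cost interop could look like on one target, using only the already-stable `core::arch` intrinsics. The function name is hypothetical; the point is that a 16-byte portable value and `__m128i` have the same size, so the by-value transmute compiles to nothing and an ISA-specific code path can take over.

```rust
// Hypothetical ISA-specific path: hand portable bytes to a vendor intrinsic.
#[cfg(target_arch = "x86_64")]
fn add_via_intrinsic(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    use core::arch::x86_64::{__m128i, _mm_add_epi8};
    // SSE2 is part of the x86_64 baseline, so this intrinsic is always present.
    unsafe {
        // Same size on both sides, so these by-value transmutes are free.
        let va: __m128i = core::mem::transmute(a);
        let vb: __m128i = core::mem::transmute(b);
        core::mem::transmute(_mm_add_epi8(va, vb))
    }
}

fn main() {
    #[cfg(target_arch = "x86_64")]
    {
        // PADDB wraps per byte: 200 + 100 = 44 (mod 256) in every lane.
        assert_eq!(add_via_intrinsic([200u8; 16], [100u8; 16]), [44u8; 16]);
        println!("portable bytes round-tripped through __m128i");
    }
}
```

The same requirement would apply symmetrically on other ISAs, e.g. to NEON's vector types via `core::arch::aarch64`.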

The packed_simd crate provides an implementation that already works across a wide variety of Rust targets and that has already been developed with the intent that it could become std::simd. It makes sense not to start from scratch but to start from there.

The code needs to go in the standard library if it is assumed that rustc won't, on stable Rust, expose the kind of compiler internals that packed_simd depends on.

Please see the FAQ.

Links and related work

The Major Change Process

Once this MCP is filed, a Zulip topic will be opened for discussion. Ultimately, one of the following things can happen:

You can read [more about the lang-team MCP process on forge].

Comments

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

rustbot commented 4 years ago

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

nikomatsakis commented 4 years ago

Hey @rust-lang/lang -- we haven't seen anyone volunteering to serve as liaison for this project. If we're not able to find a liaison in the next week or so, I'm going to close the issue. If you think you might like to serve as a liaison, even if you don't have bandwidth right now, please do speak up -- we can always put the proposal on the "deferred list" to pick up when we have time.

nikomatsakis commented 4 years ago

We discussed this in our @rust-lang/lang meeting today. We went back and forth about where it fit into our priorities and so forth, but one of the final conclusions we came to was that it seemed like this was very much a "library design" question more than anything. The "language design" portion of it is basically limited to "should we add intrinsic capabilities and therefore tie ourselves to LLVM even further", correct?

We'd be curious therefore to hear from some @rust-lang/libs folk as to whether there is appetite to pursue this design.

One of the other questions that we wanted input on was whether there are other crates that have pursued exposing portable SIMD in the library space beyond packed_simd.

Some other notes from the minutes:

scottmcm commented 4 years ago

The "language design" portion of it is basically limited to "should we add intrinsic capabilities and therefore tie ourselves to LLVM even further", correct?

I don't think that, from a formal specification perspective, this is true. Another Rust implementation could provide a fully semantically correct implementation by just calling the scalar versions of all the functions in the appropriate loops.

Now, obviously from a quality-of-implementation perspective a compiler would likely want to provide something smarter than that, to take better advantage of hardware capabilities. But I think LLVM is only one way of getting that -- albeit the one I would probably pick if I were implementing it anyway. We could also have an implementation strategy of cfg_if!s that call existing stable intrinsics on the relevant platforms, for example, as people hit them or as they stabilize.
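A minimal sketch of that dispatch strategy, using plain `#[cfg]` rather than the `cfg_if!` macro so the example stays dependency-free. All function names here are hypothetical; the scalar loop defines the semantics, and the x86_64 path merely implements the same operation with a stable intrinsic.

```rust
// Scalar reference implementation: defines the portable semantics.
#[allow(dead_code)] // unused on targets that take the intrinsic path
fn max_scalar(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = a[i].max(b[i]);
    }
    out
}

// x86_64 path: PMAXUB via the stable core::arch intrinsic.
#[cfg(target_arch = "x86_64")]
fn max_x86(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    use core::arch::x86_64::{__m128i, _mm_max_epu8};
    // SSE2 is part of the x86_64 baseline, so the intrinsic is always present.
    unsafe {
        let va: __m128i = core::mem::transmute(a);
        let vb: __m128i = core::mem::transmute(b);
        core::mem::transmute(_mm_max_epu8(va, vb))
    }
}

// Portable front end: picks an implementation at compile time; the choice
// must be semantically unobservable.
fn lanewise_max(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    #[cfg(target_arch = "x86_64")]
    return max_x86(a, b);
    #[cfg(not(target_arch = "x86_64"))]
    return max_scalar(a, b);
}

fn main() {
    assert_eq!(lanewise_max([7; 16], [3; 16]), [7; 16]);
    assert_eq!(lanewise_max([0; 16], [255; 16]), [255; 16]);
    println!("lane-wise max agrees across implementations");
}
```

More platforms could be added branch by branch as their intrinsics stabilize, with the scalar loop remaining the fallback everywhere else.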

My expectation from "portable" is that such differences would necessarily be unobservable semantically, and thus I think I personally would be fine with this being entirely a libs project: to figure out the portable set of operations, how best to expose a Rust interface to them, how best to have them interoperate with non-portable platform intrinsics where needed, etc. (There might be some libs-impl/compiler/lang conversations about implementation details, but I suspect none of those will lock us into things.)

Lokathor commented 4 years ago

Just stop deliberately blocking regular users from linking to LLVM intrinsics (if that's what they want to do), and then none of this needs to be in the standard library at all.

Any crate could just make any design, and if it turns out bad then people can migrate to some other design.

And if a specific design is some day determined to be good enough, it can become part of the standard library then.

EDIT: There are a few other blockers to people doing their own out-of-stdlib experimentation, besides just linking to llvm intrinsics, but not too many.

hsivonen commented 4 years ago

“C Parity, interop, and embedded”

I think "C parity" mischaracterizes this: core::arch is C parity. This is about being better.

Controversial, ties us to LLVM

I think exposing the intrinsics that would allow packed_simd to be written in the crate ecosystem would tie us to LLVM. The crucial point of putting this in std::simd is not getting tied to LLVM. My understanding is that conceptually GCC already has similar capabilities and to the extent Cranelift will eventually support WASM SIMD, Cranelift must develop similar capability at least for 128-bit vectors.

hsivonen commented 4 years ago

Also, I don't see this as a particularly "embedded" thing: e.g., my use cases for this involve desktop and mobile. (Apple Silicon is going to make this even more relevant on desktop than it already is.)

nikomatsakis commented 4 years ago

Discussed in the rust-lang/lang meeting:

KodrAus commented 4 years ago

Right now we haven't fleshed out our project group process for libs (I'm writing up an RFC right now to catch up with the governance of other teams). In the meantime, @hsivonen, if you don't want to be blocked on getting the ball rolling and establishing a group, which I'd be on board with, we could follow the same RFC process the Error Handling group has.

nikomatsakis commented 4 years ago

OK, based on that I'm going to close this MCP and encourage @hsivonen to follow up with @KodrAus.

KodrAus commented 4 years ago

Ah, I should've followed up here, but we turned this proposal into an RFC for libs here: https://github.com/rust-lang/rfcs/pull/2977