Tracking Issue for RFC 2948: Portable SIMD

calebzulawski commented 3 years ago

Feature gate: #![feature(portable_simd)]

This is a tracking issue for the future feature chartered in RFC 2977, with the intent of creating something akin to the design in RFC 2948 (rust-lang/rfcs#2948): a portable SIMD library (std::simd).

Portable SIMD project group: https://github.com/rust-lang/project-portable-simd Implementation: https://github.com/rust-lang/portable-simd

More discussion can be found in the #project-portable-simd zulip stream.

Steps

[X] Implement the experimental feature behind a feature gate:
- https://github.com/rust-lang/rust/pull/89167
[ ] Define the semantics explicitly.
[ ] Write up an RFC detailing the design and defined semantics.
[ ] Adjust documentation (see instructions on rustc-dev-guide)
[ ] Stabilization PR (see instructions on rustc-dev-guide)

Unresolved Questions

[ ] What will the overall design be?
[ ] What are the ideal semantics for Masks?
[ ] Are there any limits or vector sizes we should not support?
[ ] How should these types interop with types like Saturating, NonZero, etc.?

Implementation History

HannesGitH commented 1 year ago

Feature gate: #![feature(portable_simd)]

i'm sorry if this is the wrong place to ask but im rather new to rust and stumbled upon this issues as my compiler told me to

if i want to use this feature as soon as my compiler supports it can i gate it like:

#[cfg(feature = "portable_simd")]
use std::simd::Simd;

or is that only for feautures regarding my package (set in toml or passed to cargo?) if so what would be the appropriate way to use simd as soon as this issue is resolved?

Lokathor commented 1 year ago

The #![feature(portable_simd)] part goes at the top of a binary or library.

It's a language feature not a cargo feature so it works a little differently.

It's unfortunate that they're both just "feature". Rust is often too terse when it counts.

HannesGitH commented 1 year ago

ok thanks a lot!

just to make sure this means there is no (easy*) way to use this language feature if my compiler supports it and fall back to a custom implementation otherwise?

_{*easy as in compile time guards / attribute-like macros or creating a custom wrapper module that either provides rusts simd or my own fallback or something else in that level of skill}

for anyone else stumbling upon this:

language features are (unstable) features you can opt-in when using nightly rust (by putting the specified flag in your library root, the whole project will then be compiled with a compiler that uses this feature)

agausmann commented 1 year ago

@HannesGitH You might be able to do this with the rustversion macro, which works similar to #[cfg]. However, I haven't tested any of these myself.

For example:

To enable the experimental feature flag on nightly,

#![rustversion::attr(nightly, feature(portable_simd))]

To conditionally import something on nightly only,

#[rustversion::nightly]
use std::simd::Simd;

I would recommend encapsulating all of your nightly-specific code in a module, so you only have to conditionally define the module itself and anything using it:

#[rustversion::nightly]
mod portable_simd;

HannesGitH commented 1 year ago

pretty close to what i intented, thanks for letting me know, thats what i'll use 👍

only problem is that once this feature becomes stable and one would compile my code using the stable version (which would then include actual simd) the resulting binary would still use my simd workarounds

might be nice to have some kind of #[feature_available(portable_simd)] macro for peeps that rather like to stay with stable rust but want to use some compiler feature as soon as they reach stable

agausmann commented 1 year ago

Once it is stabilized, you could update the macro from rustversion::nightly to rustversion::since(1.xx) to use this feature on all Rust versions since its stabilization, or just remove the workarounds if you don't need to support earlier Rust compiler versions.

workingjubilee commented 1 year ago

A lot of mischief can also be done with #[cfg_attr].

erwanvivien commented 1 year ago

Is there any work done doday to use simd ? I see we are 1/9 and there is not much activity at portable_simd.

Does this part of Rust requires help ?

calebzulawski commented 1 year ago

That first task was most of the work--with the nightly compiler you can use std::simd today. Most of the improvements being worked on now are relatively minor in comparison, but we are always open to contributions.

At this point we are mostly focusing on usability rather than features, which really means two things: ease of use of the API, and quality of the code generation. Once a base set of features is in a good state, we can begin the RFC process.

erwanvivien commented 1 year ago

Thanks for info and the kind reply!

safinaskar commented 1 year ago

@agausmann

To enable the experimental feature flag on nightly,
#![rustversion::attr(nightly, feature(portable_simd))]

Unfortunately, this particular code doesn't work

Inspirateur commented 1 year ago

Do you think this has a chance to get stabilized ? It seems like activity has been very low recently, despite being a cool feature

programmerjake commented 1 year ago

Do you think this has a chance to get stabilized ? It seems like activity has been very low recently, despite being a cool feature

I think it will be stabilized but not right away, afaict that still needs a RFC with the full detailed design.

scottmcm commented 1 year ago

@Inspirateur I think there's two big things needed:

Confidence that the overall shape of things is good enough to stabilize -- questions like whether the lanecount trait is worth having, whether the structure will work tolerably with vscale, etc
Someone carving out a subset in which people are confident and writing an RFC for it. Maybe that's the types, the basic lanewise ops, and conversions/layout stuff to start with. (Notably, if for some time non-trivial things require converting to platform-specific types like __m256 and using intrinsics, that's fine. We could do, say, aggregations and masks in a v2, shuffles and swizzles in a v3, or something.)

CarlKCarlK commented 1 year ago

@agausmann
To enable the experimental feature flag on nightly,
#![rustversion::attr(nightly, feature(portable_simd))]
@safinaskar Unfortunately, this particular code doesn't work

This worked for me:

#![cfg_attr(feature = "from_slice", feature(portable_simd))]

where "from_slice" is the name of my the-other-kind-of-feature, defined in Cargo.toml, that uses portable_simd.

[features]
from_slice = []

So, I run tests, for example, via cargo test --features=from_slice.

GlenDC commented 11 months ago

Is this on the 2024 edition roadmap, or will it be only for after that? I know it’s not related, but gives me a timeline range.

calebzulawski commented 11 months ago

I don't think anyone has a specific timeline, but we still need to draft a new RFC and go through the approval process, which can take some time.

jhpratt commented 8 months ago

Is there a particular reason that Simd does not implement Deref and DerefMut? I don't see any reason the impls would restrict the ability to do anything.

Lokathor commented 8 months ago

Like deref into a slice? Usually that's not done because it's a huge performance footgun.

Firstyear commented 8 months ago

It may be good to document what that footgun is and why the choice was made because people will ask this again in future.

Lokathor commented 8 months ago

So, to add more detail: the problem is that (depending on SIMD used) you can't in general index to a particular lane of a SIMD register. So if you view the SIMD data as a slice and operate on an element of the slice, what the hardware must do is have the CPU stop the current SIMD processing, write the register to the stack, work on the stack value (however the slice is adjusted), and then load that back into a SIMD register. This is, in general, a performance disaster. As usual, the optimizer might be able to cut out this stall in the pipeline, in some cases, depending on circumstances, etc etc. But you should expect that the SIMD handling is totally stalled when trying to treat the data as a slice.

jhpratt commented 8 months ago

I figured there was a reason, but I'm not familiar with how SIMD works under the hood. Given that indexing is the problem, why implement Index and IndexMut then?

Lokathor commented 8 months ago

Oh, uh, well I haven't looked in a while! I guess I'm out of the loop on the current API details.

I'm surprised that Index is in if Deref is out. Either both should be in or both should be out, would be my expectation.

calebzulawski commented 8 months ago

The basic idea is that we want a clear marker of the boundary between SIMD and non-SIMD operations. When using Index (vector[i]) there is an obvious sign that you are no longer using SIMD operations. Likewise with arrays and slices, we implement AsRef and the to_array function because these are explicit. The concern with Deref is that the automatic inclusion of all slice functions makes it harder to tell which operations are SIMD. For example, you may expect is_ascii to be vectorized, but instead it is simply a scalar implementation inherited from slices.

Lokathor commented 8 months ago

vector[i] isn't particularly more obvious, I would say.

Maybe we should just always make people convert to an array to index elements?

calebzulawski commented 8 months ago

A while ago we didn't implement Index and we got requests for it, but this is the first time Deref has come up, so I think it's a good compromise. Maybe it's not particularly obvious that Index is the boundary, but Deref is completely invisible without consulting the docs.

ZagButNoZig commented 7 months ago

There are certain types of instructions where the output data type is different from the input data type like: _mm256_maddubs_epi16. I don't think there is a way to do that in portable simd without casting first which is slower? Are there any plans to support these instructions. Similar instructions also exist on arch: vdotq_s32

abysssol commented 7 months ago

Hi, I was wondering if there had been any discussion or consideration of making a dynamically sized api for vector operations. The current api seems to be analogous to arrays, but perhaps a more elegant and convenient solution would be analogous to slices.

I learned about this idea when researching risc-v's vector extension. Both this article and this one (fully rendered here) are good references on the motivation, from the perspective of an ISA.

While the current api is already much better than traditional simd instructions, it seems to me that the logical conclusion is a runtime sized type; maybe a wrapper around &mut [T], or a type like Vec<T>, or perhaps a modification to Vec<T> that guarantees simd optimization if T is a numeric primitive.

Hopefully this can spark a useful discussion on the best design of simd/vector types and operations. Thank you for your consideration.

Lokathor commented 7 months ago

That could be some additional API that lives along aside the fixed sized SIMD types, but for the main CPU arches a fixed sized simd type is what generally works best with optimizations.

colejohnson66 commented 4 months ago

Curious how ARM SVE and RISC-V V are meant to be used in Rust. The fixed-length abstraction is a nice one, and it's what .NET is going with in .NET 9 (Vector<T> for SVE is 128-bit, at least for now), but variable-length vectors are here to stay.

dead-claudia commented 4 months ago

Curious how ARM SVE and RISC-V V are meant to be used in Rust. The fixed-length abstraction is a nice one, and it's what .NET is going with in .NET 9 (Vector<T> for SVE is 128-bit, at least for now), but variable-length vectors are here to stay.

RISC-V offers extensions like Zvl128b that provide hard guarantees on minimum vector size. It should be possible to leverage this in the interim while RISC-V figures out their P extension (which isn't very far along).

Edit: fix extension name

Salabar commented 3 months ago

Would it it make sense to add a family of functions like "loadbase*" that take a slice and an isize index? It would account for buffer underflow as well as overflow. With this you can write things such as convolution with nice clean loops that don't have account for edge cases.

for i in 0..image.len(){
   let mut result = 0.;

  for j in 1..kernel.radius() / N {
    let left = Simd::<N>::load_base_or(image, i - j * N, splat(image[0]);
   //... 
 }
  for j in 0..kernel.radius() / N {
    let right= Simd::<N>::load_base_or(image, i + j * N, splat(image.last());
   //... 
}
  image[i] = result;
}

DXist commented 3 months ago

Is it possible to move Mask inherent methods into a trait like SimdMask and add this trait as a bound to associated type Mask of other traits, e.g. SimdPartialEq?

This will help to write generic code that works for different primitive types.

Got this idea while writing a fixed index map data structure that is expected to work with unsigned integer keys regardless of the width.

Without the trait bound for Mask associated type I have to wrap my implementation into macros and explicitly apply it to u8, u16, u32, u64 and usize.

calebzulawski commented 3 months ago

I think you should probably be able to do what you want:

fn generic<T>(v: Simd<T, 4>, m: Mask<T::Mask, 4>) -> bool
where
    T: SimdElement + Default,
    Simd<T, 4>: SimdPartialEq<Mask = Mask<T::Mask, 4>>,
{
    (v.simd_eq(Simd::splat(Default::default())) ^ m).all()
}

However, it would be nice if there were an easier way to do this without requiring that extra bound.

DXist commented 3 months ago

@calebzulawski , thank you!

It worked along with a couple of bounds from num-traits crate.

Maybe an example with generic code will be a useful demo of bounds usage.

rust-lang / rust