rust-lang / packed_simd

Portable Packed SIMD Vectors for Rust standard library
https://rust-lang.github.io/packed_simd/packed_simd_2/
Apache License 2.0
590 stars 74 forks source link

Working with generics over the SIMD type. #232

Closed davenza closed 5 years ago

davenza commented 5 years ago

Hi,

this can be quite hard, so I don't know if it can be implemented in Rust right now.

I have been trying to implement a function over a generic type. I have a baseline implementation for any type. However, I want to use SIMD if the generic type implements SIMD. Of course, for this kind of optimization we must use specialization. My first try was something like this:

#![feature(specialization)]
use std::ops::{Add, AddAssign};

use packed_simd::u32x8;

// Function over generic parameter.
pub trait Operation<V> {
    fn f(a: V, b: &mut [V])
        where V: Add<Output=V> + AddAssign<V> + Clone;
}

// Dumb struct
pub struct A<V>(V);

// Baseline implementation
impl<V> Operation<V> for A<V> {
    default fn f(a: V, b: &mut [V])
        where V: Add<Output=V> + AddAssign<V> + Clone
    {
        for e in b.iter_mut() {
            *e += a.clone();
        }
    }
}

// Represents a SIMDeable type
pub trait SIMDeable {
    type SIMDType;
}

impl SIMDeable for u32 {
    // I have AVX2
    type SIMDType = u32x8;
}

// Optimized implementation
impl<V> Operation<V> for A<V>
    where V: SIMDeable
{
    fn f(a: V, b: &mut [V])
        where V: Add<Output=V> + AddAssign<V> + Clone
    {
//        Can't call any Simd<> functions
//        Self::SIMDType.splat()
    }
}

The problem here is that I can't create a u32x8 because I don't have the specific type. My second thought was to force the associated type to be a Simd:

pub trait SIMDeable<T> {
    type SIMDType = Simd<T>;
}

impl SIMDeable<[u32; 8]> for u32 {}

// Optimized implementation
impl<V, T> Operation<V> for A<V>
    where V: SIMDeable<T>
{
    fn f(a: V, b: &mut [V])
        where V: Add<Output=V> + AddAssign<V> + Clone
    {
        // Here, I should convert "a" to the specific type, but for clarity:
        Self::SIMDType.splat(1);
    }
}

This code does not compile:

error[E0207]: the type parameter `T` is not constrained by the impl trait, self type, or predicates
  --> src/lib.rs:43:9
   |
43 | impl<V, T> Operation<V> for A<V>
   |         ^ unconstrained type parameter

error: aborting due to previous error

For more information about this error, try `rustc --explain E0207`.

It is interesting that the parameter T is indeed constrained in packed_simd. The definition of Simd in the documentation is:

pub struct Simd<A: SimdArray>(_);

but SimdArray is not exported, so I can't bound on it.

How would you solve this type of problem (if possible)? I think my use case is not strange.

Thank you.

gnzlbg commented 5 years ago

You just need to create a trait, implement it for the SIMD type, and bound on that. If you implement it for two SIMD types, then you can use those two, etc.

gnzlbg commented 5 years ago

In other words, the problem is that:

type SIMDType;

can be any type, not necessarily a SIMD type. Nothing prevents some other impl from doing:

struct A;
impl SIMDeable for i32 {
  type SIMDType = A;
}

which is why when you try to write SIMDType::splat the compiler complains.

I'm closing this since it isn't really an issue with this library, but more like a question about how to use Rust generics in general.

but SimdArray is not exported, so I can't bound on it.

That's on purpose. If that trait were exported, you could implement it for your own types (e.g. impl SimdArray for A, and writing Simd<A> would type check but crash the compiler .

I don't think there is a way to export a trait such that it can be used in bounds, but such that it cannot be used to impl things, but anyways the SimdArray trait is an implementation detail of the library.

davenza commented 5 years ago

If someone is interested, I solved this with this code:

Code ```rust #![feature(specialization)] use std::ops::{Add, AddAssign}; use packed_simd::u32x8; // Function over generic parameter. pub trait Operation { fn f(a: V, b: &mut [V]) where V: Add + AddAssign + Clone; } // Dumb struct pub struct A(V); // Baseline implementation impl Operation for A { default fn f(a: V, b: &mut [V]) where V: Add + AddAssign + Clone { println!("Baseline implementation."); for e in b.iter_mut() { *e += a.clone(); } } } pub trait SIMDType { type PrimitiveType; #[inline(always)] fn from_slice_unaligned(slice: &[Self::PrimitiveType]) -> Self; #[inline(always)] fn write_to_slice_unaligned(self, slice: &mut [Self::PrimitiveType]); #[inline(always)] fn splat(a: Self::PrimitiveType) -> Self; #[inline(always)] fn lanes() -> usize; } impl SIMDType for u32x8 { type PrimitiveType = u32; #[inline(always)] fn from_slice_unaligned(slice: &[Self::PrimitiveType]) -> Self { Self::from_slice_unaligned(slice) } #[inline(always)] fn write_to_slice_unaligned(self, slice: &mut [Self::PrimitiveType]) { self.write_to_slice_unaligned(slice); } #[inline(always)] fn splat(a: Self::PrimitiveType) -> Self { Self::splat(a) } #[inline(always)] fn lanes() -> usize { Self::lanes() } } // Represents a SIMDeable type pub trait SIMDeable : Sized { type SIMDType: SIMDType + AddAssign; } impl SIMDeable for u32 { // I have AVX2 type SIMDType = u32x8; } // Optimized implementation impl Operation for A where V: SIMDeable, { fn f(a: V, b: &mut [V]) where V: Add + AddAssign + Clone { println!("SIMD implementation."); let mut remaining = b.len(); let mut i = 0; while remaining > V::SIMDType::lanes() { // Reading the slice let mut simd_slice = V::SIMDType::from_slice_unaligned(&b[i..(i+8)]); // Adding the constant simd_slice += V::SIMDType::splat(a.clone()); // Write the result to the slice. simd_slice.write_to_slice_unaligned(&mut b[i..(i+8)]); i += V::SIMDType::lanes(); remaining -= V::SIMDType::lanes(); } while remaining > 0 { b[i] += a.clone(); i += 1; remaining -= 1; } } } ```

I need two traits: SIMDeable to bound on the primitive type and SIMDType to associate each primitive type with the corresponding SIMD type. The disadvantage of this method is that I have to reimplement every method of the Simd struct in the SIMDType trait. I will, of course, use macros for that.


About the SimdArray issue, I am not a Rust expert, but ndarray does exactly that with the trait Data. You can see the comments in the documentation:

Note: Data is not an extension interface at this point. Traits in Rust can serve many different roles. This trait is public because it is used as a bound on public methods.

And in the __private__ function:

This trait is private to implement; this method exists to make it impossible to implement outside the crate.

This is just a constructive comment, not that I think this is completely necessary.

gnzlbg commented 5 years ago

The disadvantage of this method is that I have to reimplement every method of the Simd struct in the SIMDType trait.

Another disadvantage is that you can't implement the API "as is", e.g., you can't make some methods const fn (because they are trait methods), and you can't have "variadic" methods (e.g. u32x2::new(u32, u32) vs u32x4::new(u32, u32, u32, u32)) although you might be able to emulate those with arrays (e.g. SimdType::new([...])).

Also not all SIMD types have all methods (e.g. vectors of signed, unsigned, floats, masks, etc. all have different methods), so you can't probably mirror everything with one trait. Also, there is no guarantee that even all vectors of similar types will have all methods, e.g., u8xN might have a methods that other uMxN types don't have.

Some vector types are generic in nature (e.g. Simd<[*const *mut *const f32; 4]>) because they need to handle arbitrarily nested pointers, so you'll need blanket impls to handle those.

Also, some operations need const function arguments (e.g. shuffle!), so you probably can't use those via the traits - one can't even wrap them in a function.

IIRC we used to use traits a couple of years ago, but the limitations of traits were not worth the trouble (specialization is unsound, no const fn methods, lack of negative bounds or modalities, etc.). In hindsight, we could have tried a bit more to do better. If I were to try to expose a "generic" API for these types today, with the benefit of hindsight, I'll probably just use one trait per operation, e.g., trait SimdSplat, which only has the splat method, SimdExtract which only has the extract method, etc.

About the SimdArray issue, I am not a Rust expert, but ndarray does exactly that

I'm not very familiar with ndarray, but I suppose that they make Data public because it is possible to do useful things that are generic over it.

If we were to make SimdArray public, but unimplementable, you could write a T: SimdArray bound in user code, but there is no operation that you can perform on a Simd<T: SimdArray> since the impl<T: SimdArray> Simd<T> { } is empty. So you can at best "move" the type around, but to do that you don't need a bound, you can just use Simd<T>.

gnzlbg commented 5 years ago

This is just a constructive comment,

That's how I took it! It is definitely an interesting approach.

gnzlbg commented 5 years ago

FWIW I think it would make sense to expose a more generic way of handling vector types (e.g. using the many small traits approach that I mentioned above), and that's something that could live in packed_simd itself, but it would be something built "on top" of the concrete type approach, like how num::Int and num::Float traits are built "on top" of i32, f64, etc.

davenza commented 5 years ago

@gnzlbg Thanks! you did some useful comments (I didn't thought about the variadic methods). Your idea of multiple traits is probably much better than just exposing SimdArray.

I don't think that is possible to get all the benefits of specific types (e.g const fn) in generic code. However, sometimes, it might we preferable for some people to have generic code. For example, I will not use the benefits of const fun, but I would have to refactor all my (already) generic code to implement some optimizations with SIMD. Knowing the tradeoffs and having both possibilities is the way to go, IMHO. Right now, it is possible to use generic code (as I showed in the workaround I posted above), but I am sure we can do better.

Thanks, and good luck! I like the crate and the RFC. It is really great.


I just found what I think is a bug (I don't know if it is packed_simd or the compiler). The code I posted above works well. However, I was changing some code and I discovered that if I change this code in the trait:

    #[inline(always)]
    fn write_to_slice_unaligned(self, slice: &mut [Self::PrimitiveType]);

to

    #[inline(always)]
    fn write_to_slice_unaligned(&self, slice: &mut [Self::PrimitiveType]);

and of course, the implementation:

    #[inline(always)]
    fn write_to_slice_unaligned(&self, slice: &mut [Self::PrimitiveType]) {
        self.write_to_slice_unaligned(slice);
    }

The code compiles! but errors when this method is called:

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
Aborted (core dumped)

This shouldn't happen in Rust. It is a bit strange because u32x8::write_to_slice_unaligned needs ownership, but the compiler let me compile a code where a borrow was calling to it. This makes me think that it might be compiler's fault.

Here is the complete buggy code with a main! ```rust #![feature(specialization)] use std::ops::{Add, AddAssign}; use packed_simd::u32x8; // Function over generic parameter. pub trait Operation { fn f(a: V, b: &mut [V]) where V: Add + AddAssign + Clone; } // Dumb struct pub struct A(V); // Baseline implementation impl Operation for A { default fn f(a: V, b: &mut [V]) where V: Add + AddAssign + Clone { println!("Baseline implementation."); for e in b.iter_mut() { *e += a.clone(); } } } pub trait SIMDType { type PrimitiveType; #[inline(always)] fn from_slice_unaligned(slice: &[Self::PrimitiveType]) -> Self; #[inline(always)] fn write_to_slice_unaligned(&self, slice: &mut [Self::PrimitiveType]); #[inline(always)] fn splat(a: Self::PrimitiveType) -> Self; #[inline(always)] fn lanes() -> usize; } impl SIMDType for u32x8 { type PrimitiveType = u32; #[inline(always)] fn from_slice_unaligned(slice: &[Self::PrimitiveType]) -> Self { Self::from_slice_unaligned(slice) } #[inline(always)] fn write_to_slice_unaligned(&self, slice: &mut [Self::PrimitiveType]) { self.write_to_slice_unaligned(slice); } #[inline(always)] fn splat(a: Self::PrimitiveType) -> Self { Self::splat(a) } #[inline(always)] fn lanes() -> usize { Self::lanes() } } // Represents a SIMDeable type pub trait SIMDeable : Sized { type SIMDType: SIMDType + AddAssign; } impl SIMDeable for u32 { // I have AVX2 type SIMDType = u32x8; } // Optimized implementation impl Operation for A where V: SIMDeable, { fn f(a: V, b: &mut [V]) where V: Add + AddAssign + Clone { println!("SIMD implementation."); let mut remaining = b.len(); let mut i = 0; while remaining > V::SIMDType::lanes() { // Reading the slice let mut simd_slice = V::SIMDType::from_slice_unaligned(&b[i..(i+8)]); // Adding the constant simd_slice += V::SIMDType::splat(a.clone()); // Write the result to the slice. simd_slice.write_to_slice_unaligned(&mut b[i..(i+8)]); i += V::SIMDType::lanes(); remaining -= V::SIMDType::lanes(); } while remaining > 0 { b[i] += a.clone(); i += 1; remaining -= 1; } } } fn main() { let mut float_vec = vec![1.0f64; 10]; A::::f(2.0f64, &mut float_vec); println!("Final f64 vec: {:?}", float_vec); let mut u32_vec = vec![1u32; 10]; A::::f(2u32, &mut u32_vec); println!("Final u32 vec: {:?}", u32_vec); } ```
gnzlbg commented 5 years ago

I think you might be just running into infinite recursion, which can overflow the stack. It isn't unsafe or anything, you are just running out of stack memory. Usually disambiguating the method call fixes it, e.g. from self.write_to_slice_unaligned(slice); to $PATH::write_to_slice_unaligned(self, slice); or similar, where $PATH is the path to the method that you want to call.

davenza commented 5 years ago

@gnzlbg That's true, fixed! Thank you for everything.

gnzlbg commented 5 years ago

Happy to help!