rust-lang / portable-simd

The testing ground for the future of portable SIMD in Rust
Apache License 2.0
895 stars 80 forks source link

Support `MaybeUninit` vectors #286

Open Kixiron opened 2 years ago

Kixiron commented 2 years ago

Vectors of MaybeUninit values and operations over them (eg. Simd<MaybeUninit<T>, N> + Simd<MaybeUninit<T>, N>) would be incredibly useful for more advanced struct of array operations. This is mostly inspired by how Futhark handles sum types and is something that I actually have a use case for. The basic premise is that we hold and perform operations over potentially uninit values but only keep the values that are defined. A more concrete case of this would be in a struct of array Option type

struct SoaOption<T> {
    discriminants: Vec<bool>,
    values: Vec<MaybeUninit<T>>,
}

// The general operation we're performing is `Option::map(|x| x * 2)`, but over the entire array of options
let mut options: SoaOption<u32> = ...;
options.values.as_simd_mut().for_each(|x: MaybeUninit<u32>| x * Simd::splat(2));

// Then later when we're unpacking our struct of array form, we use our `discriminants` vector to tell us whether it's a `Some` or `None`
let mut unpacked: Vec<Option<u32>> = options
    .discriminants
    .into_iter()
    .zip(options.values)
    .map(|(discrim, x)| discrim.then(|| x.assume_init()))
    .collect();

This allows efficiently operating over SoA formed enums, which would be amazing

calebzulawski commented 2 years ago

At first glance this seems useful, and we have discussed uninitialized lanes to some extent.

There's certainly no consensus on what element types should be valid, but I believe the most common position is that it should only be number types and pointers. We haven't formalized this, but that would effectively mean that SimdElement implies all bit patterns are valid.

If that's the case, I think MaybeUninit would serve little purpose. You could have uninitialized lanes, but I don't think there would be any performance benefit vs arbitrarily initialized lanes, since most (if not all? not sure about RISC V V) instruction sets operate on all lanes, regardless. You could still use vectors for enums, but the underlying type would be integers.

Pinging @programmerjake, our resident unusual element type proponent

Kixiron commented 2 years ago

Uninitialized data is a little more subtle than just assuming all bit patterns are valid, if that were the case then we'd never need MaybeUninit<u8> for anything

programmerjake commented 2 years ago

imho having Simd<MaybeUninit<T>, N> would be quite useful. an example (check for matched parenthesis in read input bytes, accounting for shorter inputs):

const LANES: usize = 64;
// safe because the array element type MaybeUninit<u8> can be uninit
let mut buf: [MaybeUninit<u8>; LANES] = unsafe { MaybeUninit::uninit().assume_init() };
let res = unsafe { libc::read(fd, buf.as_mut_ptr().cast::<c_void>(), LANES) };
if res < 0 {
    todo!("error handling");
}
let len = res as usize;
const INDEXES: [usize; LANES] = {
    let mut retval = [0; LANES];
    let mut i = 0;
    while i < LANES {
        retval[i] = i;
        i += 1;
    }
    retval
};
let vec: Simd<MaybeUninit<u8>, LANES> = buf.into();
let lparens: Mask<MaybeUninit<i8>, LANES> = vec.lanes_eq(Simd::splat(MaybeUninit::new(b'(')));
let rparens: Mask<MaybeUninit<i8>, LANES> = vec.lanes_eq(Simd::splat(MaybeUninit::new(b')')));

// 1 for every `(`, `-1` for every `)`, `0` otherwise:
let byte_values: Simd<MaybeUninit<i8>, LANES> = rparens.to_int() - lparens.to_int();
let mask = Simd::from(INDEXES).lanes_lt(Simd::splat(len));
let byte_values = mask.select(byte_values, Simd::splat(MaybeUninit::new(0)));
let sum: MaybeUninit<u8> = byte_values.reduce_sum();
let sum = unsafe { sum.assume_init() };
assert_eq!(sum, 0, "mismatched parenthesis");

and yes, MaybeUninit is needed if you want uninitialized vector elements because the compiler treats uninitialized as something different than any normal value you could put in T:

e.g. (to be clear, this is UB because a in f is uninitialized): https://play.rust-lang.org/?version=stable&mode=release&edition=2021&gist=c7ebaea680ecad3f5545cff19bc5ff61

use std::mem::MaybeUninit;

unsafe fn f(v: &mut u64, v2: &mut u64) {
    let a: u64 = MaybeUninit::uninit().assume_init();
    *v = a;
    *v2 = a;
}

fn main() {
    let mut a = 0;
    let mut b = 1;
    unsafe { f(&mut a, &mut b); }
    assert_eq!(a, b);
}

currently prints:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`', src/main.rs:13:5

which isn't something that could happen for any normal value of a in f.

workingjubilee commented 1 year ago

@Kixiron What we're finding is that a lot of vector ISAs offer trivial zeroing for predicated loads (which Rust doesn't currently support, but we can add) and predicated gathers (which are mandatory), so while we're investigating this use-case with interest, we're struggling to come up with real examples that will profit a lot from this in terms of actual compilation results. Is there something we're missing here?

programmerjake commented 1 year ago

@workingjubilee one other note: Simd<MaybeUninit<T>, N> matches LLVM IR's internal semantics, so requiring zeroing everywhere instead of undef will make it much harder for LLVM to optimize due to all the extra select ops.

Kixiron commented 1 year ago

There's also no way to emulate things like ARM's svldff1 instruction without MaybeUninit, meaning that things like this (Implementing 'strlen' using SVE, Blog Post) would be impossible to implement in Rust

calebzulawski commented 1 year ago

While I agree that particular operation produces uninit lanes--that also doesn't look to me like anything that's particularly easy to implement portably (I'm not even sure if it's possible).

programmerjake commented 1 year ago

While I agree that particular operation produces uninit lanes--that also doesn't look to me like anything that's particularly easy to implement portably (I'm not even sure if it's possible).

the portable version would be something like:

pub unsafe fn strided_load_fault_first<T, const MAXVL: usize>(
    ptr: *const T,
    stride_in_bytes: isize, // can be negative
    mask: Mask<T, MAXVL>,
    mut vl: usize,
) -> (Simd<MaybeUninit<T>, MAXVL>, usize) {
    let mut retval = Simd::<MaybeUninit<T>, MAXVL>::uninit();
    assert!(vl <= MAXVL);
    for i in 0..vl {
        let element_ptr = ptr
            .cast::<u8>()
            .wrapping_offset(i * stride_in_bytes)
            .cast::<MaybeUninit<T>>();
        let mut stop = mask.test(i) && read_faults(element_ptr);
        // nondeterministic() is always true on arches without fault first,
        // avoiding the need for read_faults
        stop |= nondeterministic();
        stop &= i != 0;
        if stop {
            vl = i;
            break;
        }
        if mask.test(i) {
            retval[i] = ptr::read_unaligned(element_ptr);
        }
    }
    (retval, vl)
}
RalfJung commented 1 year ago

We have to be very careful with exposing LLVM's poison-propagating arithmetic semantics. Right now the soundness of Rust's compilation to LLVM relies on the fact that a UB-free Rust program will never have an LLVM poison value anywhere ever. This is crucial for us since LLVM treats poison on a per-value level and Rust's initialization tracking is on a per-byte level, so if a poison ever happens it could "infect" neighboring bytes in ways that are unsound.

So, doing anything like this is unfortunately blocked on LLVM having a "byte" type bN of arbitrary size that can represent bytewise poison semantics.

And that's leaving aside my own more subjective objections to poison-style arithmetic -- I could be convinced but I need solid motivating examples. Also see the Zulip discussion here.