Closed by gnzlbg 5 years ago
I don't see much reason for addressing vector loads and stores specifically in the memory model. IMO the default position is to treat them exactly like any other aggregate of the same layout (e.g., `[T; N]`). I can only think of two cases where it might make a difference:
1. relatedly, volatile, but that's all implementation-defined anyway
Do you have any examples?
I just had ISPC local atomics in mind: https://ispc.github.io/ispc.html#atomic-operations-and-memory-fences but I haven't given much thought to how they are implemented. An example would be doing a vector scatter operation that writes two different values (two different elements of a vector) to the same memory location via the same pointer. The current RFC doesn't mention what happens in this case, but LLVM defines this as follows, so it might make sense to do the same thing:
Scatter with overlapping addresses is guaranteed to be ordered from least-significant to most-significant element.
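A scalar model of that LLVM rule might look like the sketch below (the `scatter4` helper is hypothetical, not an API from the RFC): lanes are written from least- to most-significant, so when two lanes target the same address, the higher-indexed lane wins.

```rust
// Scalar model of a vector scatter, illustrating the LLVM rule that
// overlapping addresses are written in lane order (lane 0 first, so the
// highest-indexed lane targeting an address wins). Hypothetical helper,
// not part of the RFC.
fn scatter4(values: [u32; 4], indices: [usize; 4], mem: &mut [u32]) {
    for lane in 0..4 {
        mem[indices[lane]] = values[lane]; // least-significant lane first
    }
}

fn main() {
    let mut mem = [0u32; 4];
    // Lanes 1 and 3 both target index 2; lane 3 is written last and wins.
    scatter4([10, 20, 30, 40], [0, 2, 1, 2], &mut mem);
    assert_eq!(mem, [10, 30, 40, 0]);
}
```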
Uninitialized memory.
That's a very good point. Given `let v = u32x4::new(0, 1, mem::uninitialized(), 3);`, does `v.extract(0)` (which was initialized to `0`) return `0` or `mem::uninitialized()`? The RFC does not mention anything about this, but it should.
I would expect that we would just treat vectors as aggregates, meaning `v.extract(0)` returns `0`, but I don't know how easily LLVM allows us to do this.
I just had ISPC local atomics in mind:
That seems to be a rather different matter. Correct me if I'm wrong, but this looks like it's mostly a consequence of the SPMD model (so ISPC needs to give semantics for "scalar" code in a way that makes it legal to vectorize with varying vector lengths). Since any vector operations in Rust are fully explicit in the source code (and evaluation order is generally fully defined), I don't think there's really any equivalent of the problem motivating "local atomics". We only need to be careful when specifying a few specific operations such as scatters – but that's again not automatically a memory model concern.
That's a very good point. Given `let v = u32x4::new(0, 1, mem::uninitialized(), 3);`, does `v.extract(0)` (which was initialized to `0`) return `0` or `mem::uninitialized()`? The RFC does not mention anything about this, but it should.
I don't think the RFC should say anything about this, as it falls in the responsibility of the unsafe code guidelines. For now, code like your example there can be in the same state as practically everything else involving uninitialized memory: likely UB, details tbd.
I would expect that we would just treat vectors as aggregates, meaning `v.extract(0)` returns `0`, but I don't know how easily LLVM allows us to do this.
FWIW I've given this some more thought and from an optimizer perspective it seems very undesirable to have deferred UB (like poison or undef) in one lane infect the other lanes. It would make vectorization much harder. Using poison as an example for simplicity, you'd need to worry about one lane producing poison but not outputting it (so no UB), while another lane that wouldn't have produced poison is outputting its element of the vector (causing UB due to it being poison in the vectorized version, even though the scalar version was fine).
FWIW I've given this some more thought and from an optimizer perspective it seems very undesirable to have deferred UB (like poison or undef) in one lane infect the other lanes. It would make vectorization much harder. Using poison as an example for simplicity, you'd need to worry about one lane producing poison but not outputting it (so no UB), while another lane that wouldn't have produced poison is outputting its element of the vector (causing UB due to it being poison in the vectorized version, even though the scalar version was fine).
On one hand I think this also makes sense from the POV of newer hardware: the vector instructions of AVX-512, RISC-V, ARM SVE, ... almost always operate together with a mask, such that if an element of the vector is masked out, the operation is not performed for the element. That is, on newer hardware:
let m = m32x4::new(true, true, true, false);
let v = f32x4::new(1., 2., 3., uninitialized());
// adds only those lanes for which the mask is true:
let w = m.add(v, v); // uninitialized is masked out
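A lane-wise scalar model of what the hypothetical `m.add(v, v)` above would mean (this is a sketch, not a real API; `Option<f32>` stands in for a lane that may be uninitialized, since real uninitialized memory can't be observed safely):

```rust
// Lane-wise model of a merge-masked add. Masked-out lanes are never read
// or computed, so an "uninitialized" (None) lane cannot infect the
// active lanes.
fn masked_add(mask: [bool; 4], a: [Option<f32>; 4], b: [Option<f32>; 4]) -> [Option<f32>; 4] {
    let mut out = [None; 4];
    for lane in 0..4 {
        if mask[lane] {
            // Only active lanes are computed.
            out[lane] = Some(a[lane].unwrap() + b[lane].unwrap());
        }
        // Inactive lanes stay None (uninitialized).
    }
    out
}

fn main() {
    let m = [true, true, true, false];
    let v = [Some(1.0), Some(2.0), Some(3.0), None]; // lane 3 "uninitialized"
    let w = masked_add(m, v, v);
    assert_eq!(w, [Some(2.0), Some(4.0), Some(6.0), None]);
}
```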
On the other hand, on older hardware without support for vector masks, the most efficient thing would be to just add those `uninitialized` values together. Also, the current RFC provides masked operations via select. That is, the code I showed above is not valid, but the following is:
let m = m32x4::new(true, true, true, false);
let v = f32x4::new(1., 2., 3., uninitialized());
// adds only those lanes for which the mask is true:
let w = m.select(v, v + v);
which is weird if one element of `v` is uninitialized, because that element is unconditionally operated on from the PoV of the Rust source code, even though the optimizer could do something different if the result is unused. I would prefer not to have to add all vector methods to the mask types, and to have a more general mechanism like select instead.
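For reference, the select operation itself can be modeled lane-wise like this (a scalar sketch of a generic `select(mask, a, b)`, not the RFC's exact API). The key point is that both inputs are fully evaluated before the blend, which is exactly why an uninitialized lane gets operated on even when the mask discards its result:

```rust
// Scalar model of a vector select/blend: lane i takes if_true[i] where
// the mask is true, if_false[i] otherwise. Both input vectors already
// exist in full before the blend happens.
fn select4(mask: [bool; 4], if_true: [f32; 4], if_false: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0; 4];
    for lane in 0..4 {
        out[lane] = if mask[lane] { if_true[lane] } else { if_false[lane] };
    }
    out
}

fn main() {
    let m = [true, false, true, false];
    let a = [1.0, 2.0, 3.0, 4.0];
    let b = [10.0, 20.0, 30.0, 40.0];
    assert_eq!(select4(m, a, b), [1.0, 20.0, 3.0, 40.0]);
}
```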
Also, I just went through the whole `MaybeUninitialized` RFC and I have no idea how to incorporate that into vector types. If `mem::uninitialized` is deprecated and all elements of a vector must have the same type, what would be the correct way of setting one vector element as uninitialized?
On one hand I think this also makes sense from the POV of newer hardware: the vector instructions of AVX-512, RISC-V, ARM SVE, ... almost always operate together with a mask, such that if an element of the vector is masked out, the operation is not performed for the element. That is, on newer hardware:
You're mixing up hardware instructions and optimizer IR. There's no such thing as poison at the hardware level (Itanium's NaT aside). The two are only tangentially related.
But even besides that, LLVM IR currently has no such thing as "masked adds" (only a full vector add, followed by a vector blend) and even if it had, that approach wouldn't scale to everything, and even if it did, it would be really silly to materialize all these masks and extend their live ranges just to keep poison under control when the alternative (keeping poison lane-wise) is right there.
Also, I just went through the whole `MaybeUninitialized` RFC and I have no idea how to incorporate that into vector types. If `mem::uninitialized` is deprecated and all elements of a vector must have the same type, what would be the correct way of setting one vector element as uninitialized?
That seems a bit off-topic for this issue (maybe bring it up on that RFC?) but my working assumption is `MaybeUninitialized<TxN>` to transport entire vectors (e.g., a vectorized memcpy) and `[MaybeUninitialized<T>; N]` to operate on individual elements that you know are initialized. (If you want to do lanewise operations on a partially-uninitialized vector, that would likely be UB anyway.)
If `mem::uninitialized` is deprecated and all elements of a vector must have the same type, what would be the correct way of setting one vector element as uninitialized?
I think I answered this later in https://github.com/rust-lang/rfcs/pull/2366#issuecomment-399063409, and I agree with @rkruppe: Rust doesn't do per-element tracking in arrays, so once you have a vector where not all elements are initialized, you have something like `[MaybeUninit<T>; N]`. The type doesn't tell you which elements are (un)initialized; that's something you have to keep in your head manually at that point.
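Concretely, with today's stable `MaybeUninit` API that view looks something like the sketch below (`[MaybeUninit<u32>; 4]` standing in for a partially-initialized `u32x4`; which lanes are valid is tracked entirely by the programmer):

```rust
use std::mem::MaybeUninit;

// Sketch of the `[MaybeUninit<T>; N]` view of a partially-initialized
// vector: write three of four lanes, then read only the lanes we know
// were written. The type system does not track which lanes are valid.
fn main() {
    let mut lanes: [MaybeUninit<u32>; 4] = [
        MaybeUninit::uninit(),
        MaybeUninit::uninit(),
        MaybeUninit::uninit(),
        MaybeUninit::uninit(),
    ];
    lanes[0].write(0);
    lanes[1].write(1);
    lanes[3].write(3);
    // Safe only because we wrote these lanes above; reading lanes[2]
    // here would be UB.
    let first = unsafe { lanes[0].assume_init() };
    let last = unsafe { lanes[3].assume_init() };
    assert_eq!((first, last), (0, 3));
}
```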
Currently the vector types are layout compatible with arrays, but it might make sense to me to make them layout compatible with certain tuples as well:
union U {
v: f32x4,
t: (f32, f32, f32, f32)
}
where the tuple's fields have to be layout-compatible with the vector's element type `T` (e.g., `T` itself, or `MaybeUninit<T>`). That would allow us to use:
union U {
v: f32x4,
t: (f32, MaybeUninit<f32>, f32, u32)
}
for manipulating the values stored in the vector, including temporarily falling back to the tuple type to track that part of the vector might be uninitialized.
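A compilable sketch of that union, with `[f32; 4]` standing in for `f32x4` (the portable vector type isn't stable) and the caveat that tuple layout is not actually guaranteed in current Rust, so all this can honestly check is that the sizes line up on a given compiler:

```rust
use std::mem::{size_of, MaybeUninit};

// `[f32; 4]` stands in for `f32x4`; tuple layout is unspecified in
// current Rust, so this union is only a sketch of the proposal. We only
// check that all the pieces occupy the same 16 bytes in practice.
#[allow(dead_code)]
union U {
    v: [f32; 4],
    t: (f32, MaybeUninit<f32>, f32, u32),
}

fn main() {
    assert_eq!(size_of::<[f32; 4]>(), 16);
    assert_eq!(size_of::<(f32, MaybeUninit<f32>, f32, u32)>(), 16);
    assert_eq!(size_of::<U>(), 16);
}
```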
That seems to be a matter of guaranteeing the layout of some kinds of tuples, not anything to do with vectors in particular.
@gnzlbg are you fine with closing this as per @rkruppe's last comment (being subsumed by array layout discussions)?
I've decided to switch the portable vector types RFC to use the `ptr::{read,write}` terminology instead of loads and stores, for consistency reasons. I don't know whether these have to be specified as an "extra" concept in the memory model, or whether the RFC should specify their semantics as a sequence of single-element reads and writes. It would be nice if those working on these parts of the memory model could take a look at the "Reads and Writes" section of the RFC and give me feedback.
Also, at some point we might want to add atomic vector reads and writes, so if these operations need to be specified in the memory model, that might be something worth keeping in mind.