Closed by gnzlbg 5 years ago
I don't see much reason for addressing vector loads and stores specifically in the memory model. IMO the default position is to treat them exactly like any other aggregate of the same layout (e.g., `[T; N]`). I can only think of two cases where it might make a difference:
1. relatedly, volatile, but that's all implementation-defined anyway
Do you have any examples?
I just had ISPC local atomics in mind: https://ispc.github.io/ispc.html#atomic-operations-and-memory-fences but I haven't given much thought to how they are implemented. An example would be doing a vector scatter operation that writes two different values (two different elements of a vector) to the same memory location via the same pointer. The current RFC doesn't mention what happens in this case, but LLVM defines this as follows, so it might make sense to do the same thing:
Scatter with overlapping addresses is guaranteed to be ordered from least-significant to most-significant element.
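A scalar model of that LLVM rule might look like the sketch below (the `scatter4` helper is hypothetical, not an API from the RFC): lanes are written from least- to most-significant, so when two lanes target the same address, the higher-indexed lane wins.

```rust
// Scalar model of a vector scatter, illustrating the LLVM rule that
// overlapping addresses are written in lane order (lane 0 first, so the
// highest-indexed lane targeting an address wins). Hypothetical helper,
// not part of the RFC.
fn scatter4(values: [u32; 4], indices: [usize; 4], mem: &mut [u32]) {
    for lane in 0..4 {
        mem[indices[lane]] = values[lane]; // least-significant lane first
    }
}

fn main() {
    let mut mem = [0u32; 4];
    // Lanes 1 and 3 both target index 2; lane 3 is written last and wins.
    scatter4([10, 20, 30, 40], [0, 2, 1, 2], &mut mem);
    assert_eq!(mem, [10, 30, 40, 0]);
}
```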
Uninitialized memory.
That's a very good point. Given `let v = u32x4::new(0, 1, mem::uninitialized(), 3);`, does `v.extract(0)` (which was initialized to `0`) return `0` or `mem::uninitialized()`? The RFC does not mention anything about this, but it should.
I would expect that we would just treat vectors as aggregates, meaning `v.extract(0)` returns `0`, but I don't know how easily LLVM allows us to do this.
I just had ISPC local atomics in mind:
That seems to be a rather different matter. Correct me if I'm wrong, but this looks like it's mostly a consequence of the SPMD model (so ISPC needs to give semantics for "scalar" code in a way that makes it legal to vectorize with varying vector lengths). Since any vector operations in Rust are fully explicit in the source code (and evaluation order is generally fully defined), I don't think there's really any equivalent of the problem motivating "local atomics". We only need to be careful when specifying a few specific operations such as scatters – but that's again not automatically a memory model concern.
That's a very good point. Given `let v = u32x4::new(0, 1, mem::uninitialized(), 3);`, does `v.extract(0)` (which was initialized to `0`) return `0` or `mem::uninitialized()`? The RFC does not mention anything about this, but it should.
I don't think the RFC should say anything about this, as it falls in the responsibility of the unsafe code guidelines. For now, code like your example there can be in the same state as practically everything else involving uninitialized memory: likely UB, details tbd.
I would expect that we would just treat vectors as aggregates, meaning `v.extract(0)` returns `0`, but I don't know how easily LLVM allows us to do this.
FWIW I've given this some more thought and from an optimizer perspective it seems very undesirable to have deferred UB (like poison or undef) in one lane infect the other lanes. It would make vectorization much harder. Using poison as an example for simplicity, you'd need to worry about one lane producing poison but not outputting it (so no UB), while another lane that wouldn't have produced poison is outputting its element of the vector (causing UB due to it being poison in the vectorized version, even though the scalar version was fine).
FWIW I've given this some more thought and from an optimizer perspective it seems very undesirable to have deferred UB (like poison or undef) in one lane infect the other lanes. It would make vectorization much harder. Using poison as an example for simplicity, you'd need to worry about one lane producing poison but not outputting it (so no UB), while another lane that wouldn't have produced poison is outputting its element of the vector (causing UB due to it being poison in the vectorized version, even though the scalar version was fine).
On one hand I think this also makes sense from the POV of newer hardware: the vector instructions of AVX-512, RISC-V, ARM SVE, ... almost always operate together with a mask, such that if an element of the vector is masked out, the operation is not performed for the element. That is, on newer hardware:
let m = m32x4::new(true, true, true, false);
let v = f32x4::new(1., 2., 3., uninitialized());
// adds only those lanes for which the mask is true:
let w = m.add(v, v); // uninitialized is masked out
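A lane-wise scalar model of what the hypothetical `m.add(v, v)` above would mean (this is a sketch, not a real API; `Option<f32>` stands in for a lane that may be uninitialized, since real uninitialized memory can't be observed safely):

```rust
// Lane-wise model of a merge-masked add. Masked-out lanes are never read
// or computed, so an "uninitialized" (None) lane cannot infect the
// active lanes.
fn masked_add(mask: [bool; 4], a: [Option<f32>; 4], b: [Option<f32>; 4]) -> [Option<f32>; 4] {
    let mut out = [None; 4];
    for lane in 0..4 {
        if mask[lane] {
            // Only active lanes are computed.
            out[lane] = Some(a[lane].unwrap() + b[lane].unwrap());
        }
        // Inactive lanes stay None (uninitialized).
    }
    out
}

fn main() {
    let m = [true, true, true, false];
    let v = [Some(1.0), Some(2.0), Some(3.0), None]; // lane 3 "uninitialized"
    let w = masked_add(m, v, v);
    assert_eq!(w, [Some(2.0), Some(4.0), Some(6.0), None]);
}
```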
On the other hand, on older hardware without support for vector masks, the most efficient thing would be to just add those `uninitialized` values together. Also, the current RFC provides masked operations via select. That is, the code I showed above is not valid, but the following is:
let m = m32x4::new(true, true, true, false);
let v = f32x4::new(1., 2., 3., uninitialized());
// adds only those lanes for which the mask is true:
let w = m.select(v, v + v);
which is weird if one element of `v` is uninitialized, because that element is unconditionally operated on from the PoV of the Rust source code, even though the optimizer could do something different if the result is unused. I would prefer not to have to add all vector methods to the mask types, and to have a more general mechanism like select instead.
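For reference, the select operation itself can be modeled lane-wise like this (a scalar sketch of a generic `select(mask, a, b)`, not the RFC's exact API). The key point is that both inputs are fully evaluated before the blend, which is exactly why an uninitialized lane gets operated on even when the mask discards its result:

```rust
// Scalar model of a vector select/blend: lane i takes if_true[i] where
// the mask is true, if_false[i] otherwise. Both input vectors already
// exist in full before the blend happens.
fn select4(mask: [bool; 4], if_true: [f32; 4], if_false: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0; 4];
    for lane in 0..4 {
        out[lane] = if mask[lane] { if_true[lane] } else { if_false[lane] };
    }
    out
}

fn main() {
    let m = [true, false, true, false];
    let a = [1.0, 2.0, 3.0, 4.0];
    let b = [10.0, 20.0, 30.0, 40.0];
    assert_eq!(select4(m, a, b), [1.0, 20.0, 3.0, 40.0]);
}
```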
Also, I just went through the whole `MaybeUninitialized` RFC and I have no idea how to incorporate that into vector types. If `mem::uninitialized` is deprecated and all elements of a vector must have the same type, what would be the correct way of setting one vector element as uninitialized?
On one hand I think this also makes sense from the POV of newer hardware: the vector instructions of AVX-512, RISC-V, ARM SVE, ... almost always operate together with a mask, such that if an element of the vector is masked out, the operation is not performed for the element. That is, on newer hardware:
You're mixing up hardware instructions and optimizer IR. There's no such thing as poison at the hardware level (Itanium's NaT aside). The two are only tangentially related.
But even besides that, LLVM IR currently has no such thing as "masked adds" (only a full vector add, followed by a vector blend) and even if it had, that approach wouldn't scale to everything, and even if it did, it would be really silly to materialize all these masks and extend their live ranges just to keep poison under control when the alternative (keeping poison lane-wise) is right there.
Also, I just went through the whole `MaybeUninitialized` RFC and I have no idea how to incorporate that into vector types. If `mem::uninitialized` is deprecated and all elements of a vector must have the same type, what would be the correct way of setting one vector element as uninitialized?
That seems a bit off-topic for this issue (maybe bring it up on that RFC?) but my working assumption is `MaybeUninitialized<TxN>` to transport entire vectors (e.g., a vectorized memcpy) and `[MaybeUninitialized<T>; N]` to operate on individual elements that you know are initialized. (If you want to do lanewise operations on a partially-uninitialized vector, that would likely be UB anyway.)
If `mem::uninitialized` is deprecated and all elements of a vector must have the same type, what would be the correct way of setting one vector element as uninitialized?
I think I answered this later in https://github.com/rust-lang/rfcs/pull/2366#issuecomment-399063409, and I agree with @rkruppe: Rust doesn't do per-element tracking in arrays, so once you have a vector where not all elements are initialized, you have something like `[MaybeUninit<T>; N]`. The type doesn't tell you which elements are (un)initialized; that's something you have to keep in your head manually at that point.
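Concretely, with today's stable `MaybeUninit` API that view looks something like the sketch below (`[MaybeUninit<u32>; 4]` standing in for a partially-initialized `u32x4`; which lanes are valid is tracked entirely by the programmer):

```rust
use std::mem::MaybeUninit;

// Sketch of the `[MaybeUninit<T>; N]` view of a partially-initialized
// vector: write three of four lanes, then read only the lanes we know
// were written. The type system does not track which lanes are valid.
fn main() {
    let mut lanes: [MaybeUninit<u32>; 4] = [
        MaybeUninit::uninit(),
        MaybeUninit::uninit(),
        MaybeUninit::uninit(),
        MaybeUninit::uninit(),
    ];
    lanes[0].write(0);
    lanes[1].write(1);
    lanes[3].write(3);
    // Safe only because we wrote these lanes above; reading lanes[2]
    // here would be UB.
    let first = unsafe { lanes[0].assume_init() };
    let last = unsafe { lanes[3].assume_init() };
    assert_eq!((first, last), (0, 3));
}
```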
Currently the vector types are layout compatible with arrays, but it might make sense to me to make them layout compatible with certain tuples as well:
union U {
v: f32x4,
t: (f32, f32, f32, f32)
}
where the tuple's fields have to be layout-compatible with the vector's element type `T` (e.g., `T` itself, or `MaybeUninit<T>`). That would allow us to use:
union U {
v: f32x4,
t: (f32, MaybeUninit<f32>, f32, u32)
}
for manipulating the values stored in the vector, including temporarily falling back to the tuple type to track that part of the vector might be uninitialized.
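A compilable sketch of that union, with `[f32; 4]` standing in for `f32x4` (the portable vector type isn't stable) and the caveat that tuple layout is not actually guaranteed in current Rust, so all this can honestly check is that the sizes line up on a given compiler:

```rust
use std::mem::{size_of, MaybeUninit};

// `[f32; 4]` stands in for `f32x4`; tuple layout is unspecified in
// current Rust, so this union is only a sketch of the proposal. We only
// check that all the pieces occupy the same 16 bytes in practice.
#[allow(dead_code)]
union U {
    v: [f32; 4],
    t: (f32, MaybeUninit<f32>, f32, u32),
}

fn main() {
    assert_eq!(size_of::<[f32; 4]>(), 16);
    assert_eq!(size_of::<(f32, MaybeUninit<f32>, f32, u32)>(), 16);
    assert_eq!(size_of::<U>(), 16);
}
```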
That seems to be a matter of guaranteeing the layout of some kinds of tuples, not anything to do with vectors in particular.
@gnzlbg are you fine with closing this as per @rkruppe's last comment (being subsumed by array layout discussions)?
I've decided to switch the portable vector types RFC to use the `ptr::{read,write}` terminology instead of loads and stores, for consistency reasons. I don't know whether these have to be specified as an "extra" concept in the memory model, or whether the RFC should specify their semantics as a sequence of single-element reads and writes. It would be nice if those working on these parts of the memory model could take a look at the "Reads and Writes" section of the RFC and give me feedback.
Also, at some point we might want to add atomic vector reads and writes, so if these operations need to be specified in the memory model, that might be something worth keeping in mind.