rust-lang / unsafe-code-guidelines

Forum for discussion about what unsafe code can and can't do
https://rust-lang.github.io/unsafe-code-guidelines
Apache License 2.0
650 stars 57 forks source link

Layout of multi-dimensional arrays #277

Open Michael-F-Bryan opened 3 years ago

Michael-F-Bryan commented 3 years ago

I noticed the Layout of Rust array types and slices chapter from the Unsafe Code Guidelines doesn't mention anything about the layout of multi-dimensional arrays.

Can it be assumed that a [[T; N]; M] will have the same layout as a [T; N*M]?

I believe you can infer it to be true based on the first three paragraphs, but I thought I'd double-check because it doesn't explicitly mention anything about multi-dimensional arrays.

RalfJung commented 3 years ago

Multi-dimensional arrays are not a primitive concept -- in [[T; N]; M], Rust first lays out U := [T; N] and then lays out [U; M]. So the behavior of multidimensional arrays follows from the behavior of the pieces (as you would expect with a proper, compositional language). This is just like there's no chapter about "nested struct" or "putting an enum in a union; if we start to spell out all possible ways of nesting types, we'll never get done. ;)

RalfJung commented 3 years ago

Can it be assumed that a [[T; N]; M] will have the same layout as a [T; N*M]?

Currently, this actually does not seem to be guaranteed, due to this sentence: "The alignment of array types is greater or equal to the alignment of its element type." Notice the 'greater or equal'. If [T; N] had an alignment greater than T, that would affect the layout of [[T; N]; M].

I'm not an expert on data type layout, so I don't know why it says 'greater or equal'.

Lokathor commented 3 years ago

possibly to leave room for automatic simd conversion.

LunarLambda commented 8 months ago

I would assume that (for a repr(C) T), for a 2D-array in Rust ([[T; N]; M]) to be laid out the same as a 2D-array in C (T[N][M]), that is, inside-out. That said, I don't know that C guarantees any equivalence between T[N][M] and T[N * M] either.

It does say, for _Alignof (C11): If type-name is an array type, the result is the alignment requirement of the array element type.. So in practice in C (though not explicitly stated) this property holds (it would apply recursively for nested arrays). Unsure about Rust's "greater or equal" language then. Though most simply put, Rust arrays are still repr(rust), not repr(C).

From the wording in the reference, it sounds like, either it depends whether [T; N] is repr(C) if T is:

  1. if yes, then [[T; N]; M] is equivalent to C's T[N][M] and by the recursive alignment property in C, equivalent to T[N*M]
  2. if no, [T; N] is not itself repr(C), and neither is [[T; N]; M].

It's ambiguous, in my opinion.

It should also be added that in any case, C does not allow this kind of type-reinterpretation, it's UB but by the language's own rules does work.

chorman0773 commented 8 months ago

[T;N] is repr(C) regardless of whether T is - it's layout algorithm is well-defined. Each T is placed in memory immediately after the other. @RalfJung's previous comment then applies this inductively, to show that MdArray<T,N...> where MdArray<T,N0,N1,...Nm> is [[[[T;N0]; N1]; ...]; Nm] has a well-defined layout algorithm.

This is regardless of Ts layout.

CAD97 commented 8 months ago

We could key off of T's repr(C)-ness for whether alignment is required to be equal or allowed to be greater, but that seems just opaque rather than useful.

Since we have repr(simd), I think we should just change the guarantee to require that alignment of [T; N] is the same as T for all N, not just N=1. This might already be constrained to be true by the &[T] -> &[T; N] conversion provided by std.