rust-lang / unsafe-code-guidelines

Forum for discussion about what unsafe code can and can't do
https://rust-lang.github.io/unsafe-code-guidelines
Apache License 2.0

Should niches/ABI be part of the layout of a type? #122

Closed gnzlbg closed 1 year ago

gnzlbg commented 5 years ago

The current definition of layout (https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/glossary.md#layout) does not consider "niches" part of the type layout.

In this thread (https://github.com/rust-lang/unsafe-code-guidelines/pull/120#discussion_r276975529) it was argued that maybe we might want to change that and make them part of the layout of a type.

If we do that, we need to change the glossary, and distinguish that &mut T and *mut T don't have the same layout, because they don't have the same "niches".
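As a rough illustration of the difference between the two pointer kinds, the sketch below compares size, alignment, and whether `Option<T>` fits in the same space as `T` (used here as an observable proxy for "has a niche"). The helper names are made up for this example. The reference case is a documented guarantee; the raw-pointer result is what current rustc does and is an assumption, not a promise.

```rust
use std::mem::{align_of, size_of};

/// True if wrapping `T` in `Option` costs no extra space, i.e. the
/// compiler found a niche in `T` in which to encode `None`.
fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}

/// Returns (same size, same alignment, `&mut u8` has a niche, `*mut u8` has a niche).
fn compare_pointer_kinds() -> (bool, bool, bool, bool) {
    (
        size_of::<&mut u8>() == size_of::<*mut u8>(),
        align_of::<&mut u8>() == align_of::<*mut u8>(),
        // Guaranteed: null is never a valid reference, so `None` can use it.
        option_is_free::<&mut u8>(),
        // Assumption: current compilers give raw pointers no niche,
        // because null is a valid raw pointer value.
        option_is_free::<*mut u8>(),
    )
}
```

So the two types agree on size and alignment and differ only in the niche, which is exactly the part the glossary definition leaves out.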

cc @eddyb

gnzlbg commented 5 years ago

@RalfJung Niches can have at least two different sources: invalid representations, which are part of validity (e.g. in &T), and padding, which is certainly a part of layout (e.g. in (u8, u16)).

The current definition of layout in the glossary includes padding by omission (e.g. by saying that field offsets are part of layout), but does not explicitly mention it. Maybe we should explicitly mention padding in layout's definition, and mention that padding bits introduce niches in the representation.

gnzlbg commented 5 years ago

So I think that *mut T and &mut T should have the same layout - what they don't have is the same validity invariant.

This would narrow this question to whether (u8, u16) and (u8, u8, u16) should have the same layout. They do not have the same niches, but all bit-patterns are valid for both types.

Lokathor commented 5 years ago

It would be a very useful property if two types with the same layout, when used as the concrete type of a generic type, produced the same layout of the overall type. In other words, Option<T> has the same concrete layout for any two concrete types you put as T as long as those two types have the same layout as each other.

I think this makes for a rule that is very easy to understand and teach. I don't think there are any optimization possibilities lost.

However, if we want to have such a rule then niche needs to count as part of the layout.

gnzlbg commented 5 years ago

All representations are valid for padding bytes, so they cannot introduce niches.

RalfJung commented 5 years ago

Niches can have at least two different sources: [...] and padding

All representations are valid for padding bytes, so they cannot introduce niches.

Just to be clear: the latter is correct, the former is not. Padding cannot be used for niches.
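A small sketch of the padding point, assuming a mainstream target where `u16` has alignment 2. `(u8, u16)` then contains one padding byte, yet on current rustc `Option<(u8, u16)>` still needs extra space for its discriminant. The function name is hypothetical and the exact sizes are not language guarantees.

```rust
use std::mem::size_of;

/// Sizes in bytes of `(u8, u16)`, `(u8, u8, u16)` and `Option<(u8, u16)>`.
fn padding_sizes() -> (usize, usize, usize) {
    (
        // 3 bytes of data plus 1 byte of padding.
        size_of::<(u8, u16)>(),
        // 4 bytes of data, no padding.
        size_of::<(u8, u8, u16)>(),
        // The padding byte is not reused for `None`, so this is larger
        // than the tuple itself.
        size_of::<Option<(u8, u16)>>(),
    )
}
```

If padding could serve as a niche, the third number would equal the first.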

invalid representations, which are part of validity

That is indeed the justification for a niche. But not every ruled-out bit pattern is a niche. For example, 0x1 is not a valid bit pattern for &u32, and yet it is (currently) not part of the niche. Hence I think we have to consider niches and validity as separate terms. They are connected by a "soundness theorem" saying that all values in the niche are never valid for the type.

However, if we want to have such a rule then niche needs to count as part of the layout.

Yes, that's basically my point. We need some name for "all the things of T that we need to know when computing the size and alignment of Wrapper<T>". Those things are size, alignment and niche. (Field offsets are not part of it though!)

So, what do we call that thing? "Layout" seems like a reasonable term. So "layout" would, by definition, consist of the size, alignment and niche of a type.

But there are other things that are relevant for a type in this context, and that are sometimes also considered to be included in "layout", namely the function call ABI and the offsets of the fields (if the type has any). In particular, TyLayout in rustc includes this additional data. But this means that e.g. u32 and Option<NonZeroU32>, while having the same (size, align, niche), don't have the same TyLayout.

So maybe (size, align, niche) should be called something else, to capture precisely the property @Lokathor was mentioning? Or maybe TyLayout should be renamed to TyLayoutAndAbiAndFields (working title)?

RalfJung commented 5 years ago

What seems really odd though is to include ABI and fields, but exclude the niche. I think that is just an oversight.

So my proposal is to update the docs to include "niche" in the definition of "layout". Or does anyone have a case where talking about (size, align, fields, abi) and excluding the niche is useful? @gnzlbg you seem to have that in mind when suggesting that &mut T and *mut T should be considered to have the same layout.

That still leaves open the question of how to call (size, align, niche), though.

I would certainly not include the validity invariant in whatever a layout is, that's way more information than we actually need. Abstracting it to a "niche" the way rustc does is useful I think.

hanna-kruppe commented 5 years ago

I would prefer to exclude "ABI" (meaning how it is passed by value, not the more general sense of Application Binary Interface) from "layout":

That is, I propose "layout = (size, align, fields, niche)". This would maybe entail renaming TyLayout but ¯\_(ツ)_/¯ IMO it's a misnomer anyway.

I also think it's fine that this is more information than is needed for "computing the layout of Wrapper<T> from the layout of T". If the distinction is important we can make up a term like "layout shape" or "layout without field offsets" for it (or just spell out the inputs to the layout computation in more detail), but for most of the things we discuss in terms of "layout" (e.g., whether type punning is ok, whether layout computation is deterministic, etc.) the field offsets do potentially matter.

RalfJung commented 5 years ago

whether layout computation is deterministic

We'd definitely want to include the ABI in that one though.

whether type punning is ok

Well, type punning is okay between Option<NonZeroU32> and u32 even though they don't have the same "field offsets" (the latter doesn't even have any fields).
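The punning mentioned here can be sketched as follows. It relies on the standard library's documented guarantee that `Option<NonZeroU32>` has the same size as `u32` and that the all-zero bit pattern is `None`. The function names are invented for the example.

```rust
use std::num::NonZeroU32;

/// Reinterpret a `u32` as an `Option<NonZeroU32>`.
fn u32_to_option(bits: u32) -> Option<NonZeroU32> {
    // SAFETY: both types are 4 bytes, 0 maps to `None`, and every
    // non-zero value is a valid `Some(NonZeroU32)`.
    unsafe { std::mem::transmute::<u32, Option<NonZeroU32>>(bits) }
}

/// Reinterpret an `Option<NonZeroU32>` as a `u32`.
fn option_to_u32(value: Option<NonZeroU32>) -> u32 {
    // SAFETY: same size, and every bit pattern is a valid `u32`.
    unsafe { std::mem::transmute::<Option<NonZeroU32>, u32>(value) }
}
```

Neither type exposes comparable fields, so field offsets play no role in why this works.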

hanna-kruppe commented 5 years ago

We'd definitely want to include the ABI in that one though.

Sure, just say "ABI and layout is deterministic [w.r.t. ...]".

Well, type punning is okay between Option<NonZeroU32> and u32 even though they don't have the same "field offsets" (the latter doesn't even have any fields).

Yes, there is lots of type punning between types that aren't comparable wrt fields, and there are also other things (e.g., validity and safety invariants) to keep in mind when type-punning. Field offsets are just part of the story.

RalfJung commented 5 years ago

I feel it makes sense to exclude fields, because then equality of layout is a necessary condition for type punning of T inside an arbitrary Wrapper<T> to make any sense.

Turns out the docs already have a definition of layout, and it doesn't agree with the glossary: https://doc.rust-lang.org/stable/reference/type-layout.html says

The layout of a type is its size, alignment, and the relative offsets of its fields.

So this does not include ABI, nor niche. I think this is unlike anything that any one of us has been proposing. ;)

hanna-kruppe commented 5 years ago

I feel it makes sense to exclude fields, because then equality of layout is a necessary condition for type punning of T inside an arbitrary Wrapper<T> to make any sense.

I can understand that, but there are many incompatible substitutions for X, Y in "if we define layout as X then property Y can be stated concisely in terms of layout". I don't know how to resolve that, arguing about which is more important seems miserable and unlikely to help.

But I don't have very strong opinions about most of the definition anyway. I am very serious about this one point though: specification terms that Rust users recognize from informal/pre-formal discussion should bear some resemblance to this informal/preexisting meaning. From that angle, "layout" absolutely must include field offsets. After all, where fields are located is a major part of how the type is laid out in memory, and fiddling with that is a large source of layout optimizations.

I care less about whether niche and ABI are in or out, definitely not enough to argue at length about it. Users think about those comparatively rarely. But telling users that #[repr(C)] struct Foo(u8, u32); and #[repr(C)] struct Bar(u32, u8); "have the same layout" is just plain misleading.
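To make the `Foo`/`Bar` example concrete, here is a sketch that measures size, alignment, and the (offset, size) of each field. It assumes a target where `u32` has alignment 4; the helper functions are hypothetical. Both structs are 8 bytes with alignment 4, but the bytes occupied by each field, and therefore the location of the padding, differ.

```rust
use std::mem::{align_of, size_of, size_of_val};

#[repr(C)]
struct Foo(u8, u32);

#[repr(C)]
struct Bar(u32, u8);

/// (size, alignment) of `T` in bytes.
fn size_and_align<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}

/// (offset, size) in bytes of each field of `Foo`, in declaration order.
fn foo_fields() -> [(usize, usize); 2] {
    let v = Foo(0, 0);
    let base = &v as *const Foo as usize;
    [
        (&v.0 as *const u8 as usize - base, size_of_val(&v.0)),
        (&v.1 as *const u32 as usize - base, size_of_val(&v.1)),
    ]
}

/// (offset, size) in bytes of each field of `Bar`, in declaration order.
fn bar_fields() -> [(usize, usize); 2] {
    let v = Bar(0, 0);
    let base = &v as *const Bar as usize;
    [
        (&v.0 as *const u32 as usize - base, size_of_val(&v.0)),
        (&v.1 as *const u8 as usize - base, size_of_val(&v.1)),
    ]
}
```

Under the stated assumption, `Foo` has padding in bytes 1 to 3 and `Bar` in bytes 5 to 7, so a definition based only on (size, align) cannot tell them apart.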

RalfJung commented 5 years ago

But telling users that #[repr(C)] struct Foo(u8, u32); and #[repr(C)] struct Bar(u32, u8); "have the same layout" is just plain misleading.

That's fair. I think you convinced me that field offsets should be included, also for consistency with existing docs.

So the contentious points seem to be whether ABI and/or niche are included -- and if not, what we call the thing that includes them as well.

The current definition of layout in the glossary includes padding by omission (e.g. by saying that field offsets are part of layout), but does not explicitly mention it. Maybe we should explicitly mention padding in layout's definition, and mention that padding bits introduce niches in the representation.

IMO it is pretty clear that when you define where the fields are, that also defines the gaps between the fields, i.e., padding. Once the field offsets are fixed, there is no freedom left for where to put padding. But it might make sense to state this explicitly.

gnzlbg commented 5 years ago

I feel it makes sense to exclude fields, because then equality of layout is a necessary condition for type punning of T inside an arbitrary Wrapper<T> to make any sense.

What do you mean by type punning?

The alignment of T and Wrapper<T> does not need to match for mem::transmute to be ok - only their size needs to match. So AFAICT, one can type pun types of different sizes, alignment, niches, and fields, depending on how one does the type punning. The requirements for the type punning will depend on what the particular API requires, but "layout equality" is probably too strong a requirement.
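A minimal sketch of a by-value pun between types whose alignment differs, using `[u8; 4]` and `u32` as stand-ins (these types and the function names are chosen for illustration, not taken from the discussion). `mem::transmute` moves the value, so only the sizes must match and the result must be valid for the target type.

```rust
use std::mem::align_of;

/// Reinterpret four bytes as a `u32` in native byte order.
fn bytes_to_u32(bytes: [u8; 4]) -> u32 {
    // SAFETY: both types are 4 bytes and every bit pattern is a valid `u32`.
    unsafe { std::mem::transmute::<[u8; 4], u32>(bytes) }
}

/// Alignments of the source and target type, in bytes.
fn alignments() -> (usize, usize) {
    (align_of::<[u8; 4]>(), align_of::<u32>())
}
```

In safe code the same conversion is spelled `u32::from_ne_bytes`.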

Lokathor commented 5 years ago

People don't just pun with transmute, they also pun with slice casting stuff. I actually had to put it in a crate recently, link, because people kept wanting to do it but getting it wrong.

However, as you say, even a slice cast doesn't require layout equality.
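For readers unfamiliar with slice casting, here is one hedged sketch of the easy direction, viewing a `&[u32]` as bytes. It uses only the standard library and does not reproduce the crate mentioned above; `as_bytes` is a name made up for this example.

```rust
/// View a slice of `u32` as its underlying bytes (native byte order).
fn as_bytes(words: &[u32]) -> &[u8] {
    let len = words.len() * std::mem::size_of::<u32>();
    // SAFETY: `u8` has alignment 1, so the pointer is suitably aligned;
    // `u32` has no padding, so every byte is initialized; `len` bytes
    // stay within the original slice; and the returned lifetime is tied
    // to `words` by the function signature.
    unsafe { std::slice::from_raw_parts(words.as_ptr() as *const u8, len) }
}
```

The element types here differ in size and alignment, which matches the point that a slice cast does not need layout equality. The reverse direction (`&[u8]` to `&[u32]`) additionally needs an alignment and length check, which is where such code typically goes wrong.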

eddyb commented 5 years ago

I would prefer to exclude "ABI" (meaning how it is passed by value,

Nit: I think we can maybe refer to this as "call ABI"? As in, the ABI of a given type wrt calling conventions?

RalfJung commented 5 years ago

I'd love that. "ABI" is such an overloaded term...

Ixrec commented 5 years ago

For the reader semi-confused by the phrase "the ABI w.r.t. calling conventions": is "call ABI" actually different from "calling convention"? Or are they just synonymous? Or is one of them a superset of the other?

eddyb commented 5 years ago

@Ixrec It's... complicated. One way to look at it is that each calling convention takes in argument/return types' "call ABI" and lowers that to passing those argument/return values in registers and/or the stack.

E.g. the "call ABI" of (i32, ()) and that of i32 are the same (scalar, more specifically a 32-bit integer, signed), which means that every calling convention (SysV, Windows stdcall, etc.) must treat them the same.

It gets trickier with an aggregate "call ABI", because some calling conventions introspect the layout, even recursing through the fields (x86_64 SysV being the most complex AFAIK).

I guess the confusing bit is I could say "call ABI (of a type)" and "call ABI (of a platform)", the latter being more or less a "calling convention" (but I'm less sure here).

There's also the more gnarly distinction between what LLVM lowers itself, and what the frontend has to lower, but I would think we'd paper over that in this context.

gnzlbg commented 5 years ago

@Ixrec

For the reader semi-confused by the phrase "the ABI w.r.t. calling conventions": is "call ABI" actually different from "calling convention"?

They are different. The "calling convention" is an agreement between the caller (of a function) and the callee (the function) about how to interface, for example, the function arguments (amongst many other things). Both need to agree on this.

In Rust, most functions follow the Rust calling convention, but you can also choose extern "C" fn.., extern "fastcall" fn..., etc.

These calling conventions classify the types that you can pass around into different categories, e.g., in some calling conventions, an i32 function argument is a SCALAR and a struct Foo { ... } is an AGGREGATE. Depending on the category, the calling convention might define that the argument is passed in register X, or that it must be put on the stack frame, or passed in some other way.

These categories are what is meant here by "call ABI" of the type, and the same type can be a SCALAR in one calling convention, and an AGGREGATE in another. This is fine: the calling convention is part of the function type so both the caller and the callee agree on the category.

People often want to use, e.g., wrappers like struct Wrapper(i32); when interfacing with C functions that expect an i32. This does not work "as is" because i32 and Wrapper(i32) can have a different "call ABI" (their category is not necessarily the same). Applying repr(transparent) to Wrapper gives it the same category as i32 solving this problem.
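The `repr(transparent)` guarantee can be sketched as below, independently of how the categories are described. `c_style_double` is a hypothetical stand-in for a C function taking an `i32`, written in Rust so the example is self-contained. The sketch relies on the documented rule that a `repr(transparent)` struct is ABI-compatible with its single non-zero-sized field.

```rust
#[repr(transparent)]
#[derive(Clone, Copy)]
struct Wrapper(i32);

/// Hypothetical stand-in for a C function that expects a plain `i32`.
extern "C" fn c_style_double(x: i32) -> i32 {
    x.wrapping_mul(2)
}

/// Call the `i32` function through a signature that takes `Wrapper`.
fn call_with_wrapper(w: Wrapper) -> i32 {
    let plain: extern "C" fn(i32) -> i32 = c_style_double;
    // SAFETY: `Wrapper` is `repr(transparent)` over `i32`, so the two
    // function pointer types are ABI-compatible.
    let wrapped: extern "C" fn(Wrapper) -> i32 = unsafe {
        std::mem::transmute::<extern "C" fn(i32) -> i32, extern "C" fn(Wrapper) -> i32>(plain)
    };
    wrapped(w)
}
```

Without the attribute, the language makes no promise that this call is sound.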

eddyb commented 5 years ago

These categories are what is meant here by "call ABI" of the type, and the same type can be a SCALAR in one calling convention, and an AGGREGATE in another

This is inaccurate, or at least misleading, as we have a scalar/aggregate distinction that the calling convention can't contest: it might still pass an aggregate in registers or a scalar on the stack, but it can't tell apart struct Wrapper(i32); from i32 (with or without repr(transparent)) - only repr(C) makes that an aggregate (which e.g. the x86_64 SysV calling convention will still pass in a register).

RalfJung commented 5 years ago

@eddyb proposed in private conversation to use "ABI" (instead of layout) as the overarching term here, and then categorize that into memory ABI (= layout?), call ABI, and maybe more.

gnzlbg commented 5 years ago

@eddyb proposed in private conversation to use "ABI" (instead of layout) as the overarching term here, and then categorize that into memory ABI (= layout?), call ABI,

I like that.

What would "memory ABI" be? Just size+align? If so, I don't think we should call that "layout". The term is a bit overloaded, and we use it, e.g., in the context of "layout optimizations", which do apply to "niche"s as well.

Also, where do "niche"s go there? I suppose we could have a "value ABI" that includes padding+niches, where the difference is that "niche"s can be used for layout optimization while "padding" cannot. Maybe we need a better word for this than "padding", and this could also tie nicely with "value representation".

gnzlbg commented 5 years ago

For example, the "value ABI" for bool (for all currently supported platforms) could be that bool has no padding, and can only take the values 0 and 1 - everything else is a "niche".
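The observable side of this can be sketched as follows. That `bool` is one byte with `false` as 0 and `true` as 1 is documented; that `Option<bool>` also fits in one byte is how current rustc uses the remaining values and is an assumption here. `bool_facts` is a hypothetical helper.

```rust
use std::mem::size_of;

/// Returns (`false` as a byte, `true` as a byte, size of `bool`, size of `Option<bool>`).
fn bool_facts() -> (u8, u8, usize, usize) {
    (
        false as u8,
        true as u8,
        size_of::<bool>(),
        // `None` is stored in one of the byte values that is not a valid `bool`.
        size_of::<Option<bool>>(),
    )
}
```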

eddyb commented 5 years ago

I suppose we could have a "value ABI"

One way to be more precise about this is to talk about "memory" vs "immediate".

As for what "memory ABI" would be: size, align, field offsets and "largest niche" offset/range. You could implement an entirely compliant Rust compiler with it except for FFI calling conventions.

RalfJung commented 5 years ago

The issue with these "X ABIs" is that often people will say just "ABI" when they mean "call ABI". At least that's my experience.

But otherwise, I do like this proposal. Maybe we also need a "nesting ABI" that includes the niche, whereas "memory ABI" only includes size + alignment?

RalfJung commented 4 years ago

I think this has not been mentioned in this issue yet, quoting from https://github.com/rust-lang/unsafe-code-guidelines/pull/153 (but said meeting was more than a year ago):

The conclusion in the meeting was that we should avoid the term "layout" in the reference, and instead define in the glossary which components a layout can have, and then always spell that out explicitly, probably with an abbreviation like "they have the same SAN-layout" [size, alignment, niche].
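As a rough sketch of what comparing "SAN-layouts" could look like from user code, the probe below records size, alignment, and whether `Option<T>` is free. `SanProbe` and `san_probe` are invented for this example, and the boolean is only a coarse proxy: it detects that some niche exists on the current compiler, not which values the niche covers.

```rust
use std::mem::{align_of, size_of};

/// Coarse, observable approximation of (size, alignment, niche).
#[derive(Debug, PartialEq, Eq)]
struct SanProbe {
    size: usize,
    align: usize,
    /// True if `Option<T>` is no larger than `T`.
    has_niche: bool,
}

fn san_probe<T>() -> SanProbe {
    SanProbe {
        size: size_of::<T>(),
        align: align_of::<T>(),
        has_niche: size_of::<Option<T>>() == size_of::<T>(),
    }
}
```

For instance, `u32` and `i32` produce equal probes, while `u32` and `char` agree on size and alignment but differ in the niche component, because many `u32` values are not valid `char`s.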

JakobDegen commented 1 year ago

Closing as a duplicate in favor of #304.