rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.14k stars 12.69k forks source link

Hide `#[repr(transparent)]` where the field is non-public #90435

Closed jhpratt closed 1 year ago

jhpratt commented 3 years ago

Per @dtolnay in #72841:

Note that repr(transparent) is not the same as permission to inspect or manipulate the internal representation of a type. That permission only exists if the field is also exposed as pub.

As such, we should consider hiding #[repr(transparent)] when the field is not otherwise visible. This should not be the case when --document-private-items is passed for reasons that should be obvious.

This will affect some stdlib structs such as NonZeroU8

@rustbot label +A-docs +C-enhancement +T-rustdoc

BGR360 commented 2 years ago

Is there consensus that this is a good thing to do? I understand that the thing that @jhpratt wanted to do with PathBuf was advised against, but for types that already have repr(transparent) like NonZeroU8, why shouldn't it be advertised? Is there no scenario where a user both relies on the fact that a type is repr(transparent) and doesn't do anything "impermissible" with it?

jhpratt commented 2 years ago

The reason I was advised against it is because the attribute is only meaningful if you are able to know what the inner type is. While something like NonZeroU8 is logically represented as u8, there's nothing that strictly days that's the case. This is the quote I included.

The #[repr(transparent)] doesn't give a user sufficient promises to do anything. So why show it at all?

BGR360 commented 2 years ago

The #[repr(transparent)] doesn't give a user sufficient promises to do anything.

I mean, that's not true, is it? It makes some very explicit promises. Consider UnsafeCell. We have tons of code where I work that wraps up thread-safe C types in UnsafeCell so they can be accessed from multiple threads in Rust. When those types cross the FFI boundary, it seems critical for there to be a guarantee that the memory layout will be the same so we can transmute a value or reference to the C type into its Rust equivalent:

struct ffi_foo;

#[repr(transparent)]
pub struct Foo {
    inner: UnsafeCell<ffi_foo>,
}

impl Foo {
    pub fn go_kazoo(&self) {
        unsafe { ffi_foo_go_kazoo(self.inner.get()) }
    }
}

impl<'a> From<&'a ffi_foo> for &'a Foo {
    fn from(x: &'a ffi_foo) -> Self {
        // SAFETY: types have identical layout and ffi_foo is thread-safe
        unsafe { std::mem::transmute(x) }
    }
}

Maybe that example is misguided, but it still stands that repr(transparent) gives meaningful guarantees, right?

jhpratt commented 2 years ago

Ignoring the fact that the example you provided is unsound, I see no reference to the inner type of UnsafeCell in its documentation. As a result I'm deferring to the comment by @dtolnay who knows more about this than me. He's indicated that barring the field being public it cannot be relied upon.

BGR360 commented 2 years ago

Which part is unsound?

Ignoring the details of my example, though, here's the Nomicon regarding repr(transparent):

The goal is to make it possible to transmute between the single field and the struct. An example of that is UnsafeCell, which can be transmuted into the type it wraps.

Also, passing the struct through FFI where the inner field type is expected on the other side is guaranteed to work. In particular, this is necessary for struct Foo(f32) to always have the same ABI as f32.

dtolnay commented 2 years ago

Yeah @jhpratt is correct. Just a repr(transparent) attribute does not give any public-facing guarantee when the field isn't pub.

Individual data structures may document a commitment that their layout may be publicly relied on, but that is simply not repr(transparent). Currently this kind of thing is just written in prose but this came up as well in https://github.com/rust-lang/unsafe-code-guidelines/issues/302#issuecomment-913019124 about whether there should be some machine-readable way.

BGR360 commented 2 years ago

Okay I see what you're saying. To put it in my own words, the set of all repr(transparent) types in the standard library can be split into two categories: "publicly transparent" (e.g., UnsafeCell, NonZero*) and "incidentally transparent" (e.g., PathBuf).

The unfortunate thing is that this split is currently not documented very well if at all. And as shown above, the Nomicon describes repr(transparent) as being equivalent to "publicly transparent." So at the very least, it seems that the Nomicon needs to be amended if this change is made.

But given that the Nomicon is written how it is, it seems possible that there are already crates that elide any additional prose and rely solely on the presence of repr(transparent) in their rustdocs to communicate that a type is "publicly transparent." So if this change is made without notification, perhaps existing documentation will lose information.

dtolnay commented 2 years ago

I disagree that the nomicon's description refers to publicly transparent. It's describing guarantees that the language makes to the module/crate that contains the type. Nothing in there is about what guarantees the module/crate makes to its downstream users.

BGR360 commented 2 years ago

Hm, maybe not explicitly. But when I read it, it seems strongly implied.

The goal is to make it possible to transmute between the single field and the struct. An example of that is UnsafeCell, which can be transmuted into the type it wraps.

Nothing on that page, nor in the parent page, nor even in the Reference page on layouts, suggests to me that these rules only apply to the module/crate where the type is defined. My own intuition additionally tells me that the layout of a type is a property that is true for any crate in which the type is used.

Perhaps there's some detail that I'm missing when I read it, but really, if this is an important distinction, then it should be made more clearly. I'm more than happy to have my misconceptions ironed out through discussion, but I would wager that other noobs like me will walk away thinking the exact same things given the current text, and that's the real problem to me.

jhpratt commented 2 years ago

I would wager that other noobs like me will walk away thinking the exact same things given the current text, and that's the real problem to me.

To me that's precisely why this issue was created and should be resolved.

riking commented 2 years ago

I agree that nobody has any business attempting to transmute a PathBuf, but the non-pub nature of structures like NonZeroU8 is necessary to uphold the invariants - otherwise safe code could simply write a 0 to the field.

There will need to be some other attribute - possibly under rustdoc:: - to indicate whether the choice of inner type is stable.

CAD97 commented 2 years ago

Incidentally related: #[repr(C)] has similar pitfalls in documentation, if not all fields are public, and can be actively misleading since it reorders // some fields omitted to the end, e.g. https://github.com/rust-lang/rust/issues/66401

IOW: I think any #[repr] attribute should be hidden if any private fields are present. Library-stable layout is different from language-stable layout.

Fryuni commented 2 years ago

I'm in favor of this change. repr(transparent) guarantees that the struct has the same layout of its field, it makes no guarantees about the type of said field.

For example, seeing something like this in the docs:

#[repr(transparent)]
pub struct Foo<T> { /* private fields */ }

Does not guarantee that transmute<Foo<bool>, bool>(foo) is valid because the struct definition could be this:

#[repr(transparent)]
pub struct Foo<T> {
  content: Vec<T>,
}

repr(transparent) states that a struct has the same layout as something else. The docs should state what that something else is if that could be relied on. I could change Foo to hold a VecDeque and it would not be a breaking change if I never guaranteed that the layout would be that of a Vec.

A quick search through GH yields some examples of this happening: https://github.com/Tamschi/ref-portals/blob/7baf0352b988364729b119af69475080013a23fa/src/rc.rs#L48-L51 https://github.com/eiz/eiz/blob/03a555fe4c0a251c5b1f8fa3d247ed99ad10af12/src/com.rs#L36-L37 https://github.com/aidangilmore/opaque-ke/blob/d1dfee9a6545d7dfddc6dd94f985f31b3221c54e/src/keypair.rs#L159-L160

I think any #[repr] attribute should be hidden if any private fields are present.

I think this might be too radical. repr(align(n)) doesn't have this problem, it is there to solve it actually. Without it, the structure has at least the largest alignment between the fields, but if the fields are private you cannot rely on what that value is. If the structure has repr(align(64)) the consumers can rely on it having at least 64-bit alignment regardless of things like generic parameters. I'd also argue that people (not me, just speculating) working on really tight constraints might be relying on the niche-filling of enums, so repr(no_niche) should also be public.

RalfJung commented 2 years ago

Nothing on that page, nor in the parent page, nor even in the Reference page on layouts, suggests to me that these rules only apply to the module/crate where the type is defined. My own intuition additionally tells me that the layout of a type is a property that is true for any crate in which the type is used.

They apply globally, if you pin your dependency. But nothing in any of those documents says anything about how library authors are allowed to evolve their type definitions while being semver-compliant. It's perfectly semver-compatible to, for example, remove the repr(transparent) from PathBuf.

So yes, the layout of a type is a property that is true anywhere, but how that layout may change in the future looks very different from inside a module/crate and from the outside, and repr(transparent) says nothing about it.

kpreid commented 1 year ago

IOW: I think any #[repr] attribute should be hidden if any private fields are present.

Note that privacy is not sufficient to cover all cases where #[repr] would ideally be hidden. For example, suppose I have this enum (which, as any enum, has fully public contents):

#[repr(u8)]
pub enum Color {
    BLACK  = 0b00;
    RED    = 0b01;
    BLUE   = 0b10;
    PURPLE = 0b11;
}

I might want to make use of the bit-patterns of this enum for some algorithm within the crate, but not guarantee to users that the discriminant values are stable. (Though this is not solely about repr; there's also the detail that in that case I'd like to prohibit as-casting the enum to integer.)

And getting back to repr(transparent), here's a funky case where a repr(transparent) struct with all public fields, and all the same field types, can still change its representation, if ZSTness is changed:

mod v1_0 {
    pub struct Foo(());
    pub struct Bar(i32);

    #[repr(transparent)]
    pub struct Murky {
        pub foo: Foo,
        pub bar: Bar,
    }
}

mod v1_1 {
    pub struct Foo(i32);
    pub struct Bar(());

    #[repr(transparent)]
    pub struct Murky {
        pub foo: Foo,
        pub bar: Bar,
    }
}

v1_0::Murky and v1_1::Murky are semver-compatible (if they are from different versions of the same crate) but the change will break code that assumes transparency allows converting a Murky to a Foo, or a Bar to a Murky.

None of this says that rustdoc shouldn't hide #[repr(transparent)] with private fields; just that it is not sufficient to make that the only way to hide reprs.

the8472 commented 1 year ago

Imo one should have to explicitly opt into making any repr part of the documented API.

BurntSushi commented 1 year ago

I generally agree with "explicitly opt into making any repr part of the documented API" and also the weaker variant "a repr shouldn't be shown by default if the fields are private." But:

Note that repr(transparent) is not the same as permission to inspect or manipulate the internal representation of a type. That permission only exists if the field is also exposed as pub.

@dtolnay I don't think I quite agree with this, although it might just be down to wording. IMO, we should endeavor to guarantee that, for example, NonZeroU8 is repr(transparent) for u8 even though we would never want to make its inner field public. Otherwise it's unsound to do thing like zero cost casts from &[NonZeroU8] to &[u8] (and vice versa for when you know none of the u8sare0`).

RalfJung commented 1 year ago

IOW: I think any #[repr] attribute should be hidden if any private fields are present. Library-stable layout is different from language-stable layout.

In a repr(transparent) if the non-ZST field is public, it should be okay to have private 1-ZST fields, right? The public guarantee is that of layout compatibility with the non-ZST field, after all.


@rust-lang/rustdoc what would it take to fix this, and at least have the standard library add repr(transparent) to a type without that showing up in the docs? We are at this point desperate enough to resort to #[cfg_attr(not(doc), repr(transparent))], but then the transmutes are still wrong in doc test builds so that's not a great solution either, though it is better than no repr.

RalfJung commented 1 year ago

Actually seems like I was wrong about doctests, and #[cfg_attr(not(doc), repr(transparent))] could be a quite reasonable approach to handle this until something more systematic appears.

fmease commented 1 year ago

I can look into fixing this issue unless you are happy with using #[cfg_attr(not(doc), repr(transparent))] until we have “#[pub_repr(transparent)]” (i.e., language support for public and private reprs) if that's what you mean by “something more systematic”?

@rustbot claim

RalfJung commented 1 year ago

It'd probably require some design, I was thinking of #[pub(repr(...))] or is repr the only attribute that rustdoc will show? However that would probably run into annoying grammar issues just like https://github.com/rust-lang/rfcs/pull/3325.

fmease commented 1 year ago

is repr the only attribute that rustdoc will show?

As far as I can tell, #[repr] and #[non_exhaustive] are the only inert attributes that rustdoc shows (verbatim).

I was thinking of #[pub(repr(...))]

From a technical standpoint, it shouldn't matter what the final syntax looks like (as long as rustc can handle it, rustdoc can), I just picked something for illustrative purposes.


While working on a patch, I discovered the following corner case: What should happen if all fields are 1-ZSTs? Should we hide #[repr(transparent)] if all of them are private and show it if there exists a public 1-ZST field? Actually I was quite surprised to see rustc accept such an ADT, I thought at least one non-ZST field had to present (Rust reference).

RalfJung commented 1 year ago

Yeah that corner case is a good question. If all fields are 1-ZST then the type is a 1-ZST but the question is, it that a stable guarantee or not?

Maybe for now we should apply the easier rule, if there is any private field then don't show repr. We can always fine-tune this later. It's also easier for people to add a sentence saying "this type is guaranteed to be layout- and ABI-compatible with that other type" than to say "despite what rustdoc shows, this type is not guaranteed to be a transparent wrapper around X".

Nemo157 commented 1 year ago

I think we could simplify the rule even more: don't show repr, we can't know if it's meant to be public. There can be cases where you have only public fields but apply repr for internal usage only without a semver guarantee.

The main crates this would impact are -sys ones where they do have many publicly repr(C) structs. Those are also mostly auto-generated so it wouldn't take much for bindgen to add in a sentence on the generated docs. If someone wants to have rustdoc show the attribute then they could propose a new feature to provide a semver guaranteed attribute. (EDIT: This is related to "signalling [layout] safety and stability" from project-safe-transmute, but might only need the "stability" part which I don't think they give any way to signal).

Most of the other exposed attributes except non_exhaustive this same sort of reasoning applies to; no_mangle, export_name and link_section may be intended to be public guarantees or only for internal usage and we have no way to tell.

GuillaumeGomez commented 1 year ago

In some context, it's quite useful to know if a struct is repr(transparent) or repr(c) to know if I can just pass a &/&mut on it directly or not.

jsha commented 1 year ago

I think we could simplify the rule even more: don't show repr, we can't know if it's meant to be public.

Yes, I agree with this (and had been meaning to write up a reply proposing it, so thanks for posting)!

Rather than go in the direction of more complexity, I'd like to go in the direction of more simplicity, and this can help simplify both rustdoc and people's mental model of how to write rustdoc. For crates where the representation is meant to be a stable part of the public API, I think it is reasonable to explicitly document that fact, either on a per-item basis or at the top level (e.g. "all types in this crate are repr(C)" or "these numeric types are repr(transparent) and correspond to the underlying types you would expect; here are the safety invariants you must uphold."

Manishearth commented 1 year ago

As someone who deals with repr a bunch I do kinda like the current default but would prefer ways to control doc-publicity. I don't think just "not showing it" is simpler, because I think it is quite useful to be able to have that in your public API in an easily digestable way rather than sticking it in the docs.

RalfJung commented 1 year ago

In some context, it's quite useful to know if a struct is repr(transparent) or repr(c) to know if I can just pass a &/&mut on it directly or not.

But that's the thing -- just because it is repr(C) now doesn't mean you an do that, because the library reserves the right to change this in the future. The presence of a repr is not always considered a promise from libraries to their users.

Manishearth commented 1 year ago

Right, I'm saying that libraries should be able to make that promise and have it show up nicely in the docs.

In particular when reviewing unsafe code it is super common for me to want to check "is this repr(C)" and it would be nice for libraries to be able to prominently display that on their type if they so choose.

GuillaumeGomez commented 1 year ago

Just like @Manishearth said.

EDIT: The semver situation in rust is very frustrating: if you change the repr, you can be sure there are people who are relying on it and that it'll break their code. repr changes not being considered as a breaking change is a nightmare.

Manishearth commented 1 year ago

repr changes not being considered as a breaking change is a nightmare.

I think the real problem is that there is no way to opt in or out of this beyond writing docs, I bet @obi1kenobi would also be happier if we had clearer signals about how public a repr is.

For some crates, they should be considered breaking, for others they shouldn't, this is a hard problem and I think the solution is probably a rustdoc feature.

GuillaumeGomez commented 1 year ago

Not only. rustdoc is supposed to mirror what is considered useful or not to the users. If repr changes are not breaking changes, I understand others' position who want to remove it (even if that'd make my work more complicated when using C bindings).

Manishearth commented 1 year ago

rustdoc is supposed to mirror what is considered useful or not to the users

Yes, and if the crate author decides that the repr is not something they wish users to rely on, the repr is no longer considered useful to the user.

even if that'd make my work more complicated when using C bindings

Almost certainly crates doing C binding stuff would want their reprs to be public and considered breaking.

jsha commented 1 year ago

Right, I'm saying that libraries should be able to make that promise and have it show up nicely in the docs.

Yes, agreed on this. Proposal:

  1. Don't show the #[repr(...)] attribute at the top of the declaration.
  2. Add a #[doc(representation = "stable")] attribute.
  3. That attribute would trigger an item-info that says "The layout of this struct|union|enum is stable across minor versions and follows the #[repr(C)] rules."
  4. A doc lint would reject #[doc(representation = "stable")] on items that aren't repr(C) or repr(transparent) (with a repr(C) at the bottom).

Reasoning: we have a principle that "docs contain things that look like code," but the further we deviate from "the exact code as written," the weaker that principle gets. Since we need to hide the repr attribute in some scenarios (non-public field), rather than behave in a way that is hard to deduce from just reading the docs, we should eliminate the ambiguity.

Also, from a design perspective, rendering of attributes is problematic. Following the "docs contain things that look like code" principle, they need to be at the top. But also they are less important than the declaration itself, so we de-emphasize them by making them grey. By using an item-info we get more flexibility to order things by their relative importance, and style them as we wish.

Elaborating on #3 further: I think a documented stable representation also implies (a) no fields will be added, removed, or reordered, including private fields, and (b) the types of all fields also have a documented stable representation. Letting crate authors explicitly specify that they intend a stable representation would probably be a boon to tools like cargo-semverver, since they can make stronger checks.

Fryuni commented 1 year ago

I think a documented stable representation also implies (a) no fields will be added, removed, or reordered, including private fields.

A #[repr(transparent)] may add, remove and reorder private ZST fields without breaking the representation guarantee, so that should not be a constraint to consider it stable.

the types of all fields also have a documented stable representation

Probably can only be said about non-generic fields, right? Otherwise it would require the user to only use the type with other stable-representation types.

obi1kenobi commented 1 year ago

repr changes not being considered as a breaking change is a nightmare.

I think the real problem is that there is no way to opt in or out of this beyond writing docs, I bet @obi1kenobi would also be happier if we had clearer signals about how public a repr is.

For some crates, they should be considered breaking, for others they shouldn't, this is a hard problem and I think the solution is probably a rustdoc feature.

repr being possibly not public is news to me 😬 Is this something that there's broad consensus on?

If so, it might be good to write that explicitly into the new cargo semver guidelines that are being considered, since they currently define repr(C) as always public and even explicitly name removing repr(C) as an example of a major breaking change: https://github.com/rust-lang/cargo/pull/12169/files#diff-2a1bf33ec2bf9e43fefbe83c09ec9f69d5b9f501380ed37003de13c848c1b286R255-R256

Apologies for adding further complexity to this already-thorny issue. I don't particularly have any horse in this race, I just want the decisions and docs to be in sync everywhere so I know what rules cargo-semver-checks should follow :)

Manishearth commented 1 year ago

@obi1kenobi no, I think the current situation is that they are always considered public, informally

but I think it is valuable to allow library authors to choose that behavior with an attribute that hints to rustdoc and semver-checks

obi1kenobi commented 1 year ago

Ah got it. That makes sense, thanks for clarifying :+1:

notriddle commented 1 year ago

Yeah, that’s how it sounds to me, too. It makes sense for rustdoc to make an educated guess by default, as long as the criteria is simple enough to describe in one sentence in the rustdoc book, and offers a way to explicitly turn it on or off if the guess is wrong.

Manishearth commented 1 year ago

Anyway I think there is not much harm fixing the issue as stated, and we can figure out a principled approach without breaking stuff in the future.

I think we should only do this for transparent and not C, since it is useful knowing if an opaque struct is repr(C)

CAD97 commented 1 year ago

It's pretty clear-cut with #[repr(transparent)] that the transparent-inner type being hidden makes #[repr(transparent)] useless to downstream.

For #[repr(C)], there's a separate (and older) issue also tracked: https://github.com/rust-lang/rust/issues/66401

RalfJung commented 1 year ago

The current situation is that some people consider then always public but others do not. See e.g. https://github.com/rust-lang/rust/issues/90435#issuecomment-980506714 So no you cannot just assume they are public when you see them in a random library and the type has private fields. That's already the case today. rustdoc just does not reflect that.

BurntSushi commented 1 year ago

It's pretty clear-cut with #[repr(transparent)] that the transparent-inner type being hidden makes #[repr(transparent)] useless to downstream.

Can you elaborate a bit more on this? I can interpret this sentence in a couple different ways and I'm not sure what the right parse is. :)

Manishearth commented 1 year ago

The current situation is that some people consider then always public but others do not

Yes, agreed, but I think informally enough of the ecosystem relies on this for repr(C) that it's more or less the case that removing repr(C) is a breaking change. repr(transparent) is indeed murkier. Personally I would say that it ought to be something you can rely on for layout purposes but that does not necessarily imply transmutability.

fmease commented 1 year ago

It makes sense for rustdoc to make an educated guess by default, as long as the criteria is simple enough to describe in one sentence in the rustdoc book, and offers a way to explicitly turn it on or off if the guess is wrong.

Anyway I think there is not much harm fixing the issue as stated, and we can figure out a principled approach without breaking stuff in the future.

Okay, so what if we – for the time being – show #[repr(transparent)] iff the non-ZST field is public or if at least one 1-ZST field is public in case all fields are 1-ZST fields (one sentence for the book)? The user can use #[cfg_attr(not(doc), repr(transparent))] to completely opt out of repr(transparent). Since we aren't gonna introduce a new doc attribute for the time being, there won't be a way to explicitly opt into repr(transparent) in order to “overwrite” this rule. I argue that this is completely fine since this situation only ever arises if the user wants to consider a type with a private non-ZST field or one where all fields are 1-ZSTs & private to be repr(transparent) which doesn't make sense to me.

In pseudo code:

fields.find(is_non_zst)
    .map_or_else(|| fields.any(is_pub), is_pub)

I can spin up a PR relatively quickly if you like this approach.

If you consider this rule too complicated, I can also just go for repr(transparent) iff all fields are public but since there's currently no way to explicitly opt into repr(transparent), there will be many more false positives which the user can't do much about.

Manishearth commented 1 year ago

This seems fine.

#[cfg_attr(doc, repr(transparent))] to completely opt out of repr(transparent)

I assume you mean not(doc) here

RalfJung commented 1 year ago

#[cfg_attr(not(doc), repr(transparent))] turns out to have a problem -- when rendering re-exported items from dependent crates, the attribute does show up. (See e.g. std::ffi::CStr.)

Fryuni commented 1 year ago

#[cfg_attr(not(doc), repr(transparent))] turns out to have a problem -- when rendering re-exported items from dependent crates, the attribute does show up. (See e.g. std::ffi::CStr.)

That sounds like a bug to me... I mean, this would also break the documented intended use case of the doc cfg. Would it not?
Re-exporting a platform-specific type and using the any(platform, doc) trick would break my docs since the upstream type would not exist without the cfg being passed down.

Is there an issue for this, or, if this is the intended behavior, is there any discussion/explanation on why and an issue to add it to the docs?

Totally caught me by surprise.

notriddle commented 1 year ago

Yeah, it's bad. It falls out of the way cross-crate inlining and compiling works.

Fixing it is actually going to be pretty complicated. I couldn't find a bug report for it (not very many people use cfg(doc) and not very many people use cross-crate inlining, so I suppose they don't hit it much).

https://github.com/rust-lang/rust/issues/114952

RalfJung commented 1 year ago

@fmease wrote

when working on a fix for https://github.com/rust-lang/rust/issues/90435 (which is now sadly blocked on https://github.com/rust-lang/rust/issues/114952 I guess), ...

I don't think this is blocked on #114952. Just the easy hack of using #[cfg_attr(not(doc), repr(transparent))] sadly doesn't entirely work, so a proper implementation in rustdoc might be needed.