Tracking issue for RFC 2102, "Unnamed fields of struct and union type"

Centril commented 6 years ago

This is a tracking issue for the RFC "Unnamed fields of struct and union type" (rust-lang/rfcs#2102). The feature gate for the issue is #![feature(unnamed_fields)].

About tracking issues

Tracking issues are used to record the overall progress of implementation. They are also used as hubs connecting to other relevant issues, e.g., bugs or open design questions. A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature. Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.

Steps

[ ] Implement the RFC (see https://github.com/rust-lang/rust/issues/49804#issuecomment-798768285 for mentoring instructions)
[ ] Adjust documentation (see instructions on forge)
[ ] Formatting for new syntax has been added to the Style Guide (nightly-style-procedure)
[ ] Stabilization PR (see instructions on forge)

Unresolved questions

None so far.

Implementation history

115131
115367
121198

joshtriplett commented 6 years ago

@Centril Regarding unresolved questions: the first three paragraphs in that section are just talking about the future possibility of full unnamed types that can appear everywhere, which would be the subject of a future RFC. The last paragraph talks about other future possibilities that also need their own RFCs.

Centril commented 6 years ago

@joshtriplett Yeah; that was also my interpretation, so ==> no unresolved questions?

cramertj commented 5 years ago

Looks like this still needs an implementation

igotfr commented 4 years ago

I think It, I hope it helps

RFC #2870

think it would be better that way

let unamedstruct: { count: i32, price: f64, type: u8 } = { count: 500, price: 6.4, type: 1 };

or

let unamedstruct: Struct = { count: 500, price: 6.4, type: 1 };

or simply

let unamedstruct = { count: 500, price: 6.4, type: 1 };

nothing to do with tuples, starting with tuples having no named elements, structs yes

joshtriplett commented 4 years ago

@cindRoberta That's an unrelated proposal. This RFC is about unnamed fields within other types, not fully general anonymous types.

ibraheemdev commented 3 years ago

Is there any movement on this, or is it just waiting for someone to implement it?

varkor commented 3 years ago

This is still waiting on an implementation.

ibraheemdev commented 3 years ago

@varkor maybe mentoring instructions would help someone get started?

varkor commented 3 years ago

I don't have time to write up full mentoring instructions at the moment, but if someone is interested in implementing this feature, I am happy to provide pointers for how to get started on Zulip.

The first place I'd look is parse_name_and_ty in compiler/rustc_parse/src/parser/item.rs, where you'll want to start parsing an underscore, along with the rest of the syntax for unnamed fields as described in the RFC.
This will entail modifying StructField in compiler/rustc_ast/src/ast.rs to contain one of two types: either a NamedStructField (containing most of what StructField currently contains) along with an UnnamedStructField, whose structure should be evident from the description in the RFC.
Having done this, you should gate the new feature by creating a new unnamed_fields feature gate in compiler/rustc_feature/src/active.rs, which should be added to INCOMPLETE_FEATURES, and then checking for UnnamedStructField in compiler/rustc_ast_passes/src/feature_gate.rs.
This should cover at least up to Parsing in the RFC.

jedel1043 commented 3 years ago

I'm interested on implementing this, but I need some mentoring to answer some questions I have.

jedel1043 commented 3 years ago

Ok, the parsing step has been completed. The next step is to lower the new types from the AST to the HIR. Could someone give me some recommendations on how to approach this?

joshtriplett commented 2 years ago

Is there still someone able to provide mentorship for implementing this?

Is there still someone available to try implementing this?

nwn commented 2 years ago

I would be interested to try implementing this, but would defer to the previous implementer if they are still interested.

jedel1043 commented 2 years ago

I would be interested to try implementing this, but would defer to the previous implementer if they are still interested.

Please do so! I'm somewhat busy nowadays, so I'd prefer leaving this work to someone who can dedicate their full attention to it :)

frank-king commented 1 year ago

Seems inactive for a while. I'd like to try implementing it (make sure that I won't duplicate @carbotaniuman's work if you are still actively working on it).

I may need help since I'm pretty new to rustc.

carbotaniuman commented 1 year ago

@frank-king I don't have time for a while, so feel free to take my code and use it. If you need any help with my changes you can ping me here or on zulip, but I'm not the most experienced with rustc in general.

frank-king commented 1 year ago

I think some unresolved problems are about the weak keyword union followed by an open delimiter { where union can also be parsed as an identifier.

I'm trying to enumerate the cases here, and please correct me or make a complement if there's something wrong or missing.

Cases that should pass the compilation (i.e. union should be treated as an identifier).

// `union` as a type identifier, empty `impl` item
impl union {}
// `union` as a type identifier, non-empty `impl` item
impl union {
 fn foo() {}
}
// `union` as a type identifier, empty `fn` body 
fn foo() -> union {}
// `union` as a type identifier, non-empty `fn` body 
fn foo() -> union {
let _: () = ();
}

Cases that should be recovered as an anonymous union.

// recovered non-empty anonymous union, missing `impl` body
impl union { field: u32 }
// recovered non-empty anonymous union, empty `impl` body
impl union { field: u32 } {}
// recovered non-empty anonymous union, non-empty `impl` body
impl union { field: u32 } {
fn foo() {}
}
// recovered empty anonymous union, empty `impl` body
impl union{} {}
// recovered empty anonymous union, non-empty `impl` body
impl union{} {
fn foo() {}
}
// recovered non-empty anonymous union, missing `fn` body
fn foo() -> union { field: u32 }
// recovered non-empty anonymous union, empty `fn` body
fn foo() -> union { field: u32 } {}
// recovered empty anonymous union, empty `fn` body
fn foo() -> union{} {}
// recovered empty anonymous union, non-empty `fn` body
fn foo() -> union{} {
let _: () = ();
}

Cases I'm not sure if we should recover

// syntactically correct, but probably never correct due to name conflicts
// I guess this case should be recovered so that we can report a name conflict error instead of a syntax error, is it?
impl union for union {}
// anonymous union can never be a trait
impl union {} for union {}
// but if we don't recover, the error might be confusing:
error: expected item, found keyword `for`
| 
LL | impl union {} for union {}
|               ^^^ expected item

carbotaniuman commented 1 year ago

Yeah, like 80% of the remaining issue with my PR was impl blocks, you can't parse anonymous unions eagerly in order not break compatibility, but you want to parse them as fallback to give nice errors. Add to that there's two places an anonymous union could or couldn't be in the impl block...

frank-king commented 1 year ago

I think an alternative way is to only recover the anonymous struct, but always treat union as a normal identifier when it comes to impl triats or types, or a function return types, leaving it a (possibly?) diagnostic issue and revisit it sometime in the future.

LcJuves commented 1 year ago

Hope to fix it soon and implement it, got rid of it, thanks a lot!

frank-king commented 1 year ago

Parsing is done in #115131, and now I'm going to implement the HIR to CodeGen part.

I don't know how to assign an AdtDef for an anonymous adt. Could anyone give me some hints?

CAD97 commented 11 months ago

For named types, I want to offer the potential alternative of using pattern destructuring as a potentially clearer alternative, e.g.

struct A { x: i32, y: i32 }
struct B {
    A { x, y }: A,
    z: i32,
}

(This would make struct S { union { … a pattern field.) I also note the RFC makes no mention of how unnamed fields of named structs should behave w.r.t. visibility of the inner struct's fields.

As an aside, a different reasonable alternative interpretation of _: Type as a field is to explicitly insert a Type’s worth of padding. It seems unnecessary to allow anonymous fields with named types. It doesn't even enable fixes to existing #[repr(C)] bindings to avoid compatibility breaks. IMO I think that use case is better served by a macro.

frank-king commented 9 months ago

@rustbot claim

RalfJung commented 9 months ago

Unresolved question: what should derive macros do here? This applies both to the built-in ones and user-defined ones. It seems like they all need major overhaul to support types like this. And it is pretty inevitable that people will ask for derive to be supported on these types, even if the MVP does not support them.

OTOH I assume many of them don't support unions to begin with, and these unnamed fields only really make sense when there are unions involved I think?

The RFC also explicitly lists anonymous types as a rejected alternative, and yet the implementation that recently began for this RFC does introduce anonymous ADTs to the compiler. Though maybe if it is impossible to write an expression of these types they are less problematic? That said I assume in the internal compiler IRs such expressions will exist -- the unnamed fields are getting an internal name and field accesses are desugared to use those names.

It's that kind of issue that makes me think that adding a new form of unnamed types to Rust (on top of closures/coroutines) is a mistake. The RFC was accepted 6 years ago, our approach to language design and evolution changed since then. I think we need to ensure that this is even still something we want to do in this form.

fmease commented 9 months ago

https://github.com/rust-lang/rust/issues/49804#issuecomment-1972619108

Nominating Ralf's comment for T-lang discussion. Context for T-lang: There's currently active compiler dev going on to implement this feature (several merged and open PRs by multiple contributors). I don't want them to continue working on it if it gets thrown out in the end.

Jules-Bertholet commented 9 months ago

The RFC also explicitly lists anonymous types as a rejected alternative, and yet the implementation that recently began for this RFC does introduce anonymous ADTs to the compiler.

I think this is the wrong implementation approach. The RFC doesn't specify the introduction of any new types. My expectation would be that this feature would be implemented by storing the field list in a tree structure instead of a flat list, for example in VariantDef.fields and in VariantData (ast, hir).

frank-king commented 9 months ago

Oh, previous implementation (#115732) of parsing unnamed fields as a special field kind made things very complicated and was rejected.

Was #115732 the right way to implement this feature?

RalfJung commented 9 months ago

If they give rise to "tree-shaped types" then should we enforce the tree to strictly alternate between struct and union, i.e., not accept struct fields inside a struct or union fields inside a union? Those would be redundant then and give rise to questions like whether they actually change how the layout is computed.

Also, that would require pretty fundamental changes to FieldsShape, which currently encodes the assumption that a type is either struct-like or union-like. This distinction has pretty fundamental consequences also for type validity, i.e. what the valid representations of a type are. So for instance

struct S {
  f1: bool,
  _: union { f2: u8, f3: u8 }
}

Should this entire type behave like a union (any byte sequence is valid), or should the first field still behave like it would inside a struct (must always be a valid bool)? The RFC doesn't even seem to say which fields can be safely accessed (if any) so it's not clear. If some fields can be safely accessed and others cannot, this would again be a fundamental language change, requiring the introduction of a novel kind of type to Rust.

All of this would be trivially resolved by allowing unnamed fields with named types only. Or, you know, just naming those fields; very little code has to interact with these complicated C types directly and they can just generate names for these fields and spell them out, making the traversal of this tree-shaped structure explicit. That would also make it much easier to reason about the code, e.g. by making it clear when two fields might overlap. Given that such types are already rare in C (people seem to always use the same 2 or 3 examples when motivating this feature), and writing C bindings is rare in Rust (in the sense that picking some random piece of Rust code, it's rare for that to be C binding code), I think a feature exclusively meant to support such a small fraction of code needs to pass a pretty high bar. The RFC does not seem to have a "prior work" section so I can only assume that thus far, no language besides C (and C++?) has this feature, further confirming my impression that it serves an extremely niche use-case.

afetisov commented 8 months ago

Should this entire type behave like a union (any byte sequence is valid), or should the first field still behave like it would inside a struct (must always be a valid bool)? The RFC doesn't even seem to say which fields can be safely accessed (if any) so it's not clear.

Doesn't it? Imho it's clear from the text, that the validity requirements are the same as if you had named fields with separately defined types. The only thing the RFC does is allow omitting explicit type definition and the corresponding field name in accesses:

Given a struct s of this type, code can access s.a, s.d, and either s.b or s.c. Accesses to a and d can occur in safe code; accesses to b and c require unsafe code, and b and c overlap, requiring care to access only the field whose contents make sense at the time.

Given a union u of this type, code can access u.a, or u.d, or both u.b and u.c. Since all of these fields can potentially overlap with others, accesses to any of them require unsafe code; however, b and c do not overlap with each other. Code can borrow u.b and u.c simultaneously, but cannot borrow any other fields at the same time.

Such a structure defined with repr(C) will use a representation identical to the same structure with all unnamed fields transformed to equivalent named fields of a struct or union type with the same fields.

should we enforce the tree to strictly alternate between struct and union..?

I hope not. This feature would allow structural inheritance for structs and unions. That feature would be useful e.g. for representation of C++-like class hierarchy, where subclasses have the fields of superclasses as their initial segments. For unions, it would allow patterns like ad-hoc subtyping of enums and hand-crafted ad-hoc sum types, like the ones OCaml uses to represent its errors.

RalfJung commented 8 months ago

I hope not. This feature would allow structural inheritance for structs and unions. That feature would be useful e.g. for representation of C++-like class hierarchy, where subclasses have the fields of superclasses as their initial segments. For unions, it would allow patterns like ad-hoc subtyping of enums and hand-crafted ad-hoc sum types, like the ones OCaml uses to represent its errors.

That's outside the scope and motivation of the RFC, and the feature is likely a poor fit for this purpose. IMO that's an argument in favor of requiring strict alternation.

afetisov commented 8 months ago

The scope of the RFC is representation of C FFI type without introducing types and field names which don't exist in original code. Anonymous structs can appear inside structs in C, same for unions, so that is explicitly within the scope and motivation of RFC.

simonbuchan commented 7 months ago

FWIW I offhandedly suggested a conflicting(?) interpretation of this syntax in IRLO a short while ago:

https://internals.rust-lang.org/t/phantom-trait/20434/15?u=simonbuchan

TLDR:

struct Foo<T> {
  id: u32,
  _: PhantomData<T>,
}

let foo: Foo<String> { id: 123 }

Where the compiler can implicitly initialize some sensible set of types (Default? Some new Unitary auto trait?) without them needing to be named.

Strictly I guess this use of the syntax doesn't conflict if such types are considered to be empty types somehow?

Amanieu commented 6 months ago

It seems that implementation work (#121553) is blocked on the lang team nomination, so it would be nice to clarify what the exact question being asked here is.

The libc 1.0 work (https://github.com/rust-lang/libc/issues/3248) is currently blocked on this feature since we would like to make the APIs match C more closely.

This is specifically useful for APIs where the field organization can vary between platforms, where some platforms use nested unions but others use distinct fields.

fmease commented 6 months ago

It seems that implementation work (#121553) is blocked on the lang team nomination, so it would be nice to clarify what the exact question being asked here is.

@Amanieu, cc https://github.com/rust-lang/rust/pull/121270#issuecomment-2025593169

Amanieu commented 6 months ago

I wasn't aware that this had already been discussed since this issue is still marked as lang-nominated. I know this was a while ago, but it would be nice if @compiler-errors or someone else could summarize the position of the lang team on this feature.

joshtriplett commented 6 months ago

It sounds like, from the minutes, that some folks would like to see this feature designed differently than it was when it was previously accepted. It wouldn't be the first or last feature to need some design adjustments when lang design met compiler reality. Happy to help with that, so that we can find a design that meets the requirements in the original RFC and any new issues that have arisen since then.

RalfJung commented 6 months ago

The libc 1.0 work (https://github.com/rust-lang/libc/issues/3248) is currently blocked on this feature since we would like to make the APIs match C more closely.

This is specifically useful for APIs where the field organization can vary between platforms, where some platforms use nested unions but others use distinct fields.

Note that with the proposal as specified in the RFC, this leads to the same code being safe on some platforms and unsafe on others.

IMO some C APIs are just too terrible for us to want to support matching them in Rust. I'm not saying we shouldn't support interop with those platforms -- we always should support interop. We added untagged unions, a major non-trivial language feature, largely to support C interop. So these APIs can be translated to named structs/unions. A consistent API across platforms can be achieved via methods. That's different from what happens in C, but it is not a good idea to copy mistakes (in my eyes) from C just because we want to have APIs that look the same. "It has to match the C API" is a great recipe for being forever stuck with poor choices made decades ago. IMO a new language feature needs stronger motivation than that.

rust-lang / rust