rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.38k stars 12.72k forks source link

Tracking Issue for `#![feature(offset_of)]` #106655

Closed WaffleLapkin closed 9 months ago

WaffleLapkin commented 1 year ago

Feature gates tracked by this issue:

This is a tracking issue for the offset_of! macro which evaluates to a constant containing the offset in bytes of a field inside some type (https://github.com/rust-lang/rfcs/pull/3308).

Public API

// core::mem

pub macro offset_of($Container:ty, $field:tt $(,)?) {
    // ...implementation defined...
}

Steps / History

Possible future extensions / work

Unresolved Questions

tgross35 commented 1 year ago

It looks like https://github.com/rust-lang/rust/pull/106934 is the implementation PR, you may be able to link it in the readme

est31 commented 1 year ago

Some things that could get tests:

Edit: #111665

Amanieu commented 1 year ago

Small nit on the docs: I don't think "stable" is the right word to use for describing the output on non-repr(C) structs since this causes confusion with Rust's concept of stable APIs.

@rfcbot fcp merge

rfcbot commented 1 year ago

Team member @Amanieu has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

m-ou-se commented 1 year ago

Looks like it has an issue: 0.0 doesn't work right now for nested tuple fields.

rfcbot commented 1 year ago

:bell: This is now entering its final comment period, as per the review above. :bell:

WaffleLapkin commented 1 year ago

@Amanieu do you suggest to stabilize offset_of? Could you clarify what is the FCP for? It seems a bit too soon, as this feature was just recently implemented.

m-ou-se commented 1 year ago

This FCP is to confirm that libs-api is happy with committing to this macro existing forever with this interface. That seems refreshingly uncontroversial in this case.

Whether the implementation is robust/bug free enough is a bit of a separate question, and it might be worth delaying stabilization for that.

If you think there is value in having more time for the interface/api itself, please share your thoughts on that!

WaffleLapkin commented 1 year ago

Oh I see! I don't think we are committing to the interface until stabilization, but yes, the interface seems uncontroversial 👍🏻

est31 commented 1 year ago

One nice project before stabilization would be to make a PR to memoffset to use offset_of internally behind a cfg or feature flag, then test that together with some users of memoffset. Maybe it could be integrated into a crater run in some fashion.

BurntSushi commented 1 year ago

I wanted to double check that the lang team was okay with us making the call on an API like this, and indeed, they did respond on the RFC saying they were comfortable with it being in libs-api's hands: https://github.com/rust-lang/rfcs/pull/3308#issuecomment-1331189141

BurntSushi commented 1 year ago

@rfcbot concern docs

The documentation for this macro is woefully incomplete. I would expect it to be full of interesting bits about what guarantees are provided and which aren't. I'd love to see someone comb through the RFC discussion at https://github.com/rust-lang/rfcs/pull/3308 and extract the salient and interesting points, then convert them somehow into API docs for this macro. I would pay special attention to anything @RalfJung had to say on the matter. :-)

I also find the existing docs somewhat confounding. The reference to C's offsetof is only going to help C programmers. I'm okay with references like that personally, but I think they need to be properly contextualized such that anyone who isn't familiar with C's offsetof will understand what is meant.

BurntSushi commented 1 year ago

@rfcbot concern implementation quality

I like @est31's idea above about putting this through its paces before we commit to it. I don't see anything changing about the API, but it doesn't feel right to stabilize something that hasn't seen some kind of use in anger yet.

Amanieu commented 1 year ago

I tried converting all uses of memoffset::offset_of with core::mem::offset_of in one of my projects and ICE'd the compiler: #111657

est31 commented 1 year ago

I've filed #111669 for linting if someone doesn't use the result of offset_of. The macro is side effect free. I don't think this should be a stabilization blocker however, it's only a minor thing.

Edit: also filed https://github.com/Gilnaa/memoffset/pull/72 for adding a memoffset feature that uses builtin offset_of!(), as described above, and also filed #111665 for adding more tests, which led to the discovery of ICE #111678.

est31 commented 1 year ago

Some updates in the past days:

oxalica commented 1 year ago

Looks like it has an issue: 0.0 doesn't work right now for nested tuple fields.

Since #112216, offset_of!(NestedTuple, 0.0) is now accepted. It looks super suspicious and confusing to me. It is also not covered by this tracking issue nor the RFC. Is there more existing discussion about it?

est31 commented 1 year ago

As pointed out in #112204, 0.0 already works for field access as well, so (((), ()),()).0.0 is a valid Rust expression. The RFC does mention nested field access as one of the extensions, which were implemented by the original PR https://github.com/rust-lang/rust/pull/106934. My PR #112216 only touched the parser to recognize float-like tokens as their field-like components.

oxalica commented 1 year ago

As pointed out in #112204, 0.0 already works for field access as well, so (((), ()),()).0.0 is a valid Rust expression.

That's different than offset_of!(Ty, 0.0), which looks exactly a float literal as an argument, but have different unrelated meaning. I can parse field access as expr.0.0 as expr .0 .0, but not ty, 0.0 as ty ,0 .0. I prefer to let them have more similar surface syntax, by also putting a dot before the first field offset_of!(Ty, .0.0), or the same but blessed by rustfmt offset_of!(Ty, .0 .0). I hope enforcing a . before field accesses could merge the parser logic between two field access syntax for consistency.

m-ou-se commented 1 year ago

I can parse field access as expr.0.0 as expr .0 .0

(Note that that's not how the Rust compiler sees it:

macro_rules! m {
    ($a:tt $b:tt $c:tt) => {
        println!("{:?} {:?} {:?}", stringify!($a), stringify!($b), stringify!($c));
    };
}

fn main() {
    m!(expr.0.0); // prints: "expr" "." "0.0"
}

)

joshlf commented 1 year ago

Small doc nit:

Note that the output of this macro is not stable, except for #[repr(C)] types.

There are other reps that guarantee type layout for structs/unions (namely, transparent and packed/packed(1)).

joshlf commented 1 year ago

IIUC from this thread, the outstanding issues blocking FCP are:

Is that correct?

deltragon commented 1 year ago

There was also a discussion about syntax on zulip, and making sure that the basic syntax does not lead to issues for the extended case: https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/.60offset_of!.60.20Syntax (see also https://github.com/rust-lang/rust/pull/114208 for that)

GKFX commented 1 year ago

Small doc nit:

Note that the output of this macro is not stable, except for #[repr(C)] types.

There are other reps that guarantee type layout for structs/unions (namely, transparent and packed/packed(1)).

This has been updated to the below, linking out to the documentation on the subject rather than trying to summarise it all in place.

Note that type layout is, in general, subject to change and platform-specific.

joshlf commented 1 year ago

I've put up a PR to address https://github.com/rust-lang/rust/issues/106655#issuecomment-1550041499: https://github.com/rust-lang/rust/pull/117512

BurntSushi commented 1 year ago

Given the testing work @est31 has put in (and a DM from them noting that they think this concern can be resolved), I'll resolve the implementation quality concern. :-)

@rfcbot resolve implementation quality

BurntSushi commented 1 year ago

There is also a fair bit of discussion around the syntax here on Zulip: https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/.60offset_of!.60.20Syntax

Is there a summary of that discussion anywhere? cc @scottmcm

I'll mark a concern for it just in case.

@rfcbot concern syntax

est31 commented 1 year ago

Thanks @joshlf for the docs PR. Thanks to that PR and your work on #111839 I think we are closer to stabilization than ever.

Currently I see three blockers for stabilization:

Regarding the formalities, there is the ongoing FCP in this thread and this comment said that the FCP is about the interface of the macro. I'm not sure what that entails, whether it's a proposal for full stabilization, stabilization with the caveat that implementation quality must be good, or something more limited. Would it be possible to clarify that?

the8472 commented 1 year ago

We should also comb through the standard library and maybe the compiler to find places to dog-food this.

GKFX commented 1 year ago

We should also comb through the standard library and maybe the compiler to find places to dog-food this.

I think std::intrinsics::option_payload_ptr is now redundant to offset_of - it could be removed to start using offset_of instead.

BurntSushi commented 1 year ago

@rfcbot resolve docs

est31 commented 1 year ago

We should also comb through the standard library and maybe the compiler to find places to dog-food this.

FTR there was already #112298, but that doesn't mean it is the end of such efforts :).

joshlf commented 1 year ago
  • if we want to stabilize soon (1.76 will be earliest I think), we should put the enum variant changes done by Support enum variants in offset_of! #114208 into a separate feature gate as it hasn't had enough time to bake (and is a non-minor extension to offset_of)

+1 to this. Does someone want to provide mentoring instructions or put up a PR? I'd be happy to put up a PR if someone can give me a rough sketch of what needs to be done.

GKFX commented 1 year ago
  • if we want to stabilize soon (1.76 will be earliest I think), we should put the enum variant changes done by Support enum variants in offset_of! #114208 into a separate feature gate as it hasn't had enough time to bake (and is a non-minor extension to offset_of)

+1 to this. Does someone want to provide mentoring instructions or put up a PR? I'd be happy to put up a PR if someone can give me a rough sketch of what needs to be done.

I've opened the relevant PR at #117537.

joshlf commented 1 year ago
  • some conclusion of the syntax discussion, as pointed out above by @deltragon. see this link. further up in the thread it was said that the syntax is uncontroversial but that was before the opening of that zulip thread. I'm fine with the current syntax but note that we are currently doing some tricks to allow stuff like 0.0 (they are seen as floats)

From a very brief skim of that thread, it seems like most of the complexity in the design space has to do with the syntaxes for enum variants and array indexing. IIUC, considering only struct fields (which is what's being proposed for stabilization given https://github.com/rust-lang/rust/pull/117537), the only question is about how to represent a sequence of nested field accesses. The current syntax - offset_of!(Foo, bar.baz) - is consistent with @llogiq's point about consistency with existing syntax, and IMO is the right call. What would we need to do to reach a formal agreement to keep this syntax, keeping in mind that we're not committing to a syntax for array indexing or enum variant access?

llogiq commented 1 year ago

Does getting the offset of triply nested indexed fields work with this syntax, e.g. offset_of!(Foo, 1.2.3)? Because IIRC there was some hack around double indexed fields being parsed as floats.

My proposal was type,.followed by a comma separated list of field or variant::field, but I'm totally OK with the current syntax, provided it works well for all expected cases.

joshlf commented 1 year ago

Does getting the offset of triply nested indexed fields work with this syntax, e.g. offset_of!(Foo, 1.2.3)? Because IIRC there was some hack around double indexed fields being parsed as floats.

It appears to:

println!("{}", mem::offset_of!(D, c.b.a));

My proposal was type,.followed by a comma separated list of field or variant::field, but I'm totally OK with the current syntax, provided it works well for all expected cases.

Not sure I can picture what you're describing; can you give an example of this syntax?

llogiq commented 1 year ago

c.b.a doesn't parse the same as 1.2.3 though. I was thinking of:

struct Foo(Outer);
struct Outer(Inner);
struct Inner(u8);

assert_eq!(0, core::mem::offset_of!(Foo, 0.0.0));

In my proposed syntax that expression would be offset_of!(Foo, 0, 0, 0), which has an obvious parse.

est31 commented 1 year ago

Does getting the offset of triply nested indexed fields work with this syntax, e.g. offset_of!(Foo, 1.2.3)? Because IIRC there was some hack around double indexed fields being parsed as floats.

The parser already has these workarounds, since #112216. See this test file for working examples.

Edit: the macro does not support some spacing combinations, see the places where the linked test uses builtin # offset_of instead of offset_of.

llogiq commented 1 year ago

That's what I thought, and this is the exact reason I proposed the comma-separated syntax. It's even parseable with standard matchers, and is completely immune to whitespace changes.

GKFX commented 1 year ago

The dotted syntax has the advantage of looking more like field access, and of being near-identical to C for the sake of familiarity. The parsing definitely has to work reliably with whitespace - there is a comment on #112216 suggesting to define the macro to take ($Container:ty, $($fields:tt)+) and add error handling into the macro which sounds reasonable.

joshlf commented 1 year ago

+1 to the syntax being the same as native field access to keep the mental model simple. IMO we should prioritize consistency of mental model for casual users who aren't thinking in terms of the syntax that you'd write using macro_rules!, and instead are just thinking in terms of it looking the same as they'd "normally" access a field. It's also consistent with addr_of!, which does a similar thing to offset_of!.

the8472 commented 1 year ago

When I think of manually doing pointer arithmetic for field accesses I do not think of "casual users" or people who would have their mental model upset by an extra comma in the syntax or anything like that.

joshlf commented 1 year ago

Fair point. I'm on my phone rn and can't check, but I wonder whether the existing syntax might cause problems for users who want to write macros which will call offset_of! and pass metavariables as arguments?

llogiq commented 1 year ago

Yes, it appears that one cannot pass expr-matched indices into a chain of dot-separated field accessors. That's an even stronger argument against the "natural" syntax. I also find the argument of consistency with addr_of! questionable. With addr_of!, the field accessor syntax makes sense, because we have an object to access. Much unlike offset_of! where we operate on the type, not any particular value.

GKFX commented 1 year ago

I think a correct set of macro arguments for the C-style syntax is ($a:ty, $($b:expr)+ $(,)?) (https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=e3fe4caa1b5bb7a9204919a99275cbe4). It accepts all the tuple accesses in offset-of-tuple-nested.rs, and unlike $($fields:tt)+ it does allow for following arguments, which I think addresses @joshlf's point. That would require a re-write of the parser for builtin # offset_of because as noted above it doesn't currently accept expressions in the second argument.

llogiq commented 1 year ago

I want to add that I do like the current syntax from a user's point of view. On the other hand, I worry that the hacks needed to implement it will come back to bite us.

deltragon commented 1 year ago

I would argue for just stablizing the MVP of offset_of!(MyStruct, single_field) for now, and punting everything else (nested fields, array indexing, enums) to afterwards. The MVP would already accomplish a lot and remove the need for a lot of hacks in the ecosystem. It might even be possible to implement nested fields support as a macro in a library on top of the MVP offset_of!, which would give the ecosystem the option of experimenting with the syntax, and deciding whether extending the builtin one is even necessary.

est31 commented 1 year ago

I would argue for just stablizing the MVP of offset_of!(MyStruct, single_field) for now

@deltragon such an MVP is a great idea. I think if others think similarly, we could split up the feature further, and stabilize the MVP very soon (likely going to be in 1.76, so one release after the christmas/async-in-traits one), and leave the syntax for multiple fields open for now.

IIRC array indexing is also not even supported yet by offset_of!.

tgross35 commented 1 year ago

+1 for a stabilization without nested fields or enum variants. I don’t see any use of anything fancier from a rough search anyway https://grep.app/search?q=offset_of%21%5C%28.%2A%3F%2C.%2A%3F%5B%5C.%3A%5D.%2A%3F%5C%29&regexp=true, probably because many uses of offset_of are related to sharing things with C.

I’m not sure what the current support level is, but I think it makes sense to include tuples and tuple structs as part of the stabilization. That is, I would expect this to work:

struct Foo { a: i32, b: String }
struct Bar(i32, String);
type Baz = (i32, String);

let _ = offset_of!(Foo, b);
let _ = offset_of!(Bar, 1);
let _ = offset_of!(Baz, 1);
let _ = offset_of!((i32, String), 1);

But not support nesting (which can be worked around with multiple offset_of!s anyway) or enums (the big syntax discussion point, but probably a very small use case).

I think somebody interested could put a PR up to do this.

The remaining syntax might be best in a proposal about how type-level references in Rust should look, since that is useful beyond the scope of this macro (metaprogramming, reflection)