oxc-project / oxc

⚓ A collection of JavaScript tools written in Rust.
https://oxc.rs
MIT License
10.38k stars 382 forks source link

Make AST `#[repr(C)]` #4296

Closed overlookmotel closed 3 weeks ago

overlookmotel commented 2 months ago

I am in favour of making all AST types #[repr(C)].

The effect of #[repr(C)] is to make the memory layout of types predictable.

Why?

1: Treat Rust types as a serialization format

(copied from https://github.com/oxc-project/oxc/pull/3115)

Oxc's direction appears to be increasingly to build up control over the fundamental primitives we use, in order to unlock performance and features. We have our own allocator, our own custom implementations for Box and Vec, our own IndexVec. The AST is the central building block of Oxc, and taking control of its memory layout feels like a step in this same direction.

Oxc has a major advantage over other similar libraries in that it keeps all the AST data in an arena. This opens the door to treating the AST either as Rust types or as pure data (just bytes). That data can be moved around and manipulated beyond what Rust natively allows.

However, to enable that, the types need to be well-specified, with completely stable layouts. #[repr(C)] is the only tool Rust provides to do this.

Once the types are #[repr(C)], various features become possible:

Allowing the AST to be treated as pure data will likely unlock other "next level" features further down the track (caching for "edge bundling" comes to mind).

2: AST transfer

This isn't necessary for AST transfer, but it's desirable, as then we can get rid of a bunch of code in current implementation which has to do introspection of the type layouts at runtime. If types are #[repr(C)], it can be done in build script instead.

3: Memory access optimizations

Control of in-memory type layouts can also allow some runtime optimizations.

For example, if all span field was always placed at the same offset in every struct, then GetSpan::span could be reduced to a single CPU instruction with no branching.

The problem with #[repr(C)]

#[repr(C)] does have the annoying requirement that you have to order the fields of structs in the exact order that they'll be laid out in memory. Without #[repr(C)], Rust automatically re-orders the fields to minimize padding (broadly speaking, order fields in descending order of type alignment). But with #[repr(C)] you have to do this manually.

e.g. BinaryExpression needs to be defined as:

#[repr(C)]
pub struct BinaryExpression<'a> {
    pub left: Expression<'a>,
    pub right: Expression<'a>,
    pub span: Span,
    pub operator: BinaryOperator,
}

This is not ideal. We want field order as written in *.rs files to be:

  1. The order of visitation.
  2. The order that makes sense semantically - memory layout should be an invisible implementation detail.

Proposed solutions

#[repr_stable(field_order(left, right, span, operator))]
pub struct BinaryExpression<'a> {
    pub span: Span,
    pub left: Expression<'a>,
    pub operator: BinaryOperator,
    pub right: Expression<'a>,
}

repr_stable is a proc macro which re-orders the fields in the memory layout. But the field order as written can be the same as it is now - a semantically sensible order.

Alternatively: Rather than specifying memory field order in the #[repr_stable] attribute, codegen (https://github.com/oxc-project/oxc/pull/3815) can figure out what memory layout should be and pass this info to the macro via "side channel" of the AST schema. That way, all types will be optimized for ideal memory layout without any manual work.

That also opens the door to doing most of the work in the codegen, and making the proc macro "dumb", so minimizing its compile-time impact. Pushing this to its furthest extent, something like "Option 2d: Cache macro expansion output" in oxc-project/oxc#4297 would make the macro basically free (almost 0 compile time cost).

rzvxa commented 2 months ago

Allowing the AST to be treated as pure data will likely unlock other "next level" features further down the track (caching for "edge bundling" comes to mind).

Just a note to myself and others, If we want to go down the binary serialization route, Make sure to add a static versioning to types so we can iterate on it and assert layout compatibility across versions(when the layout changes).

overlookmotel commented 1 month ago

Am looking to implement this over next few weeks.

Proposed approach

  1. Calculate ideal memory layout in AST codegen.
  2. Codegen record that in a schema.
  3. Turn the #[ast] attr into a real macro, which alters the type definition, based on the schema.

#[ast] macro will:

Result

Result will be:

#[ast(visit)]
pub struct BinaryExpression<'a> {
    pub span: Span,
    pub left: Expression<'a>,
    pub operator: BinaryOperator,
    pub right: Expression<'a>,
}
#[ast(visit)]
pub enum Expression<'a> {
    BooleanLiteral(Box<'a, BooleanLiteral>),
    NullLiteral(Box<'a, NullLiteral>),
    /* ...all the other variants... */

    #[copy_variants_from] MemberExpression
}

i.e.:

  1. Structs look exactly as they are now.
  2. Enum variants don't have explicit discriminants (as before we introduced inherited enum variants).
  3. inherit_variants! macro is gone, solving oxc-project/oxc#4297.

Hiding implementation details

Type memory layouts are treated as an internal implementation detail which is invisible to the user.

I think this is a legitimate approach, as all this change does is manually replicate what Rust compiler does automatically, but adds one important property - a guarantee of memory layout stability.

If we want to increase the visibility of what's happening, we could also add a marker attr #[repr_stable] to all types:

#[ast(visit)]
#[repr_stable]
pub struct BinaryExpression<'a> { /* ... */ }

Downside

The only downside I can see is that we'll need to disable clippy's inconsistent_struct_constructor lint rule - because after #[ast] macro has done its work, fields in struct definitions will be a different order from what appears in the type definitions as written.

Reducing compile-time impact of macro

@rzvxa and I have discussed various approaches to pre-compiling macros to make expansion as fast as possible. The most advanced of these would reduce compile time impact to near zero.

Initially, I propose that we don't try to be too clever and just get the feature working, which will have some compile time impact. We'll measure how bad that effect is, and then we'll iterate to bring it back down. How bad the effect is will inform how much effort is worthwhile putting in to optimizing it (i.e. only go down the rabbit hole if it'll yield real benefit).

overlookmotel commented 1 month ago

@Boshen and I discussed on video call. He's given his 👍 to this. He agreed that first priority on the macro is to get it working, and not worry about compile time initially - iterate later to improve it if necessary.

rzvxa commented 1 month ago

@overlookmotel May I self-assign this issue?

overlookmotel commented 1 month ago

@rzvxa Would be great if you wanted to work on this. Let's have a chat next week before we really dive in though.

overlookmotel commented 3 weeks ago

@rzvxa have been discussing on Discord DMs for past couple of weeks, and rzvxa has been making a series of PRs laying the groundwork. Just to bring everyone else into the loop of where we're up to:

Where we're at

The original plan was to have the AST codegen figure out ideal field order for structs, so they are tightly packed in memory, exactly the same way Rust compiler does by default (without #[repr(C)]). The #[ast] proc macro then performs that field ordering.

So then the change to #[repr(C)] has 0 perf impact - we're just manually doing something that compiler automatically does already.

rzvxa now has that working in #4404.

This fulfills all our objectives:

  1. Make all types have guaranteed stable and predictable memory layouts.
  2. The type defs are still written in an order that makes sense to humans (and with fields in visitation order).
  3. Zero perf impact.

The problem

The problem we have hit is Rust Analyser.

RA "sees behind the curtain" and shows the user the field order after the macro has done field re-ordering.

program_layout_ra

rzvxa and I both agree that this is unacceptable - it's confusing for users.

How to work around it

I can see 4 options:

1. Don't re-order fields

Just accept that type layouts in memory become more "baggy", due to excess padding. It causes a small perf hit (see #4614), but it's not too bad.

2. Do some manual re-ordering of fields

Alter field orders in type defs by hand to make them tightly packed by default.

For most types, this doesn't need many changes. For example, Program only needs source_type field moved down:

pub struct Program<'a> {
    pub span: Span,
    // `source_type` was here
    pub hashbang: Option<Hashbang<'a>>,
    pub directives: Vec<'a, Directive<'a>>,
    pub body: Vec<'a, Statement<'a>>,
    // Now it's here
    pub source_type: SourceType,
    pub scope_id: Cell<Option<ScopeId>>,
}

I don't think this would be too intrusive. Mostly the fields which cause excess padding are bools or fieldless enums (effectively u8s). These types aren't visited, so they can be moved to be last fields without altering visitation order.

The disadvantage is that we now need to think about memory layouts when we set/change the order of fields in AST types, rather than being free to order them as makes sense to us, and leave it to codegen to optimize it as an invisible implementation detail (the way Rust's compiler does for us now).

3. Hybrid

For the few types that fall into the 2nd category, Rust Analyser will show a field order which is different from how it's written in original type defs in .rs files. But these cases will be few, and it's only showing fields in a different order, not hiding fields or showing fields that don't exist - so maybe it's acceptable.

4. Fix Rust Analyser's output

Find a way to get Rust Analyser to "see" the type defs as written, rather than after field re-ordering. This is the "have cake and eat it" option. We can fulfil all our objectives, and also preserve optimal DX.

But it is not clear yet whether this is possible or not. https://github.com/rust-lang/rust-analyzer/issues/17766

What do we do?

In my opinion:

Obviously option (4) is the ideal, but may take a while to resolve, and we want to move forwards with this in meantime.

(1) does have some perf hit. It's not too bad, but best avoided if we can.

I suggest we see how close we can get to 0 perf hit using approach (2). Maybe that'll be enough, but otherwise we can consider (3).

overlookmotel commented 3 weeks ago

I just realised a potential fix for RA problem: Only apply #[repr(C)] and re-order fields in release builds (which is where it matters):

#[ast] macro would output for BinaryExpression:

// Original field order
#[cfg(debug_assertions)]
pub struct BinaryExpression<'a> {
    pub span: Span,
    pub left: Expression<'a>,
    pub operator: BinaryOperator,
    pub right: Expression<'a>,
}
// With re-ordered fields
#[cfg(not(debug_assertions))]
#[repr(C)]
pub struct BinaryExpression<'a> {
    pub left: Expression<'a>,
    pub right: Expression<'a>,
    pub span: Span,
    pub operator: BinaryOperator,
}

If debug_assertions isn't an exact enough proxy for release/dev mode, we can more accurately detect OPT_LEVEL in a build script:

// build.rs
fn main() {
    // TODO: Change to `cargo::` once MSRV is >= 1.77.0
    println!("cargo:rustc-check-cfg=cfg(stable_repr)");
    if std::env::var("OPT_LEVEL").unwrap() != "0" {
        println!("cargo:rustc-cfg=stable_repr");
    }
}
#[cfg(not(repr_stable))]
pub struct BinaryExpression<'a> { /* fields in original order */ }
#[cfg(repr_stable)]
#[repr(C)]
pub struct BinaryExpression<'a> { /* re-ordered fields */ }

I've tested this and it does seem to work - RA shows original field order, and we get the optimized memory layout in release builds. @rzvxa Do you see any downside to this?

rzvxa commented 3 weeks ago

I just realised a potential fix for RA problem: Only apply #[repr(C)] and re-order fields in release builds

This seems promising. The only issue I can see is when we do something with layouts in debug builds we need to use unordered layout data. So we need 4 layout data, 32bit-original, 64bit-original, 32bit-optimized, and 64bit-optimized; Then we need to switch between them depending on the build.

I think our rust code needs to export a constant that would let the users of pre-compiled binaries know this information about the build they are operating on. We can even publish a crate called oxc_ast_schema that would export the actual memory layouts(+ types to work with the data).

So oxc_ast would export this: oxc_ast::ABI_LAYOUT which is an enum. We can use this enum to get the layout data of the current build like so: oxc_ast_schema::layouts(oxc_ast::ABI_LAYOUT).

But I think it would work if we go at it. Doesn't have to block us from doing AST Transfer. We can come back to it after the fact and make reordering an optimization step.

overlookmotel commented 3 weeks ago

rzvxa and I discussed. We think the solution mentioned in https://github.com/oxc-project/oxc/issues/4296#issuecomment-2266641618 can work.

4614 is now merged, and we'll use that technique to win back the small perf loss we just took.

In debug builds, in my opinion we should not re-order struct fields, nor make structs #[repr(C)] by default. If they were #[repr(C)], Rust Analyser would show the wrong size for types in its tooltip display.

This is a slightly better build script to use:

// build.rs
fn main() {
    // TODO: Change to `cargo::` once MSRV is >= 1.77.0
    println!("cargo:rustc-check-cfg=cfg(stable_repr)");
    // Note addition of `cfg!(feature = "stable_repr")`
    if cfg!(feature = "stable_repr") || std::env::var("OPT_LEVEL").unwrap() != "0" {
        println!("cargo:rustc-cfg=stable_repr");
    }
}

With that, you can use stable_repr feature to enable #[repr(C)] + re-ordered fields in debug builds, if you need it. And in release builds, stable_repr always gets enabled automatically.

At present, we have no use case which requires static knowledge of memory layouts outside of Rust apart from AST transfer. NAPI build will have to enable stable_repr to support AST transfer. But, that aside, any Rust code which wants to know about memory layout of a type can use std::mem::offset_of!, and that will work regardless of whether debug or release mode, and regardless of whether stable_repr is enabled or not.

I believe therefore we only need to maintain 2 versions of memory layout in schema - 64 bit and 32 bit.

We may want to make tests run on both debug and release builds, to cover all bases (and disable the field offset assertions unless stable_repr is enabled).

rzvxa commented 3 weeks ago

There is #4382 to track the layout optimization through field reordering, Shall we close this one?

overlookmotel commented 3 weeks ago

Yes, the AST is now #[repr(C)]!

overlookmotel commented 3 weeks ago

Ah no actually. We're not quite there. We still need to:

rzvxa commented 3 weeks ago

@overlookmotel May I suggest we make those into their own separate tasks/issues.

overlookmotel commented 3 weeks ago

Done. #4622, #4623.