rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.38k stars 12.72k forks source link

Runtime code is generated for functions only used in const eval #119214

Open Kimundi opened 10 months ago

Kimundi commented 10 months ago

Code

I tried this code:

I expected to see this happen: Both check1 and check2 should compile to functions that just use a compiletime-evaluated struct value

Instead, this happened: check1 disapeared from the assembly, and it looks like all the code needed to runtime-evaluate it got added instead

The weird thing is that the function that regressed is the one that uses a const item by name, rather than the one that just calls a const function inside a non-const function body.

Version it worked on

It most recently worked on: 1.74.1 (stable as of writing this)

Version with regression

from playground:

1.76.0-beta.1 (2023-12-21 0e09125c6c3c2fd70d7d)
1.77.0-nightly (2023-12-21 3d0e6bed600c0175628e)
asquared31415 commented 10 months ago

Note that adding #[no_mangle] to check1 to force it to appear in the binary does cause it to appear as a constant. I think this might be a case of "in isolation, codegen is weird, you have to evaluate it at a whole program level"?

RossSmyth commented 10 months ago

This looks like another instance of #116505, meaning that depending on some heuristics leaf functions are only codegenned in cgu that need them. Since check1 and check2 are not called, they are not seen. This is good for optimizations but quite unfortunate for inspecting asm on Compiler Explorer.

Adding #[no_mangle] will force them to be available, you can also add -C link-dead-code=yes, or turn of the optimization with -Z cross-crate-inline-threshold=never on nightly.

https://godbolt.org/z/4rfGGrbWx

saethlin commented 10 months ago

I think there are multiple things happening here. I'll try to explain what I'm seeing.

I expected to see this happen: Both check1 and check2 should compile to functions that just return a compiletime-evaluated struct value

That's not possible, those functions both take runtime parameters. I'm presuming you mean "use a compiletime-evaluated struct value" instead?

check1 disapeared from the assembly

This is the cross-crate-inlining effect; I've been mulling over ideas to improve the situation, but this is not a bug. It's very unfortunate though. https://github.com/compiler-explorer/compiler-explorer/issues/5782

and it looks like all the code needed to runtime-evaluate it got added instead... The weird thing is that the function that regressed is the one that uses a const item by name, rather than the one that just calls a const function inside a non-const function body.

This I think is a bug. I've made a smaller example here: https://godbolt.org/z/GoEa8jc5f and I'm going to look into it...

saethlin commented 10 months ago

The reproducer is this:

pub struct Thing;

impl Thing {
    const fn new(panic: bool) -> Self {
        if panic {
            panic!()
        }
        Thing
    }
}

const T: Thing = Thing::new(false);

pub fn oof() -> Thing {
    T
}

Since automatic cross-crate-inlining landed, the assembly generated for this crate contains Thing::new and a bunch of symbols related to panics. Which looks very goofy. At first I was afraid that I broke something in that PR (https://github.com/rust-lang/rust/pull/116505) related to reachability, but I'm now quite sure I didn't. On an old compiler, if you just add #[inline] to fn oof, you'll get the goofy assembly in godbolt: https://godbolt.org/z/q4E6s4jvs

The change here is that fn oof is cross-crate-inlinable, which means that reachability analysis must consider the contents of its body to be reachable outside this CGU. This is a dark corner of the whole "automatic #[inline]" or "cross-crate-inlinable" approach that I thought I had dodged in the original PR by only inferring cross-crate-inlinability for leaf functions. Turns out I didn't. In this program, fn oof is a leaf function in MIR because it looks like this:

fn oof() -> Thing {
    let mut _0: Thing;

    bb0: {
        _0 = const _;
        return;
    }
}

But to reachability analysis, this function has now made the initializer of that const reachable. Ergo, to reachability analysis this is not a leaf function.

So cross-crate-inlining turns the surface area of this crate from "a symbol called oof that I can call and nothing else" to "a symbol called oof and also its body and everything reachable from it". So the compiler dutifully recurses through the const fn that initializes this const and instantiates everything that it reaches.

So from one perspective, this is sensible. But from the OP's, this is obviously silly. All this extra panic code was made reachable from compile time evaluation, but we stamped out runtime code for it. So I wonder if that's what we should be tracking here; if reachability determines something is reachable but only at compile time.

Kimundi commented 10 months ago

I expected to see this happen: Both check1 and check2 should compile to functions that just return a compiletime-evaluated struct value

That's not possible, those functions both take runtime parameters. I'm presuming you mean "use a compiletime-evaluated struct value" instead?

Err yes, that was badly formulated on my part. Fixing it in the description.

check1 disapeared from the assembly

This is the cross-crate-inlining effect; I've been mulling over ideas to improve the situation, but this is not a bug. It's very unfortunate though. compiler-explorer/compiler-explorer#5782

That fully explains it, I was not aware that we actually started cross-crate inlining for non-marked functions now. This bugreport was created based on the assumption that that does not happen for public library functions.

and it looks like all the code needed to runtime-evaluate it got added instead... The weird thing is that the function that regressed is the one that uses a const item by name, rather than the one that just calls a const function inside a non-const function body.

This I think is a bug. I've made a smaller example here: https://godbolt.org/z/GoEa8jc5f and I'm going to look into it...

Thanks! Good to know I discovered something unexpected at least πŸ˜„

Kimundi commented 10 months ago

So I wonder if that's what we should be tracking here; if reachability determines something is reachable but only at compile time.

That seems like what this ought to boil down to, yeah

RalfJung commented 10 months ago

So I wonder if that's what we should be tracking here; if reachability determines something is reachable but only at compile time.

Interesting, I guess currently this is entirely per-item and does not consider "how the item is used"? Since we distinguish compile-time-MIR and runtime MIR, it would indeed make sense to treat those two rather separately for reachability.

Cc @oli-obk

saethlin commented 10 months ago

I started working on this and made some progress, but then was immediately tripped up by the difference between a const which captures a function pointer and a const which calls a function. That's probably resolvable, but I have enough other things on my plate right now that I'm unassigning myself to encourage anyone else interested to work on this.

RalfJung commented 8 months ago

Having recently looked into the collector a bit, I have no idea how this is happening. We're not recursing into the MIR body of a const, so I can't see how the body of T can affect which things get collected.

Maybe some other pass is adding junk to oof's required_const?

RalfJung commented 8 months ago

There are these calls in reachable to visit_nested_body which are suspicious but I don't know if that affects mono item collection at all:

https://github.com/rust-lang/rust/blob/24f009c5e55d18c12563dd74681ca33b8a349936/compiler/rustc_passes/src/reachable.rs#L197-L202

Does anyone know anyone who understands our mono item collector?^^

saethlin commented 8 months ago

Per my comment above, I believe this is required to handle consts that capture function pointers. Those function pointers will be used at runtime, so they're runtime code reachable through a const. That pattern is used by the standard library's thread-local system, which is a common source of linker errors when I work on MirUsedCollector.

RalfJung commented 8 months ago

That doesn't sound right. Function pointers are handled by walking the final value of the const, and if we see a function pointer in there we add it to the monomorphization list. (Pointers are special values -- they have provenance -- so we can detect them in the final value of a const pretty easily; this does not require a type-driven traversal.) There's the collect_alloc function for that. So this does not require looking at the MIR of the const either.

oli-obk commented 8 months ago

Function pointers are handled by walking the final value of the const, and if we see a function pointer in there we add it to the monomorphization list.

since constants are reevaluated in every crate (unlike statics since recently), reachability needs to mark everything in its body as reachable. So I guess we should start to cache at least constants that can be evaluated (non-generic or where being generic doesn't matter), and then just not encode their MIR. Then reachability can skip these bodies

RalfJung commented 8 months ago

Okay but does reachability even interact with monomorphization? So far I don't see how reachability could be responsible here.

Or do we actually codegen all functions that are "reachable"? In that case we probably have to distinguish "reachable in const code" and "reachable in runtime code".

oli-obk commented 8 months ago

Or do we actually codegen all functions that are "reachable"? In that case we probably have to distinguish "reachable in const code" and "reachable in runtime code".

Yes. These are the "seed" set of items to be processed in monomorphization

RalfJung commented 8 months ago

Can you point to the code for that? I looked for where he reachable_set query is called and can't see anything inside the monomorphization crate.

oli-obk commented 8 months ago

I was mistaken. While I recently dug into this, I mixed up two things.

It is used separately (and I think we hope that monomorphization and reachability are in sync enough that we never end up with missing MIR or missing codegen items). It is also used for splitting the mono items into codegen units, not for actually coming up with the initial set.

RalfJung commented 8 months ago

Okay, so if I understand correctly we still don't have an explanation for why these even get added to the monomorphization set. More research is needed...

RalfJung commented 8 months ago

The reproducer is this:

FWIW this doesn't reproduce on latest nightly any more, the #[inline] has to be added again (making current nightly behave like old versions): https://godbolt.org/z/4vxsT4cT3

RalfJung commented 8 months ago

Turns out reachability is involved, I think. Just very indirectly.

The key difference in the log of the good behavior (without #[inline]) vs the bad behavior (with #[inline]) is

β”‚ β”œβ”€β”rustc_monomorphize::collector::push_if_root def_id=DefId(0:6 ~ collect[a406]::{impl#0}::new)
β”‚ β”‚ β”œβ”€  0ms DEBUG rustc_monomorphize::collector found root
β”‚ β”‚ β”œβ”€β”rustc_monomorphize::collector::create_fn_mono_item instance=Instance { def: Item(DefId(0:6 ~ collect[a406]::{impl#0}::new)), args: [] }, source=no-location (#0)
β”‚ β”‚ β”‚ β”œβ”€  0ms DEBUG rustc_monomorphize::collector return=Spanned { node: Fn(Instance { def: Item(DefId(0:6 ~ collect[a406]::{impl#0}::new)), args: [] }), span: no-location (#0) }
β”‚ β”‚ β”œβ”€β”˜
β”‚ β”œβ”€β”˜

IOW, this function suddenly considers Thing::new to be a root:

https://github.com/rust-lang/rust/blob/d3514a036dc65c1d31ee5b1b4bd58b9be1edf5af/compiler/rustc_monomorphize/src/collector.rs#L1293-L1307

So is_reachable_non_generic switches from false to true. That boils down to a membership test on the set computed here, which is created by starting with the reachable_set computed here and then removing some stuff. reachable_set in turn recurses into inline functions here and into the MIR body of constants here.

So... I guess the question is, is reachable_set supposed to return only things that are runtime-reachable (in which case it is currently returning too much), or is it supposed to return everything that reachable in any fashion (in which case the collector using this as its criterion is wrong since there we only want to collect runtime-relevant things).

RalfJung commented 6 months ago

So from one perspective, this is sensible. But from the OP's, this is obviously silly. All this extra panic code was made reachable from compile time evaluation, but we stamped out runtime code for it. So I wonder if that's what we should be tracking here; if reachability determines something is reachable but only at compile time.

So, looking at the comments I wrote in https://github.com/rust-lang/rust/pull/122769...

I think what happens is that since T is reachable (which is correct), it may be the case that T returns a function pointer to any function it mentions, recursively. So just in case T returns a function pointer to Thing::new, we mark Thing::new reachable.

Now, we could probably skip this for top-level const items, since we know we can always evaluate them to a value, and then we can just check the final value for any function pointers and only codegen those. (And the collector already has logic to find these function pointers.)

But for associated consts, this is a lot harder. I think we'd have to keep track (inside reachable computation) of whether the current item is const-reachable or full-reachable, and then if we are const-reachable that does not in general get added to the regular reachable set, until we find a function item mentioned in a way that it does not immediately get called -- those are then added as regular-reachable again. Then there are still examples where we'd generate runtime code for no reason but it'd be a lot less common. Fundamentally we can't fully avoid false positives here as we can't be sure which functions may be referenced via function pointers that actually leak into the final value of the const (and thus potentially into runtime code)

saethlin commented 6 months ago

we'd have to keep track (inside reachable computation) of whether the current item is const-reachable or full-reachable, and then if we are const-reachable that does not in general get added to the regular reachable set, until we find a function item mentioned in a way that it does not immediately get called -- those are then added as regular-reachable again.

Ah! I think this might explain why my attempt to implement const-reachable didn't work.

RalfJung commented 6 months ago

Do you still have that branch somewhere? What I imagine would be completely internal to reachable.rs. At least according to the comments there, we encode ctfe MIR for everything anyway, so it's only runtime reachability we have to worry about.

But, surprisingly, a const item can make things runtime reachable.

RalfJung commented 6 months ago

When a runtime function calls a private const fn, we don't have to consider that function to be const-reachable. So the naive approach outlined above could overestimate how much is const-reachable. The two reachable sets do interact though... I think it's a fixed point of a computation that goes like

saethlin commented 6 months ago

Do you still have that branch somewhere?

Yes, though I expect there isn't anything in it worth salvaging: https://github.com/rust-lang/rust/compare/master...saethlin:rust:const-reachability

RalfJung commented 6 months ago

Okay so that computed two sets as result from the query. I don't know if that's necessary.

Regarding the analysis outlined above, I assume it is pretty rare that a private function is const for no reason, so it's probably okay to simplify this a bit:

RalfJung commented 5 months ago

On nightly, godbolt doesn't show any assembly any more with the original reproducer, thanks to https://github.com/rust-lang/rust/pull/122505.

However, this variant still generates assembly:

pub struct Thing;

impl Thing {
    const fn new(panic: bool) -> Self {
        if panic {
            panic!()
        }
        Thing
    }

    const C: Thing = Thing::new(false);
}

#[inline]
pub fn oof() -> Thing {
    Thing::C
}