Open Kimundi opened 10 months ago
Note that adding #[no_mangle]
to check1
to force it to appear in the binary does cause it to appear as a constant. I think this might be a case of "in isolation, codegen is weird, you have to evaluate it at a whole program level"?
This looks like another instance of #116505, meaning that depending on some heuristics leaf functions are only codegenned in cgu that need them. Since check1 and check2 are not called, they are not seen. This is good for optimizations but quite unfortunate for inspecting asm on Compiler Explorer.
Adding #[no_mangle]
will force them to be available, you can also add -C link-dead-code=yes
, or turn of the optimization with -Z cross-crate-inline-threshold=never
on nightly.
I think there are multiple things happening here. I'll try to explain what I'm seeing.
I expected to see this happen: Both
check1
andcheck2
should compile to functions that just return a compiletime-evaluated struct value
That's not possible, those functions both take runtime parameters. I'm presuming you mean "use a compiletime-evaluated struct value" instead?
check1
disapeared from the assembly
This is the cross-crate-inlining effect; I've been mulling over ideas to improve the situation, but this is not a bug. It's very unfortunate though. https://github.com/compiler-explorer/compiler-explorer/issues/5782
and it looks like all the code needed to runtime-evaluate it got added instead... The weird thing is that the function that regressed is the one that uses a
const
item by name, rather than the one that just calls a const function inside a non-const function body.
This I think is a bug. I've made a smaller example here: https://godbolt.org/z/GoEa8jc5f and I'm going to look into it...
The reproducer is this:
pub struct Thing;
impl Thing {
const fn new(panic: bool) -> Self {
if panic {
panic!()
}
Thing
}
}
const T: Thing = Thing::new(false);
pub fn oof() -> Thing {
T
}
Since automatic cross-crate-inlining landed, the assembly generated for this crate contains Thing::new
and a bunch of symbols related to panics. Which looks very goofy. At first I was afraid that I broke something in that PR (https://github.com/rust-lang/rust/pull/116505) related to reachability, but I'm now quite sure I didn't. On an old compiler, if you just add #[inline]
to fn oof
, you'll get the goofy assembly in godbolt: https://godbolt.org/z/q4E6s4jvs
The change here is that fn oof
is cross-crate-inlinable, which means that reachability analysis must consider the contents of its body to be reachable outside this CGU. This is a dark corner of the whole "automatic #[inline]" or "cross-crate-inlinable" approach that I thought I had dodged in the original PR by only inferring cross-crate-inlinability for leaf functions. Turns out I didn't. In this program, fn oof
is a leaf function in MIR because it looks like this:
fn oof() -> Thing {
let mut _0: Thing;
bb0: {
_0 = const _;
return;
}
}
But to reachability analysis, this function has now made the initializer of that const
reachable. Ergo, to reachability analysis this is not a leaf function.
So cross-crate-inlining turns the surface area of this crate from "a symbol called oof
that I can call and nothing else" to "a symbol called oof
and also its body and everything reachable from it". So the compiler dutifully recurses through the const fn
that initializes this const and instantiates everything that it reaches.
So from one perspective, this is sensible. But from the OP's, this is obviously silly. All this extra panic code was made reachable from compile time evaluation, but we stamped out runtime code for it. So I wonder if that's what we should be tracking here; if reachability determines something is reachable but only at compile time.
I expected to see this happen: Both
check1
andcheck2
should compile to functions that just return a compiletime-evaluated struct valueThat's not possible, those functions both take runtime parameters. I'm presuming you mean "use a compiletime-evaluated struct value" instead?
Err yes, that was badly formulated on my part. Fixing it in the description.
check1
disapeared from the assemblyThis is the cross-crate-inlining effect; I've been mulling over ideas to improve the situation, but this is not a bug. It's very unfortunate though. compiler-explorer/compiler-explorer#5782
That fully explains it, I was not aware that we actually started cross-crate inlining for non-marked functions now. This bugreport was created based on the assumption that that does not happen for public library functions.
and it looks like all the code needed to runtime-evaluate it got added instead... The weird thing is that the function that regressed is the one that uses a
const
item by name, rather than the one that just calls a const function inside a non-const function body.This I think is a bug. I've made a smaller example here: https://godbolt.org/z/GoEa8jc5f and I'm going to look into it...
Thanks! Good to know I discovered something unexpected at least π
So I wonder if that's what we should be tracking here; if reachability determines something is reachable but only at compile time.
That seems like what this ought to boil down to, yeah
So I wonder if that's what we should be tracking here; if reachability determines something is reachable but only at compile time.
Interesting, I guess currently this is entirely per-item and does not consider "how the item is used"? Since we distinguish compile-time-MIR and runtime MIR, it would indeed make sense to treat those two rather separately for reachability.
Cc @oli-obk
I started working on this and made some progress, but then was immediately tripped up by the difference between a const
which captures a function pointer and a const
which calls a function. That's probably resolvable, but I have enough other things on my plate right now that I'm unassigning myself to encourage anyone else interested to work on this.
Having recently looked into the collector a bit, I have no idea how this is happening. We're not recursing into the MIR body of a const, so I can't see how the body of T
can affect which things get collected.
Maybe some other pass is adding junk to oof
's required_const
?
There are these calls in reachable
to visit_nested_body
which are suspicious but I don't know if that affects mono item collection at all:
Does anyone know anyone who understands our mono item collector?^^
Per my comment above, I believe this is required to handle const
s that capture function pointers. Those function pointers will be used at runtime, so they're runtime code reachable through a const. That pattern is used by the standard library's thread-local system, which is a common source of linker errors when I work on MirUsedCollector.
That doesn't sound right. Function pointers are handled by walking the final value of the const, and if we see a function pointer in there we add it to the monomorphization list. (Pointers are special values -- they have provenance -- so we can detect them in the final value of a const pretty easily; this does not require a type-driven traversal.) There's the collect_alloc
function for that. So this does not require looking at the MIR of the const
either.
Function pointers are handled by walking the final value of the const, and if we see a function pointer in there we add it to the monomorphization list.
since constants are reevaluated in every crate (unlike statics since recently), reachability needs to mark everything in its body as reachable. So I guess we should start to cache at least constants that can be evaluated (non-generic or where being generic doesn't matter), and then just not encode their MIR. Then reachability can skip these bodies
Okay but does reachability even interact with monomorphization? So far I don't see how reachability could be responsible here.
Or do we actually codegen all functions that are "reachable"? In that case we probably have to distinguish "reachable in const code" and "reachable in runtime code".
Or do we actually codegen all functions that are "reachable"? In that case we probably have to distinguish "reachable in const code" and "reachable in runtime code".
Yes. These are the "seed" set of items to be processed in monomorphization
Can you point to the code for that? I looked for where he reachable_set
query is called and can't see anything inside the monomorphization crate.
I was mistaken. While I recently dug into this, I mixed up two things.
It is used separately (and I think we hope that monomorphization and reachability are in sync enough that we never end up with missing MIR or missing codegen items). It is also used for splitting the mono items into codegen units, not for actually coming up with the initial set.
Okay, so if I understand correctly we still don't have an explanation for why these even get added to the monomorphization set. More research is needed...
The reproducer is this:
FWIW this doesn't reproduce on latest nightly any more, the #[inline]
has to be added again (making current nightly behave like old versions): https://godbolt.org/z/4vxsT4cT3
Turns out reachability is involved, I think. Just very indirectly.
The key difference in the log of the good behavior (without #[inline]
) vs the bad behavior (with #[inline]
) is
β βββrustc_monomorphize::collector::push_if_root def_id=DefId(0:6 ~ collect[a406]::{impl#0}::new)
β β ββ 0ms DEBUG rustc_monomorphize::collector found root
β β βββrustc_monomorphize::collector::create_fn_mono_item instance=Instance { def: Item(DefId(0:6 ~ collect[a406]::{impl#0}::new)), args: [] }, source=no-location (#0)
β β β ββ 0ms DEBUG rustc_monomorphize::collector return=Spanned { node: Fn(Instance { def: Item(DefId(0:6 ~ collect[a406]::{impl#0}::new)), args: [] }), span: no-location (#0) }
β β βββ
β βββ
IOW, this function suddenly considers Thing::new
to be a root:
So is_reachable_non_generic
switches from false
to true
. That boils down to a membership test on the set computed here, which is created by starting with the reachable_set
computed here and then removing some stuff. reachable_set
in turn recurses into inline functions here and into the MIR body of constants here.
So... I guess the question is, is reachable_set
supposed to return only things that are runtime-reachable (in which case it is currently returning too much), or is it supposed to return everything that reachable in any fashion (in which case the collector using this as its criterion is wrong since there we only want to collect runtime-relevant things).
So from one perspective, this is sensible. But from the OP's, this is obviously silly. All this extra panic code was made reachable from compile time evaluation, but we stamped out runtime code for it. So I wonder if that's what we should be tracking here; if reachability determines something is reachable but only at compile time.
So, looking at the comments I wrote in https://github.com/rust-lang/rust/pull/122769...
I think what happens is that since T
is reachable (which is correct), it may be the case that T
returns a function pointer to any function it mentions, recursively. So just in case T
returns a function pointer to Thing::new
, we mark Thing::new
reachable.
Now, we could probably skip this for top-level const items, since we know we can always evaluate them to a value, and then we can just check the final value for any function pointers and only codegen those. (And the collector already has logic to find these function pointers.)
But for associated consts, this is a lot harder. I think we'd have to keep track (inside reachable
computation) of whether the current item is const-reachable or full-reachable, and then if we are const-reachable that does not in general get added to the regular reachable set, until we find a function item mentioned in a way that it does not immediately get called -- those are then added as regular-reachable again. Then there are still examples where we'd generate runtime code for no reason but it'd be a lot less common. Fundamentally we can't fully avoid false positives here as we can't be sure which functions may be referenced via function pointers that actually leak into the final value of the const (and thus potentially into runtime code)
we'd have to keep track (inside reachable computation) of whether the current item is const-reachable or full-reachable, and then if we are const-reachable that does not in general get added to the regular reachable set, until we find a function item mentioned in a way that it does not immediately get called -- those are then added as regular-reachable again.
Ah! I think this might explain why my attempt to implement const-reachable didn't work.
Do you still have that branch somewhere? What I imagine would be completely internal to reachable.rs. At least according to the comments there, we encode ctfe MIR for everything anyway, so it's only runtime reachability we have to worry about.
But, surprisingly, a const item can make things runtime reachable.
When a runtime function calls a private const fn
, we don't have to consider that function to be const-reachable. So the naive approach outlined above could overestimate how much is const-reachable. The two reachable sets do interact though... I think it's a fixed point of a computation that goes like
const fn
and const
are const-reachable.Do you still have that branch somewhere?
Yes, though I expect there isn't anything in it worth salvaging: https://github.com/rust-lang/rust/compare/master...saethlin:rust:const-reachability
Okay so that computed two sets as result from the query. I don't know if that's necessary.
Regarding the analysis outlined above, I assume it is pretty rare that a private function is const
for no reason, so it's probably okay to simplify this a bit:
On nightly, godbolt doesn't show any assembly any more with the original reproducer, thanks to https://github.com/rust-lang/rust/pull/122505.
However, this variant still generates assembly:
pub struct Thing;
impl Thing {
const fn new(panic: bool) -> Self {
if panic {
panic!()
}
Thing
}
const C: Thing = Thing::new(false);
}
#[inline]
pub fn oof() -> Thing {
Thing::C
}
Code
I tried this code:
I expected to see this happen: Both
check1
andcheck2
should compile to functions that just use a compiletime-evaluated struct valueInstead, this happened:
check1
disapeared from the assembly, and it looks like all the code needed to runtime-evaluate it got added insteadThe weird thing is that the function that regressed is the one that uses a
const
item by name, rather than the one that just calls a const function inside a non-const function body.Version it worked on
It most recently worked on: 1.74.1 (stable as of writing this)
Version with regression
from playground: