rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Large types cause linker failure #130729

Open benwis opened 2 days ago

benwis commented 2 days ago

So I'm one of the maintainers of Leptos, and we're experimenting with a new static type system for our next release. The issue is that people porting existing websites/apps to the beta are hitting linker failures with link.exe, lld, mold, and ld. Our theory is that once the types reach a certain size, they crash the linker.

In this version, we encode the HTML tree into the type system by constructing View<> types, each of which contains a tuple of its descendants. You can see what that looks like in the code below, which does not crash.

```toml
# Cargo.toml
[package]
name = "linker_issue"
version = "0.1.0"
edition = "2021"

[dependencies]
leptos = { git = "https://github.com/leptos-rs/leptos", features = ["ssr"] }
```

```rust
// main.rs
use leptos::html::{div, p, HtmlElement};
use leptos::prelude::*;

fn main() {
    let view: HtmlElement<_, _, _, Dom> = div().child((
        div().child((div(),)),
        p().child((div(), div())),
        div().child((div(), div(), div())),
        p().child((div(), div(), div(), div())),
        div().child((div(), div(), div(), div())),
        p().child((div(), p(), div(), div())),
        div().child((div(), div(), p(), div())),
        p().child((div(), p(), div(), p())),
    ));
    println!("type: {:?}", std::any::type_name_of_val(&view));
    println!("\n\nsize: {:?}", std::mem::size_of_val(&view));
    let html = view.to_html();
    println!("\n\noutput: {html}");
}
```
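To make the "tree encoded in the type" idea concrete without depending on Leptos, here is a minimal sketch. The `El`, `Tag`, `Div`, `P`, and `Render` names are hypothetical stand-ins, not Leptos's real API. Each element stores its children in a generic field, so the printed type name spells out the entire subtree.

```rust
use std::marker::PhantomData;

// Hypothetical tag types standing in for HTML elements (not Leptos types).
struct Div;
struct P;

trait Tag {
    const NAME: &'static str;
}
impl Tag for Div {
    const NAME: &'static str = "div";
}
impl Tag for P {
    const NAME: &'static str = "p";
}

// An element stores its children in a generic field, so the whole
// subtree becomes part of the element's type.
struct El<T: Tag, C> {
    children: C,
    _tag: PhantomData<T>,
}

fn el<T: Tag, C>(children: C) -> El<T, C> {
    El { children, _tag: PhantomData }
}

trait Render {
    fn render(&self, out: &mut String);
}

// No children.
impl Render for () {
    fn render(&self, _out: &mut String) {}
}

// Two children, rendered in order (a real library would cover many tuple arities).
impl<A: Render, B: Render> Render for (A, B) {
    fn render(&self, out: &mut String) {
        self.0.render(out);
        self.1.render(out);
    }
}

impl<T: Tag, C: Render> Render for El<T, C> {
    fn render(&self, out: &mut String) {
        out.push_str(&format!("<{}>", T::NAME));
        self.children.render(out);
        out.push_str(&format!("</{}>", T::NAME));
    }
}

fn main() {
    // Renders <div><p></p><div></div></div>; the type names every node.
    let view = el::<Div, _>((el::<P, _>(()), el::<Div, _>(())));
    let mut html = String::new();
    view.render(&mut html);
    println!("{html}");
    println!("{}", std::any::type_name_of_val(&view));
}
```

Every additional child adds another generic argument to the outermost type. A full page therefore ends up with a very long concrete type.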

So far we haven't nailed down a reasonable reproduction I can post, but I can reproduce the issue while building one of our users' private projects. Running cargo leptos build, which runs cargo build --no-default-features --features=ssr, generates this error. I've put it in a gist because it is huge: https://gist.github.com/benwis/fe3a8010243c0f6b338f6aef0b0e7ad2

I'm not quite sure how to debug this. We'd love to use this system, as it offers a number of benefits, but if it's going to break compilation at larger app sizes, we'll have to defer it. Any thoughts?

saethlin commented 2 days ago

This looks similar to https://github.com/rust-lang/rust/issues/83060. How big are the rlibs in question?

benwis commented 2 days ago

> This looks similar to #83060. How big are the rlibs in question?

Funny to see you here! The ones listed appear to be 311k, 97k, and 1.8k respectively. There are much larger rlibs in here; the main app rlib is 4G.

dmgolembiowski commented 2 days ago

I might be way off here (since I'm not currently compiling and checking anything) but I'd be curious if there are a bunch of monstrously large and monomorphized enums at play, @benwis

benwis commented 2 days ago

> I might be way off here (since I'm not currently compiling and checking anything) but I'd be curious if there are a bunch of monstrously large and monomorphized enums at play, @benwis

Enums? I don't think so. But we do have those large types that contain a tuple of all the children.

saethlin commented 2 days ago

> The ones listed appear to be 311k, 97k and 1.8k respectively. There are much larger rlibs in here, the main app rlib is 4G

4 GB would do it. R_X86_64_32 means a 32-bit offset. I'm not sure the particular point at which the linker starts emitting errors is relevant, because that might just depend on the order of the functions in object files, which isn't stable.
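The following sketch shows why a roughly 4 GB object is a problem. It is based on the System V x86-64 psABI definitions, not on any linker's source. An R_X86_64_32 relocation stores a zero-extended 32-bit absolute value, R_X86_64_32S stores a sign-extended one, and R_X86_64_PC32 stores the signed 32-bit distance S + A - P. The `Reloc` enum and `fits` function are hypothetical helpers written for illustration.

```rust
/// A few x86-64 ELF relocation kinds with 32-bit fields.
#[derive(Debug, Clone, Copy)]
enum Reloc {
    /// R_X86_64_32: S + A must zero-extend from 32 bits.
    R32,
    /// R_X86_64_32S: S + A must sign-extend from 32 bits.
    R32S,
    /// R_X86_64_PC32: S + A - P must fit in a signed 32-bit value.
    Pc32,
}

/// `s` = symbol address, `a` = addend, `p` = address of the place being patched.
fn fits(kind: Reloc, s: u64, a: i64, p: u64) -> bool {
    // Widen to i128 so the arithmetic itself can never overflow.
    let i32_range = i32::MIN as i128..=i32::MAX as i128;
    match kind {
        Reloc::R32 => (0..=u32::MAX as i128).contains(&(s as i128 + a as i128)),
        Reloc::R32S => i32_range.contains(&(s as i128 + a as i128)),
        Reloc::Pc32 => i32_range.contains(&(s as i128 + a as i128 - p as i128)),
    }
}

fn main() {
    const GIB: u64 = 1 << 30;
    // A symbol at 3 GiB still fits an R_X86_64_32 field...
    println!("{}", fits(Reloc::R32, 3 * GIB, 0, 0)); // true
    // ...but once the object pushes it past 4 GiB, the relocation overflows.
    println!("{}", fits(Reloc::R32, 5 * GIB, 0, 0)); // false
    // A PC-relative reference spanning more than 2 GiB overflows too.
    println!("{}", fits(Reloc::Pc32, 3 * GIB, 0, 0)); // false
}
```

This also matches the point about ordering: which relocation overflows first depends on where things land in the output, not only on the total size.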

You should be able to set RUSTFLAGS=-Zprint-type-sizes (nightly only) and see the sizes of all types in the compilation; perhaps that will point you toward the problem.
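The -Zprint-type-sizes output is very long for a project like this, so a small filter helps with triage. Here is a sketch that assumes the summary lines look like print-type-size type: `Foo<u8>`: 24 bytes, alignment: 8 bytes, and that you have saved the output to a file and piped it into stdin. The program prints the 20 largest types. The format assumption and the `parse_type_line` helper are ours, not a documented interface.

```rust
use std::io::{self, BufRead};

/// Parses one `-Zprint-type-sizes` summary line (format assumed, see above).
/// Returns the type name and its size in bytes; other lines (variants,
/// fields, padding) yield `None`.
fn parse_type_line(line: &str) -> Option<(&str, u64)> {
    let rest = line.trim_start().strip_prefix("print-type-size type: `")?;
    // Type names can contain ": ", so split on the last "`: ".
    let end = rest.rfind("`: ")?;
    let name = &rest[..end];
    let size = rest[end + 3..].split(' ').next()?.parse().ok()?;
    Some((name, size))
}

fn main() {
    let mut types: Vec<(String, u64)> = io::stdin()
        .lock()
        .lines()
        .map_while(Result::ok)
        .filter_map(|l| parse_type_line(&l).map(|(n, s)| (n.to_string(), s)))
        .collect();
    // Largest types first.
    types.sort_by(|a, b| b.1.cmp(&a.1));
    for (name, size) in types.iter().take(20) {
        println!("{size:>12} bytes  {name}");
    }
}
```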

workingjubilee commented 2 days ago

> So far we haven't nailed down a reasonable reproduction I can post,

If this is about exceeding the 4GB limit, the most-minimal reproduction will be unreasonable. Nonetheless, we would like it.

benwis commented 1 day ago

> > So far we haven't nailed down a reasonable reproduction I can post,
>
> If this is about exceeding the 4GB limit, the most-minimal reproduction will be unreasonable. Nonetheless, we would like it.

I'm working on getting y'all a repro. There are potentially multiple issues here, and I need to look into them.

gbj commented 1 day ago

> monstrously large and monomorphized enums

This is entirely possible. Routing between different pages is handled by enums. A site that has, say, 16 pages would have its view defined by something like EitherOf16<A, B, C, D, ...>, where each of those generic parameters would itself be a fairly large and complex view type like the one above.
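The next sketch shows why a routing enum can become large. It uses a four-variant stand-in called `EitherOf4` together with hypothetical page types; none of these are Leptos's actual definitions. An enum must be able to hold its largest variant plus a discriminant, so one heavy page sets the size of every routed value. The enum's type name also lists every page type.

```rust
use std::mem::size_of;

// Hypothetical four-variant stand-in for a router enum such as EitherOf16.
#[allow(dead_code)]
enum EitherOf4<W, X, Y, Z> {
    A(W),
    B(X),
    C(Y),
    D(Z),
}

// Hypothetical page views with very different sizes.
#[allow(dead_code)]
struct Home([u8; 16]);
#[allow(dead_code)]
struct Blog([u8; 4096]);
#[allow(dead_code)]
struct About([u8; 8]);
#[allow(dead_code)]
struct Contact([u8; 64]);

type Router = EitherOf4<Home, Blog, About, Contact>;

fn main() {
    // The enum is at least as large as its largest variant.
    println!("largest page: {} bytes", size_of::<Blog>());
    println!("router enum:  {} bytes", size_of::<Router>());
    // The concrete type names every page type.
    println!("{}", std::any::type_name::<Router>());
}
```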

I've reopened a branch over in our repo that uses type erasure instead of that enum, and asked people to test against it. It had not solved some of our other issues with overall compile time on some of those larger repos, but it's possible it helps with this particular linker issue.
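For comparison, here is a minimal sketch of type erasure applied to routing. The `Render` trait and the `ErasedView` wrapper are hypothetical and are not the types used on the Leptos branch. Each page is boxed behind `dyn Render`, so the router's return type stays the same no matter how many pages exist or how complex their view types are.

```rust
// Hypothetical trait; the real Leptos traits differ.
trait Render {
    fn to_html(&self) -> String;
}

struct Home;
struct Blog {
    posts: Vec<String>,
}

impl Render for Home {
    fn to_html(&self) -> String {
        "<h1>Home</h1>".to_string()
    }
}

impl Render for Blog {
    fn to_html(&self) -> String {
        let items: String = self.posts.iter().map(|p| format!("<li>{p}</li>")).collect();
        format!("<ul>{items}</ul>")
    }
}

// Every page is stored behind the same Box<dyn Render>, so the router's
// type no longer names each page's concrete type.
struct ErasedView(Box<dyn Render>);

impl ErasedView {
    fn new(view: impl Render + 'static) -> Self {
        ErasedView(Box::new(view))
    }
    fn to_html(&self) -> String {
        self.0.to_html()
    }
}

// Unlike an EitherOfN<Home, Blog, ...> return type, this does not grow with each route.
fn route(path: &str) -> ErasedView {
    match path {
        "/blog" => ErasedView::new(Blog { posts: vec!["hello".to_string()] }),
        _ => ErasedView::new(Home),
    }
}

fn main() {
    println!("{}", route("/").to_html());
    println!("{}", route("/blog").to_html());
}
```

The trade-off is one heap allocation and a dynamic call per erased view, in exchange for much smaller and shorter-named types.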

VincentBerthier commented 1 day ago

I’m also having linking issues on my side. I’m not sure they’re directly related to large types, but they might be. Here’s the error gist: https://gist.github.com/VincentBerthier/63d3e1187aeb291499fd755dcbbfba36 Note that, for reasons I don’t understand, cargo/the linker spends a very long time (20+ minutes) on the last step.

benwis commented 1 day ago

@saethlin @workingjubilee I still don't have a postable reproduction, but I have more general questions and info I can provide. I ran -Zprint-type-sizes, and we do have types of 185KB and 85KB, which I imagine is pretty huge. I ran objdump on the largest .o file produced, and almost all of it is debug_str content. I'm thinking there might be two separate issues here, but I'm not an expert:

  1. A regression in nightly with large .rlib files, as posted
  2. Too much debugging info in debug_str, creating overly large .o files. I don't know what goes into debug_str, but would long type names, large zero-sized types, large type sizes, and/or wide use of generics inflate it?
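On the question of long type names, here is a rough way to see how they grow. DWARF debug info records names such as type names as strings, which typically end up in .debug_str. This sketch uses a hypothetical `View<C>` node in the spirit of the View<> types described above. It does not measure .debug_str itself. It only shows that `std::any::type_name` slightly more than doubles at each level when every node holds two copies of the level below.

```rust
use std::any::type_name;

// Hypothetical view node whose children are encoded in its type.
#[allow(dead_code)]
struct View<C>(C);

// Each level holds two copies of the level below.
type D0 = View<()>;
type D1 = View<(D0, D0)>;
type D2 = View<(D1, D1)>;
type D3 = View<(D2, D2)>;
type D4 = View<(D3, D3)>;

fn main() {
    let lens = [
        type_name::<D0>().len(),
        type_name::<D1>().len(),
        type_name::<D2>().len(),
        type_name::<D3>().len(),
        type_name::<D4>().len(),
    ];
    for (depth, len) in lens.iter().enumerate() {
        // Each level's name contains the previous level's name twice.
        println!("depth {depth}: {len} bytes");
    }
}
```

Real view trees are wider and deeper than this, and each distinct monomorphization contributes its own names. That suggests how the string data could grow quickly, although we have not measured it here.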

Some people report that a --release build works, which I suspect points to 2, while others report that nothing works, which suggests 1.

Does that seem plausible?

saethlin commented 1 day ago

> A regression in nightly with large .rlib files

If this is the case, it should be possible to bisect to it with cargo-bisect-rustc. You'll probably want the --script argument. Bisecting might take a while if the build is slow but it's very set-and-forget.

> Too much debugging info in debug_str

If this is the case, a debug build with RUSTFLAGS=-Cdebuginfo=0 should link. You might also need -Cstrip? I'm not confident that's applied before linking though.

zakstucke commented 20 hours ago

My main linker error with leptos 0.7 happens on both stable and nightly, in both debug and release builds, so both of these theories seem a little off, at least in my case:

```
  = note: ld: warning: ignoring duplicate libraries: '-liconv', '-lm'
          0  0x104fd6074  __assert_rtn + 72
          1  0x104f43994  ld::DynamicAtomFile::makeNamedAtom(std::__1::basic_string_view<char, std::__1::char_traits<char>>, ld::file_format::Scope, bool) + 488
          2  0x104f0c640  ld::InputFiles::ObjectFileParser::addAtomsForSection(mach_o::Image const&, ld::InputFiles::ObjectFileParser::SectionData&) + 5528
          3  0x104f0e058  ld::InputFiles::SliceParser::parseObjectFile(mach_o::Header const*) const + 2904
          4  0x104f1f830  ld::InputFiles::parseAllFiles(void (ld::AtomFile const*) block_pointer)::$_8::operator()(unsigned long, ld::FileInfo const&) const + 440
          5  0x1858e5428  _dispatch_client_callout2 + 20
          6  0x1858f9850  _dispatch_apply_invoke3 + 336
          7  0x1858e53e8  _dispatch_client_callout + 20
          8  0x1858e6c68  _dispatch_once_callout + 32
          9  0x1858f88a4  _dispatch_apply_invoke + 252
          10  0x1858e53e8  _dispatch_client_callout + 20
          11  0x1858f7080  _dispatch_root_queue_drain + 864
          12  0x1858f76b8  _dispatch_worker_thread2 + 156
          13  0x185a91fd0  _pthread_wqthread + 228
          ld: Assertion failed: (name.size() <= maxLength), function makeSymbolStringInPlace, file SymbolString.cpp, line 74.
          clang: error: linker command failed with exit code 1 (use -v to see invocation)
```