Open adetaylor opened 4 years ago
The second approach is not officially supported. An rlib is an internal implementation format within Rust, and its only client is rustc. It is naughty to pass them directly into our own linker command line.
But it does, currently, work. It makes our build process much simpler and makes use of Rust less disruptive.
An `rlib` is not a complete package you may need to link. It could very well be just a section containing only rmeta (serialized "metadata" representing checked Rust code) from which `rustc` would generate code at the time it is "forced" to by a binary output (link/staticlib/cdylib). The fact that an `rlib` contains any machine code at all is just an optimisation.

What you really want is a `staticlib-nobundle` (not an actual thing) or something similar that would not bundle the dependencies into the produced `.a`.
Related reading at https://github.com/rust-lang/rust/issues/38913
I still don't understand what the requirements are exactly, but if rlibs (as they exist now) work for you, while staticlibs don't, then perhaps you need some variation of staticlibs with different bundling behavior for dependencies, rather than rlibs.
> The fact that `rlib` contains any machine code at all is just an optimisation.

Yep.

> What you really want is a `staticlib-nobundle` (not an actual thing) or something similar that would not bundle the dependencies into the produced `.a`.

I agree - this is exactly what we need.
@adetaylor Could you elaborate a bit?
Suppose you have a Rust crate `root` with some tree of dependencies:

```
      imm1 - dup
     /
root
     \
      imm2 - dup
            \
             indirect - native
```

where `imm1`, `imm2` and `dup` are other Rust crates and `native` is some native dependency like a C static library. What is the expected result of compiling `root` supposed to be?
Right now, it would be a `staticlib` and that's fine.
The problem comes in this scenario:

```
      imm1 - dup
     /
root
     \
      imm2 - dup

root2
     \
      imm2 - dup

c_plus_plus_binary_a
     \
      root2 - [...]

c_plus_plus_binary_b
     \
      root1 - [...]

       root1 - [...]
      /
c_plus_plus_binary_c
      \
       root2 - [...]
```

In this case, we can't build `root` and `root2` as `staticlib`s, because they will conflict when we come to build `c_plus_plus_binary_c`.
I don't mean right now, I mean what compilation of `root` and `root2` should produce to fit into your use case. `libroot.a` + `libimm1.a` + `libimm2.a` + `libdup.a` + `libroot2.a` + some file in JSON or another machine-readable format describing the dependency trees?
Exactly! Except that the JSON file probably isn't necessary. Our build system (gn + ninja) works extremely hard to establish static dependency relationships in advance, so that a rebuild builds the minimal required set. We would expect to use those relationships in the subsequent linker invocation unless there were some reason that we needed to query it dynamically.
This seems like a reasonable feature to me to support. The goal of rlibs was to reserve space for rustc to "do its thing", so that we intentionally stabilize some format rather than accidentally do so. I don't think there's a need for a "staticlib-nobundle" because that's basically what an rlib is. While @nagisa is right in that rlibs can have varying contents, I think that it's fine to say for a use case like this you'd pass some flag to rustc saying "make sure I can pass this to the linker". Rustc, after all, goes to great lengths to pass rlibs directly to the linker to make linking fast.
I think this could work well perhaps with `-Z binary-dep-depinfo` (I forget the exact flag name). That'd basically tell you "here's all the other rlibs you need to link for this target", which would locate the standard library for you and other deps (which you'd probably ignore).
The gotchas that I could think of are:
For allocators we'd probably just need to say "no custom allocator" or something similar. You'd probably run rustc with no arguments saying "generate me the allocator `*.a` file to pass to the linker", and that'd just run as part of your build system. That's also where we could shove anything else if it comes up in the future.
You'll be on your own for deduplicating Rust dependencies across dynamic libraries. In your example of `c_plus_plus_binary_c`, it's generally asking for trouble if `root1` and `root2` are dynamic libraries and try to share dependencies. The compiler has internal support for this but you'd just be on your own. (This is totally fine, just wanted to point it out!)
Overall I think we could get away with this by carefully designing how a "stable" pass-things-to-the-linker process would work. We could make sure to solve the constraints here while also giving ourselves some leeway so rustc can still change things in the future if needed. It seems worthwhile to me to integrate into other build systems like this, and it's definitely a pain point with rlibs since inception.
> I think that it's fine to say for a use case like this you'd pass some flag to rustc saying "make sure I can pass this to the linker".
My suspicion is that the best and most holistic implementation of this kind of feature is a different crate-type alongside `rlib`. Implementing functionality that reliably satisfies the requirements outlined by the OP can require a fair number of adjustments to how monomorphizations are generated, where they end up and what their visibility ends up being. I think the answers to these and many other similar questions are meaningfully different enough for `rlib` and the hypothetical `staticlib-nobundle` that the two shouldn't be conflated.

EDIT: Regardless of which way it goes, this feels to me like a fairly major change in at least the stance on what an `rlib` is; it would require extensive design work, an RFC and a dedicated implementer willing to implement said RFC.
Just to say: thanks all for the comments. I'm delighted that this doesn't seem like a completely idiotic request. From our (user's) point of view it is equally easy whether this is a `crate-type=static-nobundle` as opposed to `crate-type=rlib`; either's fine.
We will bear in mind that this seems like a plausible way forward as we continue to think about our linking strategies. We'll go quiet for a bit, but it's possible that in a few months we will pop up here and say "hi, we'd like to implement this, here's an RFC...". Watch this space. Obviously if anyone else gets there first that's even better :)
@adetaylor may I ask: have you found a workable solution? Or did you have to abandon Rust because of this issue?
In my experimentation I have used `--emit obj` to produce an object file, and create an archive from this (and other C++ object files part of the same library) using `ar`. I then use the system linker to produce the final artifacts. Is this not a viable solution?
> In my experimentation I have used --emit obj to produce an object file, and create an archive from this (and other C++ object files part of the same library) using ar. I then use the system linker to produce the final artifacts. Is this not a viable solution?
That doesn't work when using liballoc, as the allocator shim is only generated when compiling a crate that needs to be linked, and even then doesn't get included in the `--emit obj` object file but in a separate object file that gets deleted after linking. It can't be included in the main object file because you can pass `--crate-type lib --crate-type bin`, in which case the allocator shim must be included for the bin, but must not be included for the lib.
@bbjornse We haven't abandoned Rust, but we also haven't made any real progress in this direction.
I know several other teams at various organizations are using the approach of building rlibs then passing them all into the final C++ linker. This still works. It still requires nasty hacks as described in this issue description, and could break at any time.
> That doesn't work when using liballoc as the allocator shim is only generated when compiling a crate that needs to be linked and even then doesn't get included in the --emit obj object file, but a separate object file that gets deleted after linking.
Interesting, thanks. Could you please explain: what exactly is the allocator shim, and why is it generated rather than being part of a library? Does this relate to the `__rust_alloc` => `__rdl_alloc` remapping mentioned in the issue description?
Also, do you know if this is the only challenge with the object file approach? Are there other pieces which might be missing?
> @bbjornse We haven't abandoned Rust, but we also haven't made any real progress in this direction.
>
> I know several other teams at various organizations are using the approach of building rlibs then passing them all into the final C++ linker. This still works. It still requires nasty hacks as described in this issue description, and could break at any time.

Thank you for your reply. This is somewhat disheartening. I'm still not abandoning hope for the `--emit obj` approach though: as far as I could tell the output was similar to that of the rlib, sans rmeta sections, and as you say using the rlibs directly can be made to work. Although passing rlibs directly to the native linker might never be officially supported, it sounds reasonable to me to expect that object files generated with `--emit obj` can be used in this way.
> Interesting, thanks. Could you please explain: what exactly is the allocator shim, and why is it generated rather than part of a library?
The allocator shim is what forwards the allocator calls in liballoc to either a `#[global_allocator]` or the default allocator in libstd. Expanding it in place for `#[global_allocator]` won't work because if the crate with the `#[global_allocator]` and a dylib linking to libstd get linked together, the libstd default allocator wins, as the dylib needs to have an allocator shim to link and the only allocator available when linking the dylib is libstd's default allocator.
> Does this relate to the `__rust_alloc` => `__rdl_alloc` remapping mentioned in the issue description?
Correct
> Also, do you know if this is the only challenge with the object file approach? Are there other pieces which might be missing?
Honestly I am not sure.
> Although passing rlibs directly to the native linker might never be officially supported, it sounds reasonable to me to expect that object files generated with --emit obj can be used in this way.
Neither rlibs nor `--emit obj` contain the allocator shim. In fact they can't. For rlibs this is for the reason mentioned earlier. For `--emit obj` this is because with `--crate-type lib --crate-type dylib`, `--emit obj` will generate a single object file, but the rlib must not have the allocator shim and the dylib must have it.
I have been thinking about the Rust compilation model lately because of this and certain other deficiencies. No concrete actionable improvement yet though.
Thank you for your explanation. Have I understood correctly that:

1. If a crate defines a `#[global_allocator]`, then that crate defines symbols `__rg_alloc` etc., and all references to `__rust_alloc` etc. should be forwarded to the `__rg_*` equivalents.
2. Otherwise, `__rust_alloc` etc. should be forwarded to the `__rdl_*` equivalents.
3. These shims are only generated when producing final artifacts (`--crate-type {bin,dylib,cdylib,staticlib} --emit link`). In particular, they are not generated for `--crate-type lib --emit rlib,obj`.

I realize I don't understand why the rlib cannot include the shim. I see that it's only necessary when producing the final artifact, but I don't see how it would hurt to generate it immediately? (Unless I've misunderstood and `#[global_allocator]` is crate-local, of course.)
To be clear, the relation between my questions and this issue is whether `--emit obj` can do the trick instead of formal support for linking with rlibs or a hypothetical new `static-nobundle` crate type. Although `--emit obj` suffers from the same allocator-shim problems and standard-library dependency-discovery problems as noted in the OP, it does not suffer from the problem that rlibs aren't intended to be passed to the system linker. Rather, the object file output strikes me as something intended to be passed to the system linker. And I'm still optimistic that the two other issues can be solved.

I'm not sure what the object file is guaranteed to include and not include, though. Crucially, is it e.g. guaranteed not to include dependencies, thus avoiding diamond-dependency problems and being an improvement over the use of `staticlib`?
> I realize I don't understand why the rlib cannot include the shim. I see that it's only necessary when producing the final artifact, but I don't see how it would hurt to generate it immediately? (Unless I've misunderstood and #[global_allocator] is crate-local, of course.)
If you link to a dylib that already has an allocator shim, the rlib must not define an allocator shim; the allocator shim of the dylib must be used. At the time of generating the rlib it is not yet known whether it will be linked against a dylib - only when compiling the final artifact.
I see, thanks. Is this only when the rlib shim would use the default allocator? Would it hurt to immediately generate the shim for crates which include a `#[global_allocator]`?

If crates require a shim at all, this means they use liballoc, right? And the prebuilt libstd already includes an allocator shim forwarding to the default allocator. How come this works, when another dylib might be providing a global allocator? Could a similar technique, linking with a "default allocator shim" dylib, work in a liballoc + no-std setting?
When you dynamically link against libstd or any other dylib that uses liballoc, `#[global_allocator]` will be silently ignored. Instead the allocator shim embedded in said dylib will be used. This wouldn't be possible if rlibs that use `#[global_allocator]` immediately created the allocator shim.
It seems to me that the `--crate-type` switch controls the main output type of `rustc` (e.g. `rlib`, `staticlib`, `bin`, etc.) while the `--emit` switch can optionally ask `rustc` to retain additional, intermediate file types (e.g. `obj`, but also `mir`, `llvm-bc`, `dep-info`, etc.). It feels like the `--emit` switch can be useful for debugging what is happening during compilation, but should not be depended upon for end-to-end results, since the intermediate/internal files produced when building an `rlib` crate may change in the future. From this perspective, depending on the `--emit=obj` switch doesn't seem significantly different or less risky than depending on the details of what `--crate-type=rlib` produces.
I see that @nagisa has raised concerns that monomorphization within `rlib` might be fundamentally incompatible with the requirements here. FWIW, I've tried to explore this a bit and didn't find anything broken so far - please shout if I missed anything.
In particular, I tried building the following things:

```rust
// in crate1
pub fn sub_in_crate1<T: Sub<Output = T>>(a: T, b: T) -> T { a - b }

// in crate2a
#[no_mangle] pub fn sub_from_7_in_crate2a(x: i32) -> i32 { crate1::sub_in_crate1(7, x) }

// in crate2b
#[no_mangle] pub fn sub_from_42_in_crate2b(x: i32) -> i32 { crate1::sub_in_crate1(42, x) }
```
There would indeed be 2 copies of the monomorphized `sub_in_crate1::<i32>`:

```
$ nm out/rel/rust_odr_test | grep crate1
00000000001f23d7 T _ZN20rust_odr_test_crate113sub_in_crate117h56a8a4a969227185E
00000000001f2422 T _ZN20rust_odr_test_crate113sub_in_crate117hf5a81c2f183f301cE
```
But this seems fine.
(A more concrete and complete example is in a Chromium-specific code-review tool: see here.)
I guess the next step would be introducing an RFC (as suggested above), but before doing that it is probably desirable to prototype the change (e.g. getting `link_staticlib` to skip adding `each_linked_rlib` in the presence of a new, hypothetical `-Z staticlib-with-no-transitive-deps` switch). WDYT?
> (--crate-type vs --emit)
One problem right now is that `--emit` is currently hashed into the crate hash, so, for example, `--emit metadata` and `--emit link` will generate different crate hashes for otherwise identical command lines. I'd really prefer this weren't the case, but I think this particular thing isn't relevant to this issue, since by the time we get to linking, crate hashes and rmeta should be irrelevant.
But aside from that, I'd agree that crate-type and emit have somewhat overlapping responsibilities (broadly, what kind of file(s) do I want?), but I'd characterize it as "crate-type sets the intrinsic properties of the output file" and "emit selects a number of auxiliary outputs which can be cheaply generated along the way". But `--emit obj` and `--crate-type staticlib` are very close to each other really, ignoring the whole "and link in all the dependants" thing. (Does `--emit obj` also force codegen-units = 1, or is that just `--emit asm`?)
> ```
> 00000000001f23d7 T _ZN20rust_odr_test_crate113sub_in_crate117h56a8a4a969227185E
> 00000000001f2422 T _ZN20rust_odr_test_crate113sub_in_crate117hf5a81c2f183f301cE
> ```
I'm curious how this looks with `-Zsymbol-mangling-version=v0`, so that whatever's causing the difference isn't hidden in the hash.
I'm super interested in this effort. Currently Buck uses `staticlib` for C++ -> Rust dep edges, and I'd really prefer to have this be `rlib`/`staticlib-with-no-transitive-deps` and set up the dependencies to arrange for all the other stuff to appear on the final (C++) link line.
> (--crate-type vs --emit)
I agree that depending on `--emit obj`, without formal or documented guarantees about what the output will contain, feels fragile (like depending on linking with rlibs). Case in point: when experimenting with this I found that `--emit obj` works differently with `--crate-type lib` and with `--crate-type staticlib`.
I don't feel that `--emit` is just a debugging option, though; rather, I understand it as an option that makes `rustc` a powerful stand-alone tool usable in a non-Cargo context.
Reasons I still think `--emit obj` is a reasonable alternative:

- `--emit obj` is guaranteed to include object code. This is an improvement over rlib, where the presence of object code is an implementation detail (as noted above).
- It is a `staticlib` without "link in all the dependants" - which is (almost) exactly what we want (i.e. an improvement over `--crate-type staticlib`).
- A documented guarantee could specify how it behaves with each `--crate-type`, and in what contexts the output will be usable.

> Case in point, when experimenting with this I found that `--emit obj` works differently with `--crate-type lib` and with `--crate-type staticlib`.
Isn't this expected if `staticlib` links in all dependencies and `lib` doesn't? Or were there other differences?
`--emit obj` doesn't link in dependencies, as it doesn't link at all. It just emits the intermediate object file that would otherwise be given to the linker.
Ah. I was imagining a relocatable object file, like what `ld -r` produces, for `staticlib`. Not sure what behavior is actually desirable in that case.
I found the format of `--emit obj` to change with `--release`.
What do you mean by format? File format? If so, from which file format did it change to which?
Oh, and are you using cargo or rustc directly? If cargo, there is no stable way to locate all object files necessary for linking, as dependencies are not stored in a stable location.
Working on https://bugs.chromium.org/p/chromium/issues/detail?id=1296284, I think I have a better idea of what is needed or not needed here.
Requirements:

- … (ruling out `--crate-type=staticlib`.)

Current solution: We use an .rlib file as a collection of .o files, and we must link it with `-Wl,--whole-archive`.
The proposal to use `--emit=obj` makes a lot of sense to me: it is a Rust .o file, which is what we need.
However, an .rlib is a collection of .o files because rustc splits the compilation up (via `--codegen-units`). So I am left with a question:

Does `--emit=obj` cause slower compilation by forcing `--codegen-units=1`? If so, that would contradict requirement #3.
Otherwise, --emit=obj producing different outputs is not necessarily a problem, they may not be meaningful. Perhaps they are artifacts of different codegen units for example. We'd need to understand what the differences are.
We could easily make a mode that emits an archive of object files; I'm not sure what purpose `--emit=obj` has in not doing that. Maybe the option predates having multiple codegen units, and now we need to preserve it for backwards compatibility?
My current understanding of this is that the requirements can be split in three:
Looking at how each of these requirements are addressed by different alternatives:
I suspect this is already satisfied with `--emit obj`, with the only missing piece being a documented guarantee.

But the point about `--codegen-units` being forced to 1 is a valid concern. A quick search through rustc sources indicates this is true if we use `--emit obj=<dstfile>` - but not if we just use `--emit obj` (compiler/rustc_session/src/config.rs, function `should_override_cgus_and_disable_thinlto()`). Maybe someone knows if this is correct? EDIT: if this is true, I suspect users needing this functionality could use `--emit obj --out-dir=<outdir>` and `ar rcs cratename.a <outdir>/*`. Of course, `--emit archive=<file>` is preferable :slightly_smiling_face:
A `library-archives` component could be added to `rustup`, containing `.a` equivalents to the shipped `.rlib`s.

A third option could be to add a new `mlib` type (multi-lib) or similar, working just like rlibs do today (i.e. both for linking and for Rust compilation), and distribute those with the Rust toolchain.

Unaffected by the choice between `-emit=obj`, linking rlibs, or linking with the output of some new crate type `staticlib-nobundle`: the decision about what shim to use cannot be taken until link time, as only then do we know what shim should be used.

> A quick search through rustc sources indicates this is true if we use `--emit obj=<dstfile>` - but not if we just use `--emit obj` (compiler/rustc_session/src/config.rs, function `should_override_cgus_and_disable_thinlto()`). Maybe someone knows if this is correct?
https://github.com/rust-lang/rust/blob/4c55c8362de32be1530b2441c3e3a51e73edeb21/compiler/rustc_session/src/config.rs#L288 forces a single cgu for any `--emit obj`, I think.
I wonder if instead of allowing rlibs to be linked outside of rustc we could allow C object files to be linked with rustc, so rustc takes the place of the linker, handling any orchestration for Rust libraries and then forwarding to the system linker or any other linker provided with `-Clinker`.
> forces a single cgu for any `--emit obj` I think
I see that the function where `false` is returned is named `is_compatible_with_codegen_units_and_single_output_file()` - and the code using it at line 1537 seems to only look at emit flags with an explicit path:
```rust
let incompatible: Vec<_> = output_types
    .0
    .iter()
    .map(|ot_path| ot_path.0) // This one - only proceed if we actually have an explicit path?
    .filter(|ot| !ot.is_compatible_with_codegen_units_and_single_output_file())
    .map(|ot| ot.shorthand())
    .collect();
```
Which is what makes me suspect that it only forces cgu=1 with an explicit path?
> I wonder if instead of allowing rlibs to be linked outside of rustc we could allow C object files to be linked with rustc, so rustc takes the place of the linker, handling any orchestration for rust libraries and then forwarding to the system linker or any other linker provided with `-Clinker`.
I suspect this is a poorer fit for C++ codebases wanting to integrate Rust code into what they have. The link setup would then have to be recreated exactly as it were before, but indirectly through rustc. This is more complex both to read and write than just writing the desired linker invocation directly. It is also more fragile, as rustc must be trusted to produce the exact same linker invocation across rustc versions (sans differences in rust library names), and of course across runs.
> Which is what makes me suspect that it only forces cgu=1 with an explicit path?
`ot_path` here has the type `(OutputType, Option<PathBuf>)`, I think. `None` here represents the case where no path is given. The line you mentioned ignores the `Option<PathBuf>` and only returns the `OutputType`.
Semantically `--emit obj` seems fine to me, but the codegen-units = 1 problem would be a real practical blocker. Something that emits a plain .a file would be a fine replacement and seems straightforwardly compatible with multiple cgus. In practice this could be a plain .rlib, or an rlib with an extra step of stripping out the rmeta files if that confuses the linker.
Ideally this would also support thin archives in environments which have such a concept, to avoid some needless file copies (which I think is an orthogonal potential improvement for rlibs).
@bjorn3:

> I wonder if instead of allowing rlibs to be linked outside of rustc we could allow C object files to be linked with rustc, so rustc takes the place of the linker, handling any orchestration for rust libraries and then forwarding to the system linker or any other linker provided with -Clinker.
I don't think that would work in practice. For the cases I'm interested in, the linking step is immensely complex with lots of variants, the amount of code is enormous and mostly C++, with Rust comprising probably a fraction of a percent. It would be very hard to justify reengineering the build process to do linking via `rustc`. (The only way I can see that happening is if rustc had a wrapper to allow it to be a drop-in replacement for the existing linker. But if that were possible, I'm not sure what the benefit would be, since it wouldn't be able to do any Rust-specific things.)
For the cases where the top-level binary is Rust, we do use `rustc` as the linker, and inject all the non-Rust objects via `-Clink-arg` etc. so they can be consumed by the underlying linker.
Somewhat unrelated aside: What's the distinction between `--emit` and `--crate-type`? For the most part I think the options make sense to me, but using `--crate-type` for `cdylib`/`staticlib` seems like an exception; they seem a much closer match for `--emit`. But I don't have a crisp mental model for distinguishing when which option is the most appropriate. `--crate-type` for "binary" vs "library" makes sense, and `--emit` for "deps"/"asm" seems reasonable. But the middle ground of "rlib" vs "dylib" vs the native link variants seems like it could also be emit. And then there's the problem that `--emit metadata`'s meaning changes depending on what other emit options you have.
> That line you mentioned ignores the `Option` and only returns the `OutputType`.
Ah, you are right of course. Feeling stupid now :slightly_smiling_face:
Not giving up though; I'll suggest one more way in which the `is_compatible_with_codegen_units_and_single_output_file()` function can live up to its name. Looking at the code using the list of incompatible output types (shortened a bit):
```rust
if !incompatible.is_empty() {
    match codegen_units {
        Some(n) if n > 1 => {
            if matches.opt_present("o") {
                // <...>
                codegen_units = Some(1);
            } // ELSE don't touch codegen_units?
        }
        _ => {
            codegen_units = Some(1);
        }
    }
}
```
I.e. it appears cgu is set to 1 only if the `-o` option is used, or if it was not set. Testing this:
```
$ echo 'pub mod mod1 { pub fn f() {} } pub mod mod2 { pub fn f() {} }' | rustc --crate-type lib --emit obj -C codegen-units=2 --out-dir=/tmp/example -
warning: ignoring emit path because multiple .o files were produced
warning: 1 warning emitted

$ ls /tmp/example
rust_out.rust_out.b25ea2e0-cgu.0.rcgu.o  rust_out.rust_out.b25ea2e0-cgu.1.rcgu.o
```
So it would appear we can get codegen-units > 1 with `--emit obj`, albeit with a warning?
Of course, a mode for emitting archives as @tmandry suggested would in any case be preferable :slightly_smiling_face:
> So it would appear we can get codegen-units > 1 with --emit obj, albeit with a warning?
It's a bit awkward if we don't know what their filenames are, though. Ideally it's something we could know in advance (e.g. by specifying them, or they're computable in a straightforward way). I guess we could use artifact notifications, or just inspect what was left in a previously empty directory after the fact.
Another vote for an archive output; both `.rlib` and `.a` are fine for us. FTR, in our setup lld links `.rlib` even with metadata just fine. I can see how having a `--emit=archive=<file>` would be nice to use with pipelining (where metadata and rlib are already split into separate outputs).

Also +1 to @jsgf; in Bazel it's preferable to specify output filenames explicitly as well.
It seems the discussion is converging on the question of the output file. Do folks have opinions on generating allocator shims? @bbjornse wrote:

> Unaffected by the choice between -emit=obj, linking rlibs, or linking with the output of some new crate type staticlib-nobundle. The decision about what shim to use cannot be taken until link time, as only then do we know what shim should be used.

In our setup we actually know what allocator to use before link time (we do not support specifying the allocator in the source code), so having a mode where we can ask rustc to emit specific shims would be super useful for us. Something like `--emit=system_allocator_shims`. Are there other shims besides the allocator?
I came across an issue with using an archive output. The linker drops object files from an archive when there is no unresolved symbol that the object file satisfies. In particular, this means that in a test binary, a file of unit tests which are registered with static initializers produces object files that get dropped if linked through an archive. This happens because the object files have no external calls into them, so have no unresolved symbols. This gets especially interesting in Rust where a .rs file can be split into many object files, so even hacks to call some method in each .rs file do not resolve it.
We are working around this while we're linking .rlib files by passing --whole-archive to the linker for the rlibs. However not all platforms support --whole-archive. The AIX linker for example does not: https://chromium-review.googlesource.com/c/chromium/src/+/3470980/
This means we need to either:
But I've also noticed that files cannot be extracted from a thin archive, which was proposed earlier: https://github.com/rust-lang/rust/issues/73632#issuecomment-1038604003.
Static initializers don't work reliably even when using rustc as the linker. I think you probably shouldn't use static initializers with Rust for the time being, until it is fixed.
Good points, thanks @danakj! I apologize for long reply, I had a late evening coffee and now I'm out of control :)
I'll try to reply to your points in the order you wrote them.
Static initializers are problematic in a purely C++ world too, and using `--whole-archive` is in many situations the right solution. I understand that in your scenario you have to use whole-archive. Would you agree that this is not the most common case, though? For example, in our codebase I see an order of magnitude fewer 'whole-archived' libraries (alwayslink in Bazel lingo). Also, we don't want to default to 'whole-archive' in the whole build, because that would have code-size consequences.
Your scenario is maybe related to https://github.com/rust-lang/rust/issues/64191. Assuming we can split compilation and linking, and the result of compilation is an archive, it seems we want to link the archive for "crate-type=bin" as 'whole-archive' (I actually don't know, I'm just guessing). Of course, if the result of compilation is a directory with objects, instead of 'whole-archiving' we could just put them on the linker command line directly, to achieve the same effect.
Are you sure AIX linker is not doing an equivalent of "whole-archive" by default? From https://www.ibm.com/docs/en/aix/7.1?topic=l-ld-command I read:
The ld command processes all input files in the same manner, whether they are archives or not. It includes the symbol tables of all objects, discarding only symbol definitions that duplicate existing symbols. Unlike some other versions of the ld command, you do not need to order archive files so references precede definitions. Furthermore, you do not need to list an archive file more than once on the command line
That gives me the impression that AIX doesn't do link-time dead-code elimination (or it does, but uses a fixpoint iteration to resolve symbols first)? Later, in the doc for `cdtors`:
The linker gathers information about C++ static constructor or destructor functions and saves this information in the output file. The incl suboption tells the linker which archive members to search when creating the saved information. The possible values are: all Searches all members of all archives for constructor or destructor functions. This is the default. ...
And that is maybe saying that stuff is 'whole-archived' by default. Honestly I don't really understand what that page says, but if you have access to AIX machine, maybe it's worth trying?
I'll ignore the AIX issue for now
Of course both these outputs are equivalent, you can always create an archive from objects, and unarchive an archive to get objects. Maybe the question is what is more convenient more often. I come from the Bazel perspective, and that colors my preference:
@jsgf to make sure I'm not completely wrong. I understand thin archives as a disk-space-saving mechanism for C/C++ builds, where each .o file is created by a separate compiler invocation. The compiler typically writes the .o file to disk. Then a different tool is used to create an archive, which normally just copies the contents of the .o files into a new .a file. So in the end each .o file's content is stored twice on disk. Thin archives allow us to reference the original .o file instead of copying the content. In our workloads this optimization is superseded by --start-lib/--end-lib support in gold or lld (there you don't even need to create the thin archive).
For rustc, it seems that it should keep the object file content in memory and only write the archive to disk, so there is nothing to reference from the thin archive. Unless we use incremental compilation and rustc is able to update only specific object files (and here probably comes @jsgf to teach me a lesson :), in which case it would be beneficial to write object files separately.
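For concreteness, the copy-vs-reference difference can be shown with GNU `ar` (the member files below are junk stand-ins rather than real objects, which `ar` happily accepts; assumes GNU binutils):

```shell
set -e
# Create two ~100 KB stand-in "object" files.
head -c 100000 /dev/urandom > big1.o
head -c 100000 /dev/urandom > big2.o

ar rc  normal.a big1.o big2.o   # regular archive: member contents are copied
ar rcT thin.a   big1.o big2.o   # 'T' modifier: thin archive, stores paths only

ls -l normal.a thin.a           # thin.a is only a few hundred bytes
```

If the original .o files are deleted or moved, a thin archive becomes useless, which is why it is purely a build-tree optimization.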
@bjorn3 I believe that with the recent push for stabilizing native linking modifiers (https://github.com/rust-lang/rust/pull/93901) we may be getting there slowly but steadily :)
Native linking modifiers fix it for C static libraries. They don't apply to regular Rust dependencies.
Yup, what I meant was that with that there is a way to make C/C++ static initializers work correctly when using rustc as the linker driver.
Thanks for the detailed reply @hlopko, what you said makes a lot of sense.
I didn't think that AIX might always link in everything - that's very good news. I agree with your reading of those quotes. Second, thanks for going over thin archives for me, that makes a lot of sense and I am not concerned then, as rustc would not be producing such.
Regarding single file vs multiple files, GN also constructs a build graph up front, like Bazel. My understanding of the tool, however, is that it groups inputs by file type. So a linker line looks something like:
```
rspfile_content = "{{inputs}} {{solibs}} {{libs}} {{rlibs}}"
```
Where the variables here come from GN itself, not from the user-written build scripts or templates. The {{inputs}} here are .o files, and come from C++ files in the current library. If Rust were to produce a library file, it gets grouped with all the other libraries, making it difficult to --whole-archive just the single library that is the peer of the C++ .o files for this mixed-language library. So I've been thinking of how to make Rust closer to C++ such that the .o files are peers, and one way would be to have the build work with .o files. However like you say, a dynamic set of files poses challenges. In the end it's just software and we can make anything work with enough effort, so having good properties that lead to efficiencies is what matters. Toward that I can agree that a deterministic (single) output is good.
Last, I don't want to hijack this bug with problems of static initializers themselves - I know there's problems being dealt with elsewhere. The point was just to explain the problems we're having with linking rustc output as a library with C++ output, that would not be present if we could just work with object files across the board. I should note we're not using rustc to make the binary, as it's the C++ linker doing that work, so the --crate-type is never executable for rustc, and we wouldn't benefit from https://github.com/rust-lang/rust/issues/64191 as a result. In Bazel systems my understanding is this problem doesn't occur in C++ because tests are linked into a dynamic library, thus all objects are kept. If a static library is used for the tests, then --whole-archive is needed in the same way.
So tl;dr I see your points and I certainly hope it would be reasonable to teach GN to treat a "peer" library, for a mixed-language inter-dependent target, differently from the others. But if that failed, I'm also encouraged to know we could extract the objects.
@hlopko
[thin archives]
Yeah, I think you're right. From my perspective, thin archives are mostly a disk IO/copy bandwidth saving rather than specifically for disk space saving, though that's a nice side benefit. If the flow is cc -> .o -> ar -> .a, then it can get expensive if you have, for example, lots of debug info enabled (typically 10x the .text size). But if you're right that llvm directly emits .a without an intermediate .o (and I think you are), then that's moot.
`--start-lib`/`--end-lib` does give similar benefits to thin archives, but unfortunately they're incompatible with link groups (`--start-group`/`--end-group`). Though that's only relevant for `gold`, since `lld` effectively acts as if the entire link line were in a link group.
And for the record, my primary interest is building with Buck which shares many characteristics with Bazel.
An additional constraint with all this is that we need to have explicit dependencies on all the sysroot crates, including all their dependencies, expressed as `prebuilt_rust_library` rules (for now, though we're also looking into building libstd from source with Buck rules). In addition, we must prevent rustc from implicitly using the sysroot. (This isn't strictly necessary for pure Rust builds, but it is to get the right sysroot libraries to a final C++ linker.)
This turns out to be surprisingly hard - not the dependencies themselves, but working out when a crate has a dependency on a sysroot crate. (I have a little python script which can generate sysroot deps from the crate metadata itself.)
We can generally assume everything depends on `std` (or `core` for `#![no_std]` crates). This is reasonably simple (but awkward, because you can't identify no_std crates from the metadata alone). But it seems you can also define a dependency on `rustc-std-workspace-(std|core|alloc)` or `compiler_builtins` in a Cargo.toml, which cargo recognizes and basically suppresses on the assumption that rustc will just pluck that dependency from its sysroot. We must suppress sysroot access and replace it with an explicit dependency on the appropriate sysroot rule targets.
This seems reasonably easy to do in an ad-hoc way for small examples, but I've yet to make it work for a full-scale build using a non-trivial number of third-party crates.io deps.
But it seems you can also define a dependency on rustc-std-workspace-(std|core|alloc) or compiler_builtins in a Cargo.toml, which cargo recognizes and basically suppresses on the assumption that rustc will just pluck that dependency from its sysroot.
No, the rustc-std-workspace-* crates are on crates.io with empty content. In the Rust build system there is a `[patch.crates-io]` section replacing them with alternative versions which re-export the respective standard library crate. The compiler-builtins version on crates.io is the exact same one used by the standard library (in fact, the standard library gets it from crates.io).
No, the rustc-std-workspace-* crates are in crates.io with an empty content. In the rust build system there is a [patch.crates-io] section replacing them with alternative versions which re-export the respective standard library crate. The compiler-builtins version on crates.io is the exact same used by the standard library. (in fact the standard library gets it from crates.io)
I think I see - I'll need to dig into it some more. But I did notice that Cargo has specific knowledge of the rustc-std-workspace-* names baked into it, and they don't appear as normal resolved dependencies in the `cargo metadata` output. This led me to believe that they're intended to be resolved against whatever sysroot your rustc is using (since quite a few crates.io packages depend on them, I guess as a result of being used within the sysroot).
I'm working on a major existing C++ project which hopes to dip its toes into the Rusty waters. We:
We can't:
- Use `rustc` for final linking. C++ is the boss in our codebase; we're not ready to make the commitment to put Rust in charge of our final linking.
- Generate a `staticlib` for each of our Rust components. This works if we're using Rust in only one place. For any binary containing several Rust components, there would be binary bloat and potentially violations of the one-definition rule, by duplication of the Rust stdlib and any diamond dependencies.
- Generate a single `staticlib` containing all our Rust components, then link that into every binary. That monster static library would depend on many C++ symbols, which wouldn't be present in some circumstances.

We can either:
- Generate a `staticlib` for each of our output binaries, using `rustc` and an auto-generated `.rs` file containing lots of `extern crate` statements. Or,
- Pass the `rlib` for each Rust component directly into the final C++ linking procedure.

The first approach is officially supported, but is hard because:
- We'd need to generate the `staticlib` as part of our C++ tool invocations. This is awkward in our build system. Our C++ targets don't keep track of Rust compiler flags (`--target`, etc.) and in general it just feels weird to be doing Rust stuff in C++ targets.
- We'd need to run `rustc` to make a `staticlib` for every single one of our C++ link targets. For most of our targets (especially unit test targets) there will be no `rlib`s in their dependency tree, so it will be a no-op. But the presence of this wrapper script will make Rust adoption appear intrusive, and of course will have some small actual performance cost.

The second approach is not officially supported. An `rlib` is an internal implementation format within Rust, and its only client is `rustc`. It is naughty to pass them directly into our own linker command line.

But it does, currently, work. It makes our build process much simpler and makes the use of Rust less disruptive.
Because external toolchains are not expected to consume `rlib`s, some magic is required:
- We need to locate the standard library `rlib`s, which would be easy apart from the fact that they contain the symbol metadata hash in their names.
- We need some command-line magic to alias `__rust_alloc` to `__rdl_alloc` etc.

But obviously the bigger concern is that this is not a supported model, and Rust is free to break the `rlib` format at any moment.

Is there any appetite for making this a supported model for those with mixed C/C++/Rust codebases?
I'm assuming the answer may be 'no' because it would tie Rust's hands for future `rlib` format changes. But just in case, how about the following steps?
- Officially document the two options: use `rustc` as the final linker; or generate a `staticlib` or `cdylib` then pass that to your existing final linker (I think this would be worth explicitly explaining anyway, so unless anyone objects, I may raise a PR).
- Add `rustc --print stdrlibs` (or similar) which will output the names of all the standard library rlibs (not just their directory, which is already possible with `target-libdir`).
- Add a `rustc` option which generates a `rust-dynamic-symbols.o` file (or similar) containing the codegen which is otherwise done by `rustc` at final link time (e.g. symbols to call `__rdl_alloc` from `__rust_alloc`, etc.).
- Document that it's acceptable to pass `rlib`s directly to a non-Rust linker (along with `rust-dynamic-symbols.o` and everything output by `rustc --print stdrlibs`).

A few related issues:
- #64191 wants to split the compile and link phases of rustc. This discussion has spawned from there.
- Linking `rlib`s directly currently requires `-Wl,--start-group`/`-Wl,--end-group` on the linker line. (Per https://github.com/rust-lang/rust/issues/64191#issuecomment-629418541)
- The downsides of the `staticlib`-per-C++-target model happen to be magnified by #73047.

@japaric @alexcrichton @retep998 @dtolnay I believe this may be the sort of thing you may wish to comment upon! I'm sure you'll come up with reasons why this is even harder than I already think. Thanks very much in advance.