rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.56k stars 12.74k forks source link

Move compiler-rt build into a crate dependency of libcore #34400

Closed alexcrichton closed 8 years ago

alexcrichton commented 8 years ago

One of the major blockers of our dream to "lazily compile std" is to ensure that we have the ability to compile compiler-rt on-demand. This is a repository maintained by LLVM which contains a large set of intrinsics which LLVM lowers function calls down to on some platforms.

Unfortunately the build system of compiler-rt is a bit of a nightmare. We, at the time of the writing, have a large pile of hacks on its makefile-based build system to get things working, and it appears that LLVM has deprecated this build system anyway. We're trying to move to cmake but it's still unfortunately a nightmare compiling compiler-rt.

To solve both these problems in one fell swoop, @brson and I were chatting this morning and had the idea of moving the build entirely to a build script of libcore, and basically just using gcc-rs to compile compiler-rt instead of using compiler-rt's build system. This means we don't have to have LLVM installed (why does compiler-rt need llvm-config?) and cross-compiling should be much more robust/easy as we're driving the compiles, not working around an opaque build system.

To make matters worse in compiler-rt as well it contains code for a massive number of intrinsics we'll probably never use. And even worse these bits and pieces of code often cause compile failures which don't end up mattering in the end. To solve this problem we should just whitelist a set of intrinsics to build and ignore all others. This may be a bit of a rocky road as we discover some we should have compiled but forgot, but in theory we should be able to select a subset to compile and be done with it.

This may make updating compiler-rt difficult, but we've already only done it once in like the past year or two years, so we don't seem to need to do this too urgently. This is a worry to keep in mind, however.

Basically here's what I think we should do:

Staging this is still a bit up in the air, but I'm curious what others think about this as well.

cc @rust-lang/tools cc @brson cc @japaric

brson commented 8 years ago

This can probably be done one platform at a time. Get it working on x86, then add a flag to the target spec that compiler rt isn't required.

While I want to do this, one thing that bugs me about it is that it ruins the 'purity' of core. Right now core is a pure-Rust library, really easy to build, although you can't do anything with it in practice without compiler-rt. Maybe there's nothing to do about it and I just need to accept that Rust isn't completely pure. For now, at least. Once we've got the build system transitioned and isolated just the parts of compiler-rt that core needs, we can start thinking about rewriting compiler-rt in Rust.

In light of those considerations, I think I'd prefer it if compiler-rt was its own crate that core depends on. Just seems like a cleaner division of responsibility.

japaric commented 8 years ago

I was going to open a similar issue because I have already started implementing this idea here (but compiler-rt in its own crate rather than under core) :smile:. Which is I have been using to cross compile compiler-rt for ARM Cortex-M processors.

:+1: from me. I pretty much agree with @alexcrichton assessment:

This means we don't have to have LLVM installed (why does compiler-rt need llvm-config?)

nods the user won't need to have cmake or make installed either.

This may make updating compiler-rt difficult, but we've already only done it once in like the past year or two years, so we don't seem to need to do this too urgently.

Agree. I might add that updating our hacks on top of their build system across compiler-rt updates is going to be equally difficult anyway.

re: steps

Compile select portions of compiler-rt as part of this build script, using gcc-rs

I think we can start with all the "generic" C routines. IME, those (cross) compile fine to all the architectures I have tried (x86, arm, mips). The arch-specific assembly (*.s) routines are more troublesome, for example cross compiling most of the arm/*.s routines to Cortex-M fail because they have been writing with the Cortex-A target in mind (the routines use ARM mode, which is not available in Cortex-M processors; the routines use Thumb instructions that are not available on ARMv6 Cortex-M processors; etc.). Those can be added in add as needed/tested.

Disable injection of compiler-rt in the compiler

I'm not quite sure what this means. Surely you don' mind stop passing the -lcompile-rt flag to the linker. But, rather remove the compiler-rt compilation step from bootstrap?

@brson

IMO core doesn't need to depend on building compiler-rt. no_std programs don't always depend on compiler-rt. In fact, I think that people doing x86 kernel dev don't bother compiling compiler-rt because they don't need the intrinsics. Also, you can get quite far building programs for embedded Cortex-M microcontrollers without using compiler-rt. For example, zinc doesn't use compiler-rt.

OTOH, compiler-rt is mandatory when you build programs that depend on std because the compiler passes -lcompiler-rt to the linker. Perhaps we could make compiler-rt an optional dependency of core?

steveklabnik commented 8 years ago

In fact, I think that people doing x86 kernel dev don't bother compiling compiler-rt because they don't need the intrinsics.

Can confirm; I only use libcore, not compiler-rt.

alexcrichton commented 8 years ago

@japaric

Those can be added in add as needed/tested

True yeah, I kinda would prefer to build nothing and then only add in intrinsics as we actually run into them. This may not be a great test, however, as we may accidentally pick up routines from libgcc all the time instead.

I'm not quite sure what this means

Oh yeah I just mean the compiler injecting -lcompiler-rt as it'd either be bundled in libcore or part of another library so this would just be redundant. This is just a minor technical detail, nothing too much to worry about.

In fact, I think that people doing x86 kernel dev don't bother compiling compiler-rt because they don't need the intrinsics.

Yeah it's true that most programs don't, but for us to be robust we actually need to deal with this. Functions like 64-bit multiplication with overflow on 32-bit end up lowering down to these intrinsics I believe, and a few others here and there.

Basically I don't think we can get by with not shipping a compiler-rt, so for tier 1 platforms we need to at least have some form of it somewhere. This should in theory then extend very nicely to embedded platforms as they should all be able to compile C anyway, so we just need to be sure the compiles actually succeed.

japaric commented 8 years ago

Also thinking far off in the future if we go down this road:

alexcrichton commented 8 years ago

I would personally expect sanitizers to be present in separate crates, not dependencies of libcore (or part of libcore). I'd like to support them eventually but I'd also like to build them separately.

Also yeah I think that the fallback to cc is basically wrong in almost all situations so that should just be an error outright no matter what (with messages about env vars and such)

brson commented 8 years ago

I'm not quite sure what this means. Surely you don' mind stop passing the -lcompile-rt flag to the linker. But, rather remove the compiler-rt compilation step from bootstrap?

I would expect this. Let's package it up the same as any other crate, so the compiler doesn't specifically know about it.

IMO core doesn't need to depend on building compiler-rt. no_std programs don't always depend on compiler-rt. In fact, I think that people doing x86 kernel dev don't bother compiling compiler-rt because they don't need the intrinsics. Also, you can get quite far building programs for embedded Cortex-M microcontrollers without using compiler-rt. For example, zinc doesn't use compiler-rt.

OTOH, compiler-rt is mandatory when you build programs that depend on std because the compiler passes -lcompiler-rt to the linker. Perhaps we could make compiler-rt an optional dependency of core?

I think @alexcrichton's idea was that no consumer of Rust should ever have to think about compiler-rt. It just comes with core as appropriate for the target, and everybody already depends on core. If a particular kernel project doesn't actually depend on the symbols defined in compiler-rt then that's fine - the linker will just throw them away, just like every other unused part of core. If a particular platform doesn't need compiler-rt at all, then core just won't link to it.

japaric commented 8 years ago

@brson I see what @alexcrichton meant now. Thanks for elaborating!

@alexcrichton

True yeah, I kinda would prefer to build nothing and then only add in intrinsics as we actually run into them.

So, start with nothing, run the full Rust test suite and crater, add missing symbols and repeat.

This may not be a great test, however, as we may accidentally pick up routines from libgcc all the time instead.

You mean that libgcc also provides intrinsics with the same name? And those get picked up instead of compiler-rt's?

Oh yeah I just mean the compiler injecting -lcompiler-rt as it'd either be bundled in libcore or part of another library so this would just be redundant. This is just a minor technical detail, nothing too much to worry about.

I see what you mean now. Yeah that would be better. It would let us drop (as in "stop caring about") the no-compiler-rt field. I think I tried something like this before but the linker dropped the libcompiler-rt.rlib symbols when linking a static no_std binary; it could have just been a linker argument ordering issue though.

I would personally expect sanitizers to be present in separate crates, not dependencies of libcore (or part of libcore). I'd like to support them eventually but I'd also like to build them separately.

I meant building the sanitizers with gcc-rs instead of with llvm's cmake build system. I agree they should not be part of libcore. They would probably end up in their own crate below the std crate.

alexcrichton commented 8 years ago

So, start with nothing, run the full Rust test suite and crater, add missing symbols and repeat.

Crater may not be too much help here but if we can get past the auto bots and compile all necessary symbols for all platforms I suspect that'd be all we need. We tend to be pretty good about coverage of the standard library in general, although not bullet proof of course.

You mean that libgcc also provides intrinsics with the same name? And those get picked up instead of compiler-rt's?

Yeah I believe that at least some intrinsic names overlap, although perhaps not all of them. Most of the time you end up linking to libgcc in one way or another if you start including some C code, so we may require some intrinsic names that end up getting pulled in from libgcc instead of from compiler-rt as before.

In general though this is just a vague worry. If things compile and link just fine then it doesn't really matter too much where it comes from, presumably we'll solve it at some point.

I meant building the sanitizers with gcc-rs instead of with llvm's cmake build system. I agree they should not be part of libcore. They would probably end up in their own crate below the std crate.

Ah for these we'd probably want to use cmake. I know far less about the sanitizers and they seem like they would be more complicated. That and the sanitizers don't need to work on every platform rustc supports so we may not need to cut corners.

vadimcn commented 8 years ago

A crazy idea: could we translate all the builtins that we need into Rust? The .c code in compiler-rt doesn't look too complicated. Then we'd depend only on assembler (and we could even bundle llvm-as with Rust).

Of course, this will increase the cost of updating, but as @brson said, the builtins subset of compiler-rt seems to be pretty stable.

Edit: nevermind, brson has already proposed that. Serves me right for reading on a phone!

alexcrichton commented 8 years ago

Yeah at this point I'm about ready to just throw in the towel on using vanilla upstream compiler-rt, it's too much of a pain so far and doesn't seem to have enough gain. If we can figure out what LLVM actually needs (in terms of intrinsics), I'd be totally ok adding those to libcore (or some other lib) and just going from there.

japaric commented 8 years ago

If we can figure out what LLVM actually needs (in terms of intrinsics)

I think this command yields most of (if not all) the intrinsics LLVM uses:

$ grep 'RTLIB::.*"' src/llvm/lib/Target/*/*Lowering.cpp

The output looks like this (Spoilers there are lots of them).

vadimcn commented 8 years ago

I think these are just platform-specific overrides. The full list is here.

japaric commented 8 years ago

The full list is here.

Nice find! I think not all of those intrinsics are used though. For example, __ashlhi3 only appears once in the (LLVM) code base and there's no implementation of it in the compiler-rt repo.

vadimcn commented 8 years ago

Could we simply pre-build all possible versions of compiler-rt and distribute it as a binary? I think the set of (CPU targets * calling conventions) is fixed by LLVM implementation, so it should be feasible if we used clang?

alexcrichton commented 8 years ago

We tend to always run into trouble with distributions of binaries, and with things like target-specific flags and such I'd suspect that we can't really ship an end-all-be-all build of compiler-rt. I'd personally prefer to get it working building from source so we can just build everything from source eventually.

brson commented 8 years ago

A crazy idea: could we translate all the builtins that we need into Rust?

This should definitely be part of the long term plan.

Amanieu commented 8 years ago

The 128-bit integer RFC requires rewriting some of the compiler-rt builtins in Rust.

japaric commented 8 years ago

@vadimcn

The full list is here.

Actually, that list doesn't contain the __aeabi_* intrinsics that LLVM emits when working with FPUless 32-bit ARM processors like the Cortex-M. I think that it doesn't include other platform specific intrinsics either but I'm not familiar with those.

@alexcrichton

Yeah I believe that at least some intrinsic names overlap ... so we may require some intrinsic names that end up getting pulled in from libgcc instead of from compiler-rt as before

I wrote a script to get a list of the intrinsics that overlap; the result is in this gist. Note that the LLVM intrinsics listed there are a (probably overestimated) guess; we haven't found a definite list of intrinsics that LLVM may emit on all platforms.

vadimcn commented 8 years ago

Actually, that list doesn't contain the _aeabi* intrinsics that LLVM emits when working with FPUless 32-bit ARM processors like the Cortex-M. I think that it doesn't include other platform specific intrinsics either but I'm not familiar with those.

The list I pointed to is a default initialization. The target backends then override some of the entries with platform-specific names, so all _aeabi stuff is in ARMISelLowering.cpp.

brson commented 8 years ago

I tagged this with help wanted assuming no one is otherwise going to jump on this. Someone a bit familiar with how rustc and std interpenetrate could get started creating a compiler-rt crate and linking it in somewhere.

cuviper commented 8 years ago

It may be a tangent, but would it be possible for this to use a system compiler-rt? Fedora has a compiler-rt package, and I verified its libclang_rt.builtins-x86_64.a has the same symbols.

alexcrichton commented 8 years ago

@cuviper perhaps yeah, if we follow the route in this issue it may be relatively trivial with Cargo's library override support depending on how we do it.

alexcrichton commented 8 years ago

We're upgrading compiler-rt in https://github.com/rust-lang/rust/pull/34743 as part of an LLVM upgrade, and it's again unfortunately quite an ordeal due to the sheer number of patches we need to apply (many of which don't apply cleanly any more).

cuviper commented 8 years ago

@alexcrichton Can you give a brief overview of what sort of compiler-rt patches there are? That will help me decide the feasibility of using the distro package. (FWIW, I already have makefile patches semi-working to allow external compiler-rt, and hope to send a PR soon.)

shepmaster commented 8 years ago

I believe this will be quite useful when it comes time for the AVR port to use exotic tools like the formatting infrastructure, so I'm very excited for this!

alexcrichton commented 8 years ago

@cuviper 100% of our patches are build system nonsense and compiling on not-x86 platforms, so if distros can provide a pre-built compiler-rt that should work just fine for our use cases!

For posterity though it probably can't hurt to describe the patches. This information is based on the rust-2016-06-16 branch on github which we're currently using:

The theme for almost all these patches is that compiler-rt tries to figure out what compilers to use and flags to pass, and we already know all that information and we're just trying to communicate it to compiler-rt.

alexcrichton commented 8 years ago

Also now reading over that list, it's not 100% related to build system stuff, there's a few minor things for the ARM personalities and whatnot.

cuviper commented 8 years ago

Awesome, thanks! It does sound we dodge a lot of bullets by using the distro build.

Regarding the ARM stuff, I noticed that both the makefiles and rustbuild only map to builtins-armhf if it's not armv7. Do you know what that's about? At least in Fedora's armv7hl build it does call it builtins-armhf.

(Sorry if this is too tangential. I can file a new issue if this needs much more discussion.)

cuviper commented 8 years ago

Oh, I guess that's a consequence of the "3 logical ARM builds" patch. Hmm...

alexcrichton commented 8 years ago

Ah rustbuild just copied the makefiles and the makefiles are just literally the first thing that worked, so I'd believe that they may be wrong!

cuviper commented 8 years ago

I think it is right in the context of the "3 logical ARM builds" patch, but the distro doesn't have that.

I was having my change for external compiler-rt specify a directory, so rust could pick out whatever filename it wants from there. Maybe I should just give the full path to the actual library file that the distro has. That won't help cross-compilation, but Fedora only compiles on native hosts anyway.

Anyway, now that I understand where it's coming from, I can tinker with it.

alexcrichton commented 8 years ago

I've taken an initial stab at this in https://github.com/rust-lang/rust/pull/34873, although not moving to a build script just yet, only redoing the build system.

brson commented 8 years ago

Current status: we're building compiler-rt builtins ourselves using rules defined in mk/rt.mk and src/bootstrap/native.rs (if i recall correctly).

Potential next steps:

cuviper commented 8 years ago

Include all the builtin source in rustc-builtins. I'm not sure the best way to go about this in a way that's maintainable, but I might suggest simply copying the source instead of trying to manage it as a submodule.

Why? Are you intending to fork that code, or just vendor it?

With my eye towards wanting to use an external build of these builtins (from the distro), a submodule makes it a lot clearer what's being vendored.

alexcrichton commented 8 years ago

Yeah eventually we intend to rewrite all this in Rust.

cuviper commented 8 years ago

OK, sounds good, thanks.

japaric commented 8 years ago

Potential next steps:

I'm working on this. I think I got everything but the Makefiles done. I'm smoke testing right now by cross compiling std to arm-unknown-linux-gnueabi. I want to cross compile a binary that actually uses compiler-rt intrinsics.

bstrie commented 8 years ago

One of the major blockers of our dream to "lazily compile std"

@alexcrichton Is there an open issue for this somewhere? I'd like to know what this means in more detail.

alexcrichton commented 8 years ago

@bstrie it's this issue! We just need to be sure to lazily compile compiler-rt as part of cargo build of the standard library, but that's what this issue is about.

briansmith commented 8 years ago

If compiler-rt were rewritten in Rust, then would non-Rust code be able to use the Rust-coded version? For example, let's say I've linked a program using 4 LLVM-based programming languages together. Would I have multiple copies of the compiler-rt logic linked into my program? That would suck. A much simpler solution to rewriting compiler-rt would be to just include llvm-as and clang in the Rust distribution, so that we can cross-compile the asm and C code when (re)building std. Besides eliminating the need to rewrite compiler-rt, this would benefit lots of users of Rust as they could then rely on llvm-as and clang being available.

japaric commented 8 years ago

If compiler-rt were rewritten in Rust, then would non-Rust code be able to use the Rust-coded version?

Presumably yes, the intrinsics in the Rust version will be exposed in the same way as the intrinsics in libcompiler-rt.a. (Side note: On failure, the compiler-rt intrinsics call an "abort" function defined in compiler-rt; I'm not sure if that should become panic on the Rust version or if we should port the abort function to Rust -- this has not been discussed yet but the former path would hinder this use case you mention)

For example, let's say I've linked a program using 4 LLVM-based programming languages together. Would I have multiple copies of the compiler-rt logic linked into my program?

No, the linker will use the version that "encounters first" and ignore the rest. Here first means: from all the objects passed to the linker as arguments, the symbol found in the first object (argument) that provides it. Note that this happens today for all std programs (at least on Linux): Rust programs are linked against gcc_s and against compiler-rt and both provide "compiler-rt" intrinsics like __floattidf (see nm -D libgcc_s.so | grep __ and nm libcompiler-rt.a | grep __)