rust-osdev / cargo-xbuild

Automatically cross-compiles the sysroot crates core, compiler_builtins, and alloc.
Apache License 2.0

How to build *compiler_builtins* in optimized mode #77

Closed tjhu closed 4 years ago

tjhu commented 4 years ago

Hi,

When we run cargo xbuild --release ... --target x86_64-kernel.json, the memcpy that gets compiled is just a simple, unoptimized for-loop. Looking at the source code, I think xargo builds sysroot crates in release mode by default.

I think something else in our settings prevents xargo from building an optimized compiler_builtins, but I am not sure what I am missing. We borrowed some of our setup, including the target.json, from Writing an OS in Rust.

josephlr commented 4 years ago

This isn't a bug in cargo-xbuild; just like xargo, all of the sysroot crates (libcore, liballoc, compiler_builtins) are built in release mode by default. See: https://github.com/rust-osdev/cargo-xbuild/blob/df6db0706b061c474514365a62d80d5d6a1909ed/src/sysroot.rs#L120

The issue you're hitting is https://github.com/rust-lang/compiler-builtins/issues/339, which notes that the builtin memcpy implementation is just a simple, unoptimized for-loop. If you want the no_std default implementation to be better, I would start there. Note that Rust normally just uses the memcpy defined by libc (which is heavily optimized, often written in arch-specific assembly); for no_std, however, this isn't really an option. This is essentially what GCC does as well.

The cargo-xbuild docs note that the memcpy metadata option can be used to enable or disable the default memcpy implementation, which should allow you to work around this without changing compiler_builtins.

For example, if you set package.metadata.cargo-xbuild.memcpy = false for your crate, you'll get a bunch of "undefined symbol" errors for memcpy/memcmp/memset. Then you can have your crate (or an external crate) provide the appropriate definitions.
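As a rough illustration of what "providing the definitions" could look like, here is a minimal sketch of the three functions written as plain loops. The names kernel_memcpy, kernel_memset, and kernel_memcmp are hypothetical; in an actual no_std crate they would presumably be exported under the C names (for example with #[no_mangle] on a function called memcpy) so the linker can resolve the missing symbols. They are renamed here only so the sketch can also be compiled and tried on a hosted target.

```rust
// Hypothetical names, chosen so this sketch does not clash with libc on a
// hosted target. A no_std crate would export them as `memcpy`, `memset`,
// and `memcmp` instead (assumption: via `#[no_mangle]`).

/// Copies `n` bytes from `src` to `dest`. The regions must not overlap.
pub unsafe extern "C" fn kernel_memcpy(dest: *mut u8, src: *const u8, n: usize) -> *mut u8 {
    let mut i = 0;
    while i < n {
        // One byte per iteration: simple, but not fast.
        unsafe { *dest.add(i) = *src.add(i) };
        i += 1;
    }
    dest
}

/// Fills `n` bytes at `dest` with the low byte of `c`.
pub unsafe extern "C" fn kernel_memset(dest: *mut u8, c: i32, n: usize) -> *mut u8 {
    let mut i = 0;
    while i < n {
        unsafe { *dest.add(i) = c as u8 };
        i += 1;
    }
    dest
}

/// Compares `n` bytes; returns a negative, zero, or positive value like C's memcmp.
pub unsafe extern "C" fn kernel_memcmp(a: *const u8, b: *const u8, n: usize) -> i32 {
    let mut i = 0;
    while i < n {
        let (x, y) = unsafe { (*a.add(i), *b.add(i)) };
        if x != y {
            return x as i32 - y as i32;
        }
        i += 1;
    }
    0
}
```

One caveat with loops like these: the optimizer may recognize them as a memcpy or memset idiom and turn them back into a call to that very symbol. As far as I know, compiler_builtins guards against this with the #![no_builtins] crate attribute, so a crate that exports these symbols probably wants something similar.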

josephlr commented 4 years ago

Note that you can also link a custom memcpy implementation. For example, I got something working using musl via the following steps:

  1. Install musl, which (for my OS) installs a file /usr/lib/musl/lib/libc.a
  2. Add lines to your build.rs to tell Cargo where to find the library. In my example, this was:
    println!("cargo:rustc-link-search=native=/usr/lib/musl/lib");
    println!("cargo:rustc-link-lib=static=c");
  3. Now building with package.metadata.cargo-xbuild.memcpy = false works without linking errors.
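For anyone who wants to keep the library path in one place, step 2 could be wrapped in a small helper along these lines. This is only a sketch: emit_link_directives is a hypothetical name, and the path passed in is whatever directory holds libc.a on your system.

```rust
use std::io::Write;

/// Writes the two Cargo link directives for a static libc located in `lib_dir`.
/// `emit_link_directives` is a hypothetical helper, not part of Cargo.
fn emit_link_directives(out: &mut impl Write, lib_dir: &str) -> std::io::Result<()> {
    // Tell Cargo which directory to search for native libraries.
    writeln!(out, "cargo:rustc-link-search=native={lib_dir}")?;
    // Ask for `libc.a` in that directory to be linked statically.
    writeln!(out, "cargo:rustc-link-lib=static=c")?;
    Ok(())
}
```

In build.rs, main would call it with std::io::stdout() and "/usr/lib/musl/lib", since Cargo reads build-script directives from standard output.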

Note that this approach is very application-specific. Your libc.a must be compatible with your no_std target. My example only works because:

Finally, note that this complexity may not be worth it. Depending on your application, optimizing memcpy might have very little effect, as usually memory speed is the bottleneck for these sorts of operations.

tjhu commented 4 years ago

@josephlr Thank you very much! Your detailed guide helps us a lot!

We tried using Redox's implementation but found it not very fast and kind of buggy (there's an infinite recursion in memset). We thought the compiler would be smart enough to optimize the unoptimized for-loop quite a bit, or at least do some loop unrolling, as we see in Compiler Explorer. We didn't know your solution existed, and we were being lazy about writing and maintaining a fast memcpy ourselves, so we thought there might be a way to ask the compiler to do more optimization for us.

josephlr commented 4 years ago

@tjhu https://github.com/rust-lang/compiler-builtins/pull/365 makes it so x86_64 targets will now build with a highly optimized memcpy and friends (using REP MOVSB). If that gets merged, then you should be able to use the very fast memcpy by default.
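To give an idea of what a REP MOVSB based copy looks like, here is a minimal sketch using Rust's inline assembly. It is not copied from the PR; the function name copy_forward and the fallback for other architectures are assumptions made for illustration.

```rust
/// Copies `n` bytes from `src` to `dest` with a single `rep movsb`.
/// The regions must not overlap. Sketch only, x86_64 specific.
#[cfg(target_arch = "x86_64")]
pub unsafe fn copy_forward(dest: *mut u8, src: *const u8, n: usize) {
    unsafe {
        core::arch::asm!(
            "rep movsb",
            // rcx = byte count, rdi = destination, rsi = source.
            // All three registers are modified by the instruction,
            // so their final values are discarded.
            inout("rcx") n => _,
            inout("rdi") dest => _,
            inout("rsi") src => _,
            options(nostack, preserves_flags)
        );
    }
}

/// Fallback so the sketch still compiles on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub unsafe fn copy_forward(dest: *mut u8, src: *const u8, n: usize) {
    unsafe { core::ptr::copy_nonoverlapping(src, dest, n) };
}
```

The appeal of this approach is that the CPU's microcode picks the copy strategy, so the code stays tiny while still being fast on processors that optimize REP MOVSB.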

@phil-opp I think this can be closed.