Closed tjhu closed 4 years ago
This isn't a bug in cargo-xbuild; just like xargo, all of the sysroot crates (libcore, liballoc, compiler_builtins) are built in release mode by default. See: https://github.com/rust-osdev/cargo-xbuild/blob/df6db0706b061c474514365a62d80d5d6a1909ed/src/sysroot.rs#L120
The issue you're hitting is https://github.com/rust-lang/compiler-builtins/issues/339, which notes that the builtin memcpy implementation is just a simple un-optimized for-loop. If you want the default no_std implementation to be better, I would start there. Note that Rust normally just uses the memcpy defined by libc (which is very optimized, often written in arch-specific assembly); for no_std, however, this isn't really an option. This is essentially what GCC does as well.
The cargo-xbuild docs note that the memcpy metadata option can be used to enable/disable the default memcpy implementation, which should allow you to work around this without changing compiler_builtins.
For example, if you set package.metadata.cargo-xbuild.memcpy = false for your crate, you'll get a bunch of "undefined symbol" errors for memcpy/memcmp/memset. Then, you can have your crate (or an external crate) provide the appropriate definitions.
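As a rough sketch of what "providing the appropriate definitions" could look like, the functions below implement the three routines as plain byte loops. The sketch_* names are hypothetical and chosen only so the example compiles on a hosted target; in a real no_std crate they would presumably be exported under the C names with #[no_mangle], which is an assumption about your setup rather than something the cargo-xbuild docs prescribe.

```rust
// Assumed Cargo.toml setting from the discussion above:
//
//   [package.metadata.cargo-xbuild]
//   memcpy = false
//
// With that set, the crate has to supply memcpy/memset/memcmp itself.
// The names below are hypothetical; in a no_std crate each function would be
// exported as `memcpy`, `memset`, `memcmp` via #[no_mangle].

/// Byte-wise copy of `n` bytes from `src` to `dest` (regions must not overlap).
pub unsafe extern "C" fn sketch_memcpy(dest: *mut u8, src: *const u8, n: usize) -> *mut u8 {
    let mut i = 0;
    while i < n {
        unsafe { *dest.add(i) = *src.add(i) };
        i += 1;
    }
    dest
}

/// Fills `n` bytes at `s` with the low byte of `c`.
pub unsafe extern "C" fn sketch_memset(s: *mut u8, c: i32, n: usize) -> *mut u8 {
    let mut i = 0;
    while i < n {
        unsafe { *s.add(i) = c as u8 };
        i += 1;
    }
    s
}

/// Compares `n` bytes; returns <0, 0, or >0 like the C `memcmp`.
pub unsafe extern "C" fn sketch_memcmp(a: *const u8, b: *const u8, n: usize) -> i32 {
    let mut i = 0;
    while i < n {
        let (x, y) = unsafe { (*a.add(i), *b.add(i)) };
        if x != y {
            return x as i32 - y as i32;
        }
        i += 1;
    }
    0
}
```

These loops are no faster than the default implementation; the point is only to show where a hand-tuned version would plug in.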
Note that you can also link a custom memcpy implementation. For example, I got something working using musl via the following steps:
1. Install musl, which (for my OS) installs a file /usr/lib/musl/lib/libc.a
2. Add a build.rs to tell Cargo where to find the library. In my example, this was:
println!("cargo:rustc-link-search=native=/usr/lib/musl/lib");
println!("cargo:rustc-link-lib=static=c");
3. Build with package.metadata.cargo-xbuild.memcpy = false, which now works without linking errors.
Note that this approach is very application specific. Your libc.a must be compatible with your no_std target. My example only works because:
- the libc.a is built for x86_64-unknown-linux
- musl's memcpy (and friends) doesn't use OS functionality (unlike glibc)
- my no_std target only needs to work on bare-metal x86_64
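For reference, the two println! lines above could be wrapped into a complete build script roughly as follows. This is a sketch, not the exact file used in the thread: the MUSL_LIB_DIR environment variable and the helper names are hypothetical additions, and the default path is the one mentioned above for one particular OS.

```rust
// build.rs (sketch). A real build script would call `emit_link_directives()`
// from its `fn main()`.
use std::env;

/// Returns the two Cargo directives from the steps above for a given directory.
fn link_directives(lib_dir: &str) -> Vec<String> {
    vec![
        // Tell rustc where to search for native libraries.
        format!("cargo:rustc-link-search=native={}", lib_dir),
        // Link statically against libc.a found in that directory.
        "cargo:rustc-link-lib=static=c".to_string(),
    ]
}

/// Prints the directives so Cargo picks them up.
fn emit_link_directives() {
    // MUSL_LIB_DIR is a hypothetical override; the fallback is the path
    // quoted in the thread and will differ between systems.
    let lib_dir = env::var("MUSL_LIB_DIR")
        .unwrap_or_else(|_| "/usr/lib/musl/lib".to_string());
    // Re-run the build script if the override changes.
    println!("cargo:rerun-if-env-changed=MUSL_LIB_DIR");
    for line in link_directives(&lib_dir) {
        println!("{}", line);
    }
}
```

Keeping the path configurable is just a convenience, since the location of libc.a depends on how musl was installed.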
Finally, note that this complexity may not be worth it. Depending on your application, optimizing memcpy might have very little effect, as usually memory speed is the bottleneck for these sorts of operations.
@josephlr Thank you very much! Your detailed guide helps us a lot!
We tried using Redox's implementation but found it not very fast and kinda buggy (there's an infinite recursion in memset). We thought that the compiler would be smart enough to optimize the un-optimized for-loop quite a bit, or at least do some loop unrolling, as we see in Compiler Explorer. We didn't know your solution existed, and we were being lazy about writing and maintaining a fast memcpy ourselves, so we thought there might be a way to ask the compiler to do more optimization for us.
@tjhu https://github.com/rust-lang/compiler-builtins/pull/365 makes it so x86_64 targets will now build with a highly optimized memcpy and friends (using REP MOVSB). If that gets merged, then you should be able to use the very fast memcpy by default.
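To illustrate the general idea of a REP MOVSB copy, here is a minimal sketch using Rust's inline assembly. It is not the code from that pull request; the function name copy_rep_movsb is hypothetical, and the non-x86_64 fallback exists only so the example builds elsewhere.

```rust
/// Copies `n` bytes from `src` to `dest` with a single `rep movsb`.
/// The regions must not overlap. Hypothetical helper, x86_64 only.
#[cfg(target_arch = "x86_64")]
pub unsafe fn copy_rep_movsb(dest: *mut u8, src: *const u8, n: usize) -> *mut u8 {
    // In a no_std crate this would be `core::arch::asm`.
    use std::arch::asm;
    unsafe {
        asm!(
            "rep movsb",
            // rcx = byte count, rdi = destination, rsi = source.
            // All three are modified by the instruction, so their
            // outputs are declared and discarded.
            inout("rcx") n => _,
            inout("rdi") dest => _,
            inout("rsi") src => _,
            options(nostack, preserves_flags)
        );
    }
    dest
}

/// Fallback for other architectures so the sketch still compiles.
#[cfg(not(target_arch = "x86_64"))]
pub unsafe fn copy_rep_movsb(dest: *mut u8, src: *const u8, n: usize) -> *mut u8 {
    unsafe { std::ptr::copy_nonoverlapping(src, dest, n) };
    dest
}
```

How fast this is in practice depends on the CPU, so it is worth measuring on the target hardware before relying on it.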
@phil-opp I think this can be closed.
Hi,
When we run cargo xbuild --release ... --target x86_64-kernel.json, the memcpy being compiled is just a simple un-optimized for-loop. Looking at the source code, I think xargo builds sysroot crates in release mode by default. I think there's something else in our settings that prevents xargo from building an optimized compiler_builtins, but I am not sure what I am missing. We borrowed some of the setup, including the target.json, from Writing an OS in Rust.