Open marcbowes opened 9 years ago
This is a Rust problem even more than a Cargo problem. You can't guarantee that a pre-built Rust library will work unless it's built with the exact same SHA of the compiler.
Yes, unfortunately this would require changes to `rustc` itself, so it can't be tackled at this time.
The specific restriction I'm referring to is that you're basically limited to only working with binaries generated by the exact revision of the compiler you're using, as well as the exact same set of dependencies.
Is there something I can read/follow that explains the issues relating to why this requirement is so strict? Is this expected to change over time?
Regardless, assume I can meet the requirement of providing a set of prebuilt libraries with the exact same SHA. Is this a reasonable feature? Even something like letting the build script emit `--extern` as part of the whitelisted flags that `cargo:rustc-flags` can configure would help me out (assuming that is easier to implement than another top-level dependency option).
/cc @aturon: I had a chat with Steve on IRC about this and he suggested getting your input.
One big reason I want to avoid building dependencies over and over is that our build system rebuilds consumers: in my example, a change to `a` would trigger a rebuild of `b` (including tests), and if `b` failed, the new version of `a` would not be released. The implication of this is that when building `b`, `a` would be rebuilt a second time. This becomes really wasteful.
I'm happy to contribute the change required to implement this if the team feels it is a worthwhile feature. I imagine there are plenty of companies out there with their own in-house build systems, so something like this could be an adoption blocker.
I actually meant @alexcrichton not @aturon :)
Is there something I can read/follow that explains the issues relating to why this requirement is so strict?
Unfortunately no :(. We don't have a ton of documentation in this area, just a bunch of cargo-culted knowledge. In general though this is largely because of two primary reasons (that I can think of):
Is this expected to change over time?
Certainly! We probably won't invest too much energy into it before 1.0, but I'd love to see progress in this area!
Regardless, assume I can meet the requirement of providing a set of prebuilt libraries with the exact same SHA. Is this a reasonable feature?
I suppose it depends on how much cargo integration you want. In the example you gave in the second comment, the manifest probably says that `b` depends on `a`, in which case cargo will already pass `--extern` for `a` when it compiles `b`. Cargo would not only have to forward your `--extern` flags, but it would also have to know to turn off its own `--extern`. Additionally it would then have to cut `a` out entirely from the dependency graph.
In principle allowing `--extern` from `rustc-flags` would be possible, but it may have surprising results!
I imagine there are plenty of companies out there with their own in-house build systems, so something like this could be an adoption blocker.
I agree this would definitely be bad! I'd like to home in on what's going on here first though.
My first question would be: Does Cargo suffice? If you're using `cargo build`, then Cargo won't build `a` if it hasn't changed and you've already built it, but it sounds like you're not using Cargo to build libraries?
I suppose my other questions would be based on that answer, so I'll hold off for that :)
Thanks for the detailed answer Alex!
My first question would be: Does Cargo suffice? If you're using cargo build, then Cargo won't build a if it hasn't changed and you've already built it, but it sounds like you're not using Cargo to build libraries?
Imagine `a` is built by Travis. It outputs `liba`, documentation and so forth - a collection of artifacts. People mostly discard these artifacts in practice, but you might imagine a system where those artifacts are retained. I'm sure this is not conceptually dissimilar to what your build bots do - you get some named and versioned output that you can later use either for development (à la `rustup`) of yet more projects or for deployment purposes.
To integrate with this build system, one only needs to implement the simple contract: provide something that can be executed that will produce build artifacts. This is just a one-line shell script that turns around and calls `cargo build`, and we're done.
Along comes project `b`. It starts off the same as `a` until we decide to use some of the functionality that `a` provides. In cargo, you just add the dependency to the manifest - name and version. This build system works in the same way so we add it to its manifest too. The build system uses this manifest to provide the build artifacts of `a` for `b` at compile time. Now the only thing left to do is adjust the `path` attribute under `[dependencies.a]` to point to the build artifacts (`$A_ARTIFACTS/src`, if you will) and we're golden.
(We now have the dependency declared in two places. We can either live with the duplication, or adjust our build script to copy them from one into the other.)
However, we've just hit the first real problem: `b` needs the source code of `a` to compile but this doesn't really fit in with the concept of a build artifact. We can cheat by adjusting our shell script to also copy the `src` of `a` into its build directory.
Hopefully, at this point, I've answered the latter question: the intention is to use cargo to build libraries such as `a` or binaries such as `b`. The reasons:
- `rustc`)
- `Cargo.toml` is nicer than other interfaces like `make` for customization
The question then, is what happens when `a` changes? At a high level, the system tracks dependencies, rebuilds them according to the graph and fails if something in the build breaks. This means that `b` would be rebuilt against the new artifacts of `a`. If we change our shell script to also execute `cargo test`, this means that `b` gets a chance to veto the build as a whole if the change breaks it in some way.
And this brings us to the second problem. `a` is being built twice. If `c` depends on `b`, then the build will build `a` three times and `b` twice. This becomes incredibly wasteful pretty quickly. In the context of this specific system, it is also redundant because any changes to `a` will trigger `b` to be rebuilt; whereas in the "normal" world, `b` will be rebuilt if `a` changes but only when `b` is explicitly built.
As I mentioned in the issue overview, I can work around this by using a cargo build script that adds a `-L` option to `rustc`, provided I just dump the libs output by the builds of all of its (recursive) dependencies. This works and completely solves my problem. Incidentally, it removes the need for the other changes to the build script (don't need to copy source code in, don't need to declare dependencies in `Cargo.toml`).
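A minimal sketch of what such a build script might look like, assuming the outer build system exports the directories holding the pre-built libs in an environment variable. The variable name `PREBUILT_LIB_DIRS` and the helper `rustc_flags_directive` are hypothetical; `cargo:rustc-flags` is the build-script key mentioned earlier in the thread, which only accepts `-l` and `-L` flags.

```rust
// build.rs: a sketch, not an official Cargo recipe.
use std::env;
use std::path::PathBuf;

/// Builds the `cargo:rustc-flags=...` line for a list of library directories.
/// Returns `None` when there is nothing to add.
/// Note: paths containing whitespace are not handled by this sketch.
fn rustc_flags_directive(dirs: &[PathBuf]) -> Option<String> {
    if dirs.is_empty() {
        return None;
    }
    let flags: Vec<String> = dirs
        .iter()
        .map(|d| format!("-L {}", d.display()))
        .collect();
    Some(format!("cargo:rustc-flags={}", flags.join(" ")))
}

fn main() {
    // PREBUILT_LIB_DIRS is a hypothetical, PATH-style list of directories
    // provided by the surrounding build system.
    let raw = env::var_os("PREBUILT_LIB_DIRS").unwrap_or_default();
    let dirs: Vec<PathBuf> = env::split_paths(&raw)
        .filter(|p| !p.as_os_str().is_empty())
        .collect();

    // Re-run the build script when the variable changes.
    println!("cargo:rerun-if-env-changed=PREBUILT_LIB_DIRS");
    if let Some(line) = rustc_flags_directive(&dirs) {
        println!("{}", line);
    }
}
```

This only extends the library search path. As discussed above, the libs found there still have to come from the same compiler revision.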
But then we hit the problem of ABI compatibility, for which it sounds like there is no solution yet. This means I'll need to find a way of (effectively) adding the rustc SHA to the tuple that identifies an artifact (similar to disambiguating 32/64 bit builds). Or just going with the aforementioned option of building all dependencies for each consumer.
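One way to picture adding the rustc SHA to the artifact tuple is the sketch below. It reads the `commit-hash:` line that `rustc -vV` prints and folds it into an artifact key. The key layout and the names `commit_hash` and `artifact_key` are assumptions for illustration, not something the build system in question is known to use.

```rust
use std::process::Command;

/// Extracts the value of the `commit-hash:` line from `rustc -vV` output.
fn commit_hash(verbose_version: &str) -> Option<&str> {
    verbose_version
        .lines()
        .find_map(|line| line.strip_prefix("commit-hash: "))
        .map(str::trim)
}

/// Hypothetical artifact identity: name, version, target and compiler commit.
fn artifact_key(name: &str, version: &str, target: &str, rustc_commit: &str) -> String {
    format!("{}-{}-{}-rustc-{}", name, version, target, rustc_commit)
}

fn main() {
    let out = Command::new("rustc")
        .arg("-vV")
        .output()
        .expect("failed to run rustc");
    let text = String::from_utf8_lossy(&out.stdout);
    match commit_hash(&text) {
        // The crate name, version and target here are placeholder values.
        Some(hash) => println!("{}", artifact_key("a", "0.1.0", "x86_64-unknown-linux-gnu", hash)),
        None => eprintln!("no commit-hash line found in rustc -vV output"),
    }
}
```

Some distribution builds of rustc report `commit-hash: unknown`, in which case a key like this would need another discriminator such as the release string.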
A question you might also ask is: "would hosting a crates.io mirror help with this?". It doesn't. Not because it doesn't "work", but because it only meets some of the requirements (such as private code, not having direct dependencies on external sources for security reasons).
One huge benefit we get out of a single extensible build system is that adding a dependency on a Rust package is no different to adding a dependency on a C, Ruby, Java, Python or Haskell package - they're just named and versioned artifacts. A big use case for me is going to be enabled by that: for example, authoring Rubygems in Rust to speed up performance-critical code paths.
I hope this detail makes my initial question more clear: cargo does things I'd otherwise have to implement myself, but it also does things I'd like to skip. Specifically, I'd like to be able to use something like `path` for exact control over where the dependency lives, but I don't want cargo to try to build it.
FWIW, I'm probably going to go with:
- `Cargo.toml` and specify the `path`
Alex points out that http://doc.crates.io/build-script.html#overriding-build-scripts could be extended to support overriding rust crates. We could then generate `.cargo/config` files on the fly.
Alright, after reading that over (thanks for taking the time to write it up!) it sounds like what we discussed on IRC is the best way to move forward with this. Specifically I'd be thinking of something like:
```toml
# .cargo/config
[target.$triple.rust.foo]
libs = ["path/to/libfoo.rlib", "path/to/libfoo.so"]
dep_dirs = [ ... ]
```
Note that the current overrides (`target.$triple.$lib`) I think may want to be renamed to `target.$triple.native.$lib` to give us some more leeway. When cargo detects this form of override, however, it will not build `libfoo` but instead just pass `--extern foo=...` for the paths listed and `-L dependency=...` for all of the values in `dep_dirs`.
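To make the proposed translation concrete, here is a rough sketch of how one such override entry could map to rustc arguments. The `RustOverride` struct and `rustc_args` function are hypothetical and do not reflect Cargo's actual internals. Only the `--extern name=path` and `-L dependency=dir` flag shapes come from the description above.

```rust
/// Hypothetical in-memory form of one `[target.$triple.rust.<name>]` entry.
struct RustOverride {
    name: String,
    libs: Vec<String>,
    dep_dirs: Vec<String>,
}

/// Produces the extra rustc arguments for an overridden crate:
/// one `--extern name=path` per lib and one `-L dependency=dir` per dep dir.
fn rustc_args(ov: &RustOverride) -> Vec<String> {
    let mut args = Vec::new();
    for lib in &ov.libs {
        args.push("--extern".to_string());
        args.push(format!("{}={}", ov.name, lib));
    }
    for dir in &ov.dep_dirs {
        args.push("-L".to_string());
        args.push(format!("dependency={}", dir));
    }
    args
}

fn main() {
    // Placeholder paths, mirroring the config example in the thread.
    let ov = RustOverride {
        name: "foo".to_string(),
        libs: vec!["path/to/libfoo.rlib".to_string()],
        dep_dirs: vec!["path/to/deps".to_string()],
    };
    println!("{}", rustc_args(&ov).join(" "));
}
```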
One problem I can foresee, however, is that you mentioned not wanting to share the source code between projects. Cargo would still need the source code, however, to read data such as the `Cargo.toml`. Cargo doesn't actually need the entire source code base, but it'll need at least that much.
Does that sound like what would work for you?
bump?
Please support this, it's frustrating that it doesn't work yet.
I have to prebuild the `ring` crate because on the server where I don't have root the GCC version is too old to build `ring`, so I build it on a server where I'm root and copy it over.
Is there any way right now to use the prebuilt rlib instead of compiling it from crates.io? Maybe with a `build.rs` script?
Nominated for discussion at the Cargo team meeting.
Was progress made on this at the meeting?
I don't know about "pre-built" dependencies, but it'd be nice to be able to build a variety of leaf crates without building the dependency crates more than once, if the leaf crates request the same features from the dependencies.
Bump?
Probably not: still nominated..
I'm still a bit in the dark about what happened here. Was anything discussed at the team meeting?
TL;DR Would just like to add another +1 for support for pre-built binaries. It would be great to get a follow up on what was discussed at the meeting.
Motivation Story
Last night we ran a workshop on the nannou creative coding framework. Seeing as nannou supports audio, graphics, lasers, etc along with quite a high-level API in a cross-platform manner, it has a lot of dependencies. It took between 5 minutes and 25 minutes (depending on the user's machine) for users just to build nannou and all of its dependencies for the first time in order for us to begin working through the examples together. Ideally in the future we would write a build script that attempted to first fetch pre-built dependencies before falling back to building from the src. It seems like the feature described within this issue would help to simplify this.
What about some global cache on disk for both the downloaded source code and the built binaries, so that if the compiler and crate version match it can avoid a rebuild? Similar to yarn.
This is also what Nix wants to do. There won't be a compiler mismatch for packages built with Nix, because Nix will use the compiler as a build input to the package.
Bump. Does anybody know what's happening here?
I would be interested to know what's happening here. Our company has some products in Rust and we've been working on a new build system. If you use Docker to do your builds, it's actually a good way to get around this problem, because you can do a `cargo build` during the container build process to cache the built dependencies. However, we also have to support building on Windows and macOS and you obviously can't get those in a container. This issue makes it difficult to have good build times in a build environment that makes use of on-demand slaves.
@Twey I have a working solution for Nix, see https://github.com/rust-lang/cargo/pull/7079#issuecomment-508163585 for some context. Hopefully the patch in that PR can come in handy for other build systems.
Could this potentially be done by including the rustc version in the library format? I've created a pre-RFC about it.
How to fix ABIs across versions of rustc
... That would be this issue then:
https://github.com/rust-lang/rfcs/issues/600
Elsewhere I also saw some discussions about if / how to implement binary packages. Well for that aspect there is already an open standard:
https://theupdateframework.com/adoptions/
At the bottom of that ^^ page there is a link to an implementation in Rust
Which is still being developed. However, it may be of interest to people here.
Hey, do you have any feedback on this?
I reckon pre-built binaries not only benefit long-term builds but also short-term ones. In my case, our serverless application is getting bigger, therefore our build times are increasing linearly. Pre-built dependencies would definitely reduce this build time, even if occasionally we have to re-build these libraries from the code.
I understand the argument that, due to Rust's unstable ABI, the generated lib might not be compatible anymore. But we could work around that by pinning a Rust version in the CI/CD - just as others have mentioned. It should be possible for cargo, given this deterministic build scenario, to generate pre-built binary (rlibs or dyn-libs) from crates and use them as valid dependencies for our binaries.
FWIW, such a feature would be highly desirable for distributions based on functional package managers such as GNU Guix, where the whole dependency chain is controlled. In other words, the ABI compatibility problem of rustc is not really a problem for Guix.
Up, need support for this to speed up compilation, especially for small computing devices
Be a good citizen in making it easy to integrate Rust code into bigger projects and implement this please.
How would this help with bigger projects?
In projects where you want to control all of the dependencies in a uniform detailed manner having a build system for some of them that demands to download packages itself is quite bothersome. Basically for all of the same reasons the Nix users above have given.
In that case you should probably not use cargo, but instead have whichever build system builds those dependencies for you build all rust crates using rustc. Mixing two build systems at the same time for your dependencies is bound to give trouble. Even having two instances of cargo may result in some dependencies being built twice, which either has unintended effects or fails, depending on whether the two cargo instances used different `-Cmetadata` values. For Nix there is already a program that converts cargo projects into native Nix build files directly invoking rustc. Similar things exist for other build systems like bazel. Is that not enough?
For Nix there is already a program that converts cargo projects into native Nix build files directly invoking rustc. Similar things exist for other build systems like bazel. Is that not enough?
These are all hacks on top of Cargo being both build system and package manager. I don't use Nix, I am trying to build packages with Spack which is similar. I don't see why I should have to write build scripts for other people's packages. At most some light patching or a few flag changes should be needed to accommodate a meta-build system, not a rewrite of the whole build script.
As a non-Rust developer just looking to use Rust packages I don't know how to recreate the actual build logic that Cargo performs. Is it as easy as just running `rustc ...`?
I don't see why I should have to write build scripts for other people's packages.
You don't? For nix crate2nix takes a cargo project as input and produces a nix build script for you. You don't need to write a build script yourself. If anything using pre-built dependencies would require more effort on your end as you need to perform most of the actions of cargo to build your dependencies and then edit the `Cargo.toml` to point cargo to them.
As a non-Rust developer just looking to use Rust packages I don't know how to recreate the actual build logic that Cargo performs. Is it as easy as just running rustc ...?
Not really. You can add `-v` to a cargo invocation to show all rustc and build script invocations of cargo. You will see options ranging from specifying dependency locations, to setting the optimization level, determining the output location and `-Cmetadata` with each invocation having a unique value to allow multiple versions to co-exist. Cargo also interprets the output of build script invocations to determine which extra arguments to pass to rustc. For example to link against a specific C library, or to set an env var pointing to generated source code.
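As a rough illustration of that last point, the sketch below interprets a few of the documented `cargo:` build-script directives (`rustc-link-lib`, `rustc-link-search`, `rustc-cfg`, `rustc-env`) and turns them into rustc arguments and environment variables. It is a simplified model, not Cargo's real implementation. The `BuildOutput` type and `interpret` function are hypothetical names, and many directives are ignored.

```rust
/// Simplified result of reading a build script's stdout.
#[derive(Debug, Default, PartialEq)]
struct BuildOutput {
    rustc_args: Vec<String>,
    env: Vec<(String, String)>,
}

/// Interprets a subset of `cargo:KEY=VALUE` lines. Other lines are ignored.
fn interpret(stdout: &str) -> BuildOutput {
    let mut out = BuildOutput::default();
    for line in stdout.lines() {
        let directive = match line.strip_prefix("cargo:") {
            Some(d) => d,
            None => continue, // ordinary log output from the build script
        };
        let (key, value) = match directive.split_once('=') {
            Some(kv) => kv,
            None => continue,
        };
        match key {
            "rustc-link-lib" => out.rustc_args.extend(["-l".to_string(), value.to_string()]),
            "rustc-link-search" => out.rustc_args.extend(["-L".to_string(), value.to_string()]),
            "rustc-cfg" => out.rustc_args.extend(["--cfg".to_string(), value.to_string()]),
            "rustc-env" => {
                // The value itself has the form VAR=VALUE.
                if let Some((var, val)) = value.split_once('=') {
                    out.env.push((var.to_string(), val.to_string()));
                }
            }
            _ => {} // directives not modelled in this sketch
        }
    }
    out
}

fn main() {
    let sample = "cargo:rustc-link-lib=z\ncargo:rustc-env=GENERATED=/tmp/out/gen.rs\n";
    println!("{:?}", interpret(sample));
}
```

A build system that bypasses Cargo would need to run each build script itself and apply something along these lines before invoking rustc.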
For nix crate2nix takes a cargo project as input and produces a nix build script for you
A manual rewrite, crate2nix rewrites it what is the difference? The point being that Cargo doesn't provide the mechanism needed and so a translation layer is necessary. If for instance it operated (or had the option to operate) more like a traditional build system then no translation would be necessary at all. You would just flip a flag and Cargo would not fetch and build all the dependencies.
If anything using pre-built dependencies would require more effort on your end as you need to perform most of the actions of cargo to build your dependencies and then edit the Cargo.toml to point cargo to them.
I think you do see my point now though: I want to build everything myself, and have Cargo not do as much. Yes, this means I have to repackage everything in my system, but the end goal is a single tree of dependencies across all language ecosystems without recompilations. It's a tradeoff of control vs convenience that Cargo currently doesn't support.
Cargo also interprets the output of build script invocations to determine which extra arguments to pass to rustc. For example to link against a specific C library, or to set an env var pointing to generated source code.
I did look through these recommendations and I run into this build script which seems like quite a tight coupling of rustc to Cargo. I will see if I can run the build scripts without Cargo.
My current idea though is to patch the Cargo.toml to use local dependencies as you mentioned above. I tried to look at the nix solutions but unfortunately it's quite obtuse to someone that doesn't know that DSL and terminology.
How would this help with bigger projects?
There are a couple of scenarios that I can think of:
In a large project it can be useful to divide the code into various layers for both architectural cleanliness, but also build efficiency. Frequently developers working in the middle or leaf layers will have a huge amount of code in the root layers that they will never touch - being able to use prebuilt libraries from CI instead of having to rebuild those layers is a huge productivity win. Even if you don't have the infrastructure to leverage prebuilt libraries from CI, being able to do a single full build and then manually rebuild only small portions of your code is helpful.
Having layered builds can also help CI by allowing the build to be distributed: once the root layers are built, those libraries can be distributed to multiple other build machines to build the wider graph of middle and leaf layers.
Note that, in this scenario, Cargo would not check if the rlibs were out-of-date (the original source files might not even be available), but rustc would be responsible for checking if the rlibs are compatible with the current build and fail otherwise.
One can think of this like the layering scenario above, but where the layers may cross organizations/projects/companies - namely that pre-built libraries are available (through a package manager, installed in a known directory, etc.) and that no source is available at all.
For example:
FWIW, such a feature would be highly desirable for distributions based on functional package managers such as GNU Guix, where the whole dependency chain is controlled. In other words, the ABI compatibility problem of rustc is not really a problem for Guix.
Note that neither of the scenarios I mentioned above requires the use of an additional build system beyond Cargo: there may be some scripts required to grab the prebuilt binaries from somewhere, but one can also imagine just having multiple Cargo.toml files in a repo that are unaware of each other, but that know that specific rlibs should be in a specific directory.
Even if another build system is involved, because Cargo handles things like build scripts, downloading/building dependencies, and setting up command line arguments to rustc, the idea of avoiding cargo and directly invoking rustc is a non-starter.
You would just flip a flag and Cargo would not fetch and build all the dependencies.
Then what would cargo do if not fetching and building? It already supports separating the fetch and build steps. `cargo fetch` downloads dependency sources and `cargo metadata` downloads them and then prints information you can use in a different build system to build your project. `cargo metadata` is what crate2nix uses afaik. You can also use `cargo vendor` to get the sources in a format that allows checking into source control or adding to a source bundle provided for downloads.
A manual rewrite, crate2nix rewrites it what is the difference?
A lot IMHO. A manual rewrite is a lot of work. crate2nix or equivalent can be integrated directly in your build system so that it is done transparently for you. One way or another you have to get a build script to be executed by your build system. The crate2nix approach means you don't have to manually write it, but can simply point the build system to a `Cargo.toml` file, making it just as easy as using cargo.
Having layered builds can also help CI by allowing the build to be distributed: once the root layers are built, those libraries can be distributed to multiple other build machines to build the wider graph of middle and leaf layers.
You don't need layers for that, right? You could consider every crate a single task to be scheduled across the build farm.
One can think of this like the layering scenario above, but where the layers may cross organizations/projects/companies - namely that pre-built libraries are available (through a package manager, installed in a known directory, etc.) and that no source is available at all.
If you are doing that with the intent to keep source private, just be aware that the crate metadata lists every private function and type and a whole lot of other information that should theoretically make it possible to reconstruct something that looks a lot like the original source code. (modulo regular comments. doc comments do end up in the crate metadata)
In a large project it can be useful to divide the code into various layers for both architectural cleanliness, but also build efficiency. Frequently developers working in the middle or leaf layers will have a huge amount of code in the root layers that they will never touch - being able to use prebuilt libraries from CI instead of having to rebuild those layers is a huge productivity win. Even if you don't have the infrastructure to leverage prebuilt libraries from CI, being able to do a single full build and then manually rebuild only small portions of your code is helpful.
Have you considered using sccache for that? It supports caching build artifacts in the cloud or on other machines. Using it is a matter of setting the `RUSTC_WRAPPER` env var to the location of the sccache binary (or just `RUSTC_WRAPPER=sccache` if it is in your `PATH`). Note that sccache specifically requires you to trust everyone with access to the cache, as anyone could poison it with malicious artifacts. A similar approach should be possible to do in a more secure way though.
A thought occurred to me while thinking of a particular use-case for this that you'll run into difficulty with transitive dependencies that are shared between a pre-compiled rlib and the crate being built on top of it. Consider a crate `x` that depends on `y` and `z`. `y` also depends on `z`, and has types from `z` in its public API. Next, imagine that `y` is pre-built, and so all we have is its `rlib`. It was pre-built with `z 1.0.0`. Now a user wants to build `x`, but they don't have a lockfile or anything. They configure Cargo to use `y.rlib`, and run `cargo build`. Cargo chooses the most recent version of `z`, which let's say is now `z 1.1.1`. Now, we're suddenly trying to do a build that contains both `z 1.0.0` and `z 1.1.1`, which feels like it would lead to trouble, especially if the code in `x` tries to pass a type from `z 1.1.1` into `y`'s APIs which expect `z 1.0.0`.
Which is all to say: I think the lockfile needs to be embedded in/passed alongside an `rlib` so that it can appropriately lock version resolution for builds that try to use that `rlib`.
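The check this implies can be sketched as follows, under the assumption that the versions a pre-built rlib was compiled against are recorded somewhere next to it. The function name `version_conflicts` and the map-based representation are hypothetical. A real solution would presumably feed the recorded versions into Cargo's resolver rather than only reporting mismatches.

```rust
use std::collections::BTreeMap;

/// Returns (crate, version the rlib was built with, version resolved now)
/// for every shared dependency whose versions disagree.
fn version_conflicts(
    prebuilt: &BTreeMap<String, String>,
    resolved: &BTreeMap<String, String>,
) -> Vec<(String, String, String)> {
    prebuilt
        .iter()
        .filter_map(|(name, built)| match resolved.get(name) {
            Some(now) if now != built => Some((name.clone(), built.clone(), now.clone())),
            _ => None,
        })
        .collect()
}

fn main() {
    // `y.rlib` was built against z 1.0.0, but the fresh resolution picked z 1.1.1.
    let mut prebuilt = BTreeMap::new();
    prebuilt.insert("z".to_string(), "1.0.0".to_string());
    let mut resolved = BTreeMap::new();
    resolved.insert("z".to_string(), "1.1.1".to_string());

    for (name, built, now) in version_conflicts(&prebuilt, &resolved) {
        println!("{}: rlib built with {}, build resolved {}", name, built, now);
    }
}
```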
I am having a problem that may relate to this thread.
I am building my crate B (`crateB/target/release/libB_plugin.so`) which depends on my crate A (`libA.so`). My crate A also produces an executable (let's say `./A`) which links with `crateA/target/release/libA.so`. At the runtime of `A`, it dynamically loads `libB_plugin.so` via `dlopen`. However, this gives me a missing symbol error, since `libB_plugin.so` links with `crateB/target/release/libA.so` (which is a fresh build of crate A). The dependent functions in `libA.so` are actually the same functions, but the mangled names have different hash suffixes.
Is there any way to make crateB only link with `crateA/target/release/libA.so`?
I am kind of stuck on this problem and was hoping to get any suggestions. Thanks!
Update: My particular problem was tackled by
`dlopen` for rust rlibs by hand (similar to what the Linux kernel module loader does for '.ko' files and what the Glasgow Haskell Compiler does for its objects)
Relevance to this topic: A bigger piece of software (hopefully implemented in Rust) often needs to be extensible, or needs part of its code or behavior to be dynamically changeable. This is usually done by a plugin system. What's good about Rust is that if the plugin system is backed by WebAssembly (for which many wasm runtimes are available), then the problem is solved. If the plugin has to be native code, such as dylibs, cargo's current style of compiling everything from source from scratch would be insufficient. Being able to build against pre-built dependencies at some point would be perfect.
Are there some simple cases that are easy to address? E.g. I'm installing `cargo-zigbuild`, a single executable. What is the absolute minimum needed to make it work?
- `~/.cargo/bin/`?
- `~/.cargo/.crates.toml`?
In some ways, if adding a prebuilt extension doesn't work for some extension, it doesn't really have wider implications than "that extension is not currently supported". But tinkering around with the cargo dir manually could break stuff. So it would be nice to have a path that is known not to break other extensions. Even if it's just a short manual with a bunch of disclaimers.
When I get time I'll make a clean test environment with something like docker and see what works in this sample case.
I don't see how executables are relevant to this discussion. This issue is about pre-built libraries.
There is a Pre-RFC for a subset of this: https://internals.rust-lang.org/t/pre-rfc-sandboxed-deterministic-reproducible-efficient-wasm-compilation-of-proc-macros/19359. That is blocked on some work from #5720.
There is also #5931. That could be extended with a plugin system to allow something like sccache for distributed caching.
i'm looking into this considering the possibility of packaging rust crates in a gentoo overlay as precompiled rlibs. the strict compiler version is not that big of an issue on gentoo because subslotting on rustc can ensure that every lib gets recompiled when the system-provided rustc updates (even tho it's kind of a subpar solution, works)
for now i'll try installing rust crates system-wide in source format (as one would with python packages for example), but cargo supporting pre-built libs would reduce build times for rust packages, and is overall a better solution, imo.
i'm looking into this considering the possibility of packaging rust crates in a gentoo overlay as precompiled rlibs. the strict compiler version is not that big of an issue on gentoo because subslotting on rustc can ensure that every lib gets recompiled when the system-provided rustc updates (even tho it's kind of a subpar solution, works)
That has the issue that crates can have different cargo features enabled depending on who uses it. Each set of enabled cargo features results in a different rlib. Just enabling all cargo features is not really an option as it would likely pull in a lot of deps that are likely never used and some crates have mutually exclusive cargo features.
What you could try though is something like picking a couple of fixed sets of cargo features for each crate, building it with sccache as RUSTC_WRAPPER and then shipping the sccache cache entries and telling sccache to use those shipped cache entries. This way sccache will simply rebuild if the cargo features don't match and you don't need any cargo modifications.
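The point that every feature set yields a different rlib can be pictured with a small sketch: if pre-built artifacts are keyed by crate, version and the sorted set of enabled features, a request with a different feature set simply misses and falls back to a rebuild. The key format and the name `artifact_id` are hypothetical and are not how sccache or Cargo actually hash their inputs.

```rust
use std::collections::BTreeSet;

/// Hypothetical identity for a pre-built rlib: crate, version and the
/// sorted, de-duplicated set of enabled cargo features.
fn artifact_id(krate: &str, version: &str, features: &[&str]) -> String {
    let sorted: BTreeSet<&str> = features.iter().copied().collect();
    let joined = sorted.into_iter().collect::<Vec<_>>().join(",");
    format!("{}-{}[{}]", krate, version, joined)
}

fn main() {
    // Same features in a different order map to the same artifact.
    println!("{}", artifact_id("foo", "1.0.0", &["std", "nya"]));
    println!("{}", artifact_id("foo", "1.0.0", &["nya", "std"]));
    // A different feature set is a different artifact.
    println!("{}", artifact_id("foo", "1.0.0", &["nya"]));
}
```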
That has the issue that crates can have different cargo features enabled depending on who uses it. Each set of enabled cargo features results in a different rlib. Just enabling all cargo features is not really an option as it would likely pull in a lot of deps that are likely never used and some crates have mutually exclusive cargo features.
gentoo has useflags as a core functionality, and useflags are basically features (build time options that the user can set and ebuilds can require). so i can represent features as useflags and an ebuild script can actually pick what should be present: basically, `DEPEND="dev-rust/foo[nya]"` would say "i need the crate foo with the feature nya compiled in", then portage would recompile foo if it was previously compiled without the feature nya enabled.
How does that work with different programs that need mutually exclusive features of a crate? Or what about a program that needs a crate with the `std` feature enabled and another one which doesn't want to use libstd? While cargo features are supposed to be additive, many crates don't follow this advice. In some cases they even disable functionality using a cargo feature, following the rationale that it is a feature to allow compiling for more targets, even though it is a negative feature as it causes compile errors for crates that need those things.
How does that work with different programs that need mutually exclusive features of a crate?
as far as i know, it conflicts, but
While cargo features are supposed to be additive, many crates don't follow this advice.
if the common behaviour is to have features work in additive ways, the packager could make an exception for a specific binary that doesn't follow that. would be the simplest solution
if there's enough binary programs for this to be quite common, i'd have to poke and see. probably drop the idea of pre-compiling rlibs, sticking to installing only the source, since i can't see a better solution yet (still an improvement over the current way gentoo packages rust apps imo, but sad). precompiling libs for each package is the same as just installing the source, but with extra steps, so that's eh.
it's either get packages to share precompiled rlibs, or install the sources and let packages build them
it sounds analogous to re-inventing the wheel with respect to flatpaks, which are built on top of versioned dependencies, such that multiple versions are installed, with several duplicate clones of the same libraries, and ostree to point to some git-esque btree structure or whatever it is.
i am not saying re-invent the wheel, but maybe there are already similar solutions out there that could be considered, evaluated, adapted to then fit the rust / cargo infrastructure?
Currently you can add dependencies using `path` or `git`. Cargo assumes this is a location to source code, which it will then proceed to build.
My use-case stems from integrating Cargo into a private build and dependency management system. I need to be able to tell Cargo to only worry about building the current package. That is, I will tell it where the other already-built libraries are.
Consider two projects: `a` (lib) and `b` (bin) such that `b` depends on `a`:
A clean build will output something like:
Importantly:
Would it make sense to expose an `extern` option (in `dependencies.a`) for low level customization?
This can be worked around by using a build script along the lines of:
But it is not ideal to have to do this with every project.