rust-lang / rust-roadmap-2017

Tracking Rust's roadmap

Rust should integrate easily into large build systems #12

Open aturon opened 7 years ago

aturon commented 7 years ago

Overview

When working with larger organizations interested in using Rust, one of the first hurdles we tend to run into is fitting into an existing build system. We've been exploring a number of different approaches, each of which ends up using Cargo (and sometimes rustc) in different ways, with different stories about how to incorporate crates from the broader crates.io ecosystem. Part of the issue seems to be a perceived overlap between functionality in Cargo (and its notion of compilation unit) and in ambient build systems, but we have yet to truly get to the bottom of the issues—and it may be that the problem is one of communication, rather than of some technical gap.

By the end of 2017, this kind of integration should be easy: as a community, we should have a strong understanding of best practices, and potentially build tooling in support of those practices. And of course, we want to approach this goal with Rust's values in mind, ensuring that first-class access to the crates.io ecosystem is a cornerstone of our eventual story.

Projects

At this point, we are still trying to assess the problems people face in this area. If you have experience here, please leave a comment with your thoughts!

luser commented 7 years ago

I started a repo not long ago to collect anecdotes about people integrating Rust into existing projects. I only have a few examples in there (and I keep seeing more crop up all the time). It'd be great to at least do a survey of all the examples we can find of people attacking this problem to see what the major issues were.

From the Firefox perspective, we have a pretty custom build frontend that generates Makefiles in the backend, so we would have had to write the custom integration bits regardless. We did find that life got a lot better once we started invoking cargo instead of rustc directly. I'm sure there are projects that would get value out of having individual Rust source files in their codebase, but it feels like a lot of the value in the Rust ecosystem comes from using cargo and leveraging crates.io (I doubt this is contentious).

goertzenator commented 7 years ago

I've worked on build systems to compile Rust components for Erlang applications, so I'm in a good position to talk about a few issues. An overview of the building that takes place can be found in this README. Erlang can make use of bins, dylibs, and cdylibs.

  1. dylibs on OSX have special link requirements ("-- --codegen link-args='-flat_namespace -undefined suppress'") which creates a cascade of fussy work. Firstly, to provide that flag I need to "cargo rustc" instead of "cargo build", and to do that I need to detect all the binary/lib targets and build them one at a time. I really wish I could just "cargo build" and have cargo sort out the details for me. Maybe a "--extension-lib" flag for "cargo build" to apply this behavior? I understand this linking scenario is not unique to Erlang.

  2. Discovering and locating the output files is tricky. I have to "cargo read-manifest" to find the targets, form a platform-specific name from those results, then check the build flags for any "--target" flag to form a path for these files (a sketch of that glue follows below). I would love a flag of the form "--print-artifacts=[bin|lib|dylib|cdylib]" for "cargo build" to print the full output path and name to stdout.
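For what it's worth, the glue for point 2 today ends up looking something like the sketch below: a small helper that reads the JSON emitted by "cargo read-manifest" on stdin and guesses the platform-specific dylib file names. This is purely illustrative and not a proposed Cargo feature; the field names and naming rules are from memory and may not be exact for every platform or Cargo version.

// locate-dylibs.rs (hypothetical helper): cargo read-manifest | locate-dylibs
extern crate serde_json;

use std::io;

fn main() {
    let manifest: serde_json::Value =
        serde_json::from_reader(io::stdin()).expect("invalid JSON from cargo read-manifest");

    let targets = match manifest["targets"].as_array() {
        Some(t) => t.clone(),
        None => return,
    };

    for target in &targets {
        // Only dynamic library targets need the platform-specific naming dance.
        let is_dylib = target["kind"]
            .as_array()
            .map(|kinds| {
                kinds
                    .iter()
                    .filter_map(|k| k.as_str())
                    .any(|k| k == "dylib" || k == "cdylib")
            })
            .unwrap_or(false);
        if !is_dylib {
            continue;
        }
        // Library file names use underscores even when the package name has hyphens.
        let name = target["name"].as_str().unwrap_or("").replace('-', "_");
        let file = if cfg!(target_os = "macos") {
            format!("lib{}.dylib", name)
        } else if cfg!(target_os = "windows") {
            format!("{}.dll", name)
        } else {
            format!("lib{}.so", name)
        };
        // With "--target <triple>" the artifacts land under target/<triple>/debug instead.
        println!("target/debug/{}", file);
    }
}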

bobsomers commented 7 years ago

This is great. Integrating Rust/Cargo with Bazel would be the first major hurdle to us using more Rust in our codebase at work, which is a pretty large mass of C++ code with some Python sprinkled throughout.

Since both Bazel and Cargo are fairly opinionated about how builds and package management should work, and since I am an expert in neither, it's not immediately clear to me which build system should be doing what or if we should just try to integrate rustc into Bazel without Cargo at all. Using strictly Cargo is (unfortunately) probably out of the question since most of the C++ and Python packages are dependency tracked with Bazel BUILD files, and any serious integration with the C++ code would require our Rust libraries and binaries to be dependency tracked by Bazel as well.

steveklabnik commented 7 years ago

I thought Bazel already had basic support for rustc? Maybe my info is outdated though.

mfarrugi commented 7 years ago

Bazel has support for local sources, which is to say it does not support crates.io. There's an issue open for it, but the project is relatively inactive https://github.com/bazelbuild/rules_rust/issues/2. One would need to keep a local repository clone of a crate and all its dependencies to make bazel happy without modifications.

Bazel's rust targets also can't be depended on by c/c++ at the moment.

The kythe project has wrappers for bazel to call cargo, but it's not the most robust approach to integration. For reference, kythe:tools/build_rules/rust.

disclaimer: I don't know too much about bazel; I happened to learn the above while looking into it last week.

withoutboats commented 7 years ago

Bazel's been mentioned several times when this has come up. I believe Dropbox needs to integrate its Rust into a Bazel build process, and I think I might have been told Facebook uses it as well (though it might also have been that Facebook has an internal tool that is similar to Bazel). It seems like a promising tool to look into for this issue.

davidzchen commented 7 years ago

Bazel Rust Rules author here. Apologies for the inactivity on the rules_rust project; I have been busy with other projects on my plate. I am planning to implement the workspace rules for pulling crates from Cargo (bazelbuild/rules_rust#2) by the end of Q1 with a stretch goal of also implementing tooling for automatically generating Bazel BUILD files from Cargo.toml (bazelbuild/rules_rust#3), though the latter will likely extend into Q2.

The Kythe project has rules that shell out to Cargo directly as a quick stop-gap measure, but those rules are meant to be used internally in Kythe for now (since, for instance, they are not hermetic), and the plan is to replace them with the rules in rules_rust once features such as pulling from Cargo are supported.

Of course, if anyone is interested in helping with improving the Bazel rules for Rust, contributions are certainly welcome. :)

+cc @damienmg

shahms commented 7 years ago

Kythe contributor here. We'd definitely like to see better integration with Cargo from the upstream Bazel Rust rules. Our extant integration was very much a hack to allow our intern to make progress on the Rust indexer itself, rather than getting bogged down with Bazel integration.

LegNeato commented 7 years ago

Facebook uses Buck, and there is some early Rust support:

https://buckbuild.com/rule/rust_binary.html

Facebook vendors their dependencies in-tree.

jsgf commented 7 years ago

Facebook uses Buck, and there is some early rust support:

I've spent quite a bit of time on that over the last few months, and it's getting pretty solid now. It's well integrated with the overall build/test system and (most recently) can also interop with cxx rules.

jpakkane commented 7 years ago

I'm the main author of the Meson build system, which is being used by GStreamer and a bunch of other projects. We also have Rust support, which is a bit rudimentary but can be used to build stuff like a Python extension module that uses C, C++, Rust and Fortran in a single target. We aim to improve Rust support. This is especially important for mixed-language projects, since Cargo is nice for plain Rust projects, but I'm fairly certain that the Cargo developers do not want to add first-class multiplatform C/C++ build support to Cargo.

larsbergstrom commented 7 years ago

A key part of doing this well will involve handling updates to the dependency graph in a deliberate and piecemeal way, particularly in scenarios where upstream "master" has moved to a new version (e.g., as in https://github.com/rust-lang/cargo/issues/2649).

We (@nox, @simonsapin, and I) have had many conversations with @alexcrichton and @wycats on this front, and I believe the leading contender from their point of view is some extension to fix up paths and avoid abusing [replace], since [replace] immediately runs into version-related roadblocks for any non-trivial project.

moretea commented 7 years ago

Nix (which has a similarly pure view of builds to Bazel's) uses a trick that involves cloning a well-known version of the crates index git repository, see https://github.com/NixOS/nixpkgs/tree/master/pkgs/build-support/rust

sjmackenzie commented 7 years ago

Sadly the nixpkgs crates index is not hermetically sealed, and it gave us many problems. So we implemented a crate index nixifier, which reads crates.io-index and spits out a nixified crates index. This allows one to use Nix to completely manage the transitive crate dependencies of a project without needing cargo. The repo: https://github.com/fractalide/nixcrates

acmcarther commented 7 years ago

I've taken a look at teaching bazel how to digest Cargo's toml files to pull down third party crates.io dependencies automatically, and there don't seem to be very many sticking points.

A couple of rough patches I've seen though:

tupshin commented 7 years ago

Just commenting here because I believe there is a lot of opportunity to appeal to the JVM ecosystem, incrementally replacing Java/Scala/etc. code with Rust, if we are able to make it trivial to incorporate Rust into ant/maven/etc. builds. It would also be valuable to go the other way: add a jar to a Cargo.toml or build.rs and have Rust bindings generated automatically, maybe using a combination of https://github.com/kud1ing/rucaja/ and https://github.com/kenpratt/jvm-assembler as starting points. This could all happen outside core, obviously, but Rust adoption by the very large, as-of-yet untapped-by-Rust ecosystem of JVM code and developers would be quite beneficial.

Ericson2314 commented 7 years ago

I've proposed https://github.com/haskell/cabal/issues/3882 for Cabal. The same thing can work for Cargo, and would solve this problem for everyone.

luser commented 7 years ago

@raphlinus and I were discussing some related issues on IRC not long ago. One of the ideas that he floated was a way to make cargo simply output the commands that it would run to do the build, so that we could leave parsing Cargo.toml to cargo, but allow other build systems like bazel to run the build like they expect.

alexcrichton commented 7 years ago

Thanks for the comments everyone! I and a few others have thought a lot about this in the past as well, and I wanted to jot down some notes and conclusions that we've reached historically.

First and foremost we've historically concluded that build system integration is not finished until you've got access to crates.io crates. The standard library is purposefully conservative and small in size with the explicit intent of having rich functionality in the ecosystem on crates.io. If an integration doesn't allow easy access to crates.io, then there's more work yet to be done!

Today, of course, Cargo is the primary gateway into the crates.io ecosystem. Cargo is also the primary build tool for Rust, but there are perennial questions about how to integrate Cargo into existing build tools. Many issues have been solved over time in this vein, such as vendoring dependencies, workspaces, etc. Cargo also has the benefit of being friendly and familiar to existing Rust programmers, with a shared workflow across the ecosystem.

Something else that we've concluded, however, is that preserving Cargo workflows should not necessarily be a hard constraint for build system integration. Existing projects already have a workflow associated with them, and Rust code should integrate into that workflow as appropriate instead of imposing restrictions on how it works. Preserving a Cargo-based workflow for the Rust-specific portions is of course a nice-to-have!

And finally, one last thing we've talked about is compilation units. C/C++, for example, have files as compilation units, and that's what build systems for C/C++ are typically architected around. Rust, however, doesn't have this granularity of compilation unit. Fundamentally the compiler supports crates as the compilation unit. Moving up the stack to Cargo, it ends up generally being the case that the entire crate graph is Cargo's compilation unit (one command outputs the entire crate graph). The question is then: how does this integrate into an existing build system? Is the crate graph sufficient? Does the granularity need to be finer, such as individual crates? (I'm not sure!)

One part to consider about compilation units is that they typically heavily affect caching in build systems. For example, distributed caches may cache entire compilation units but nothing more granular. This means that DAG-as-a-unit would probably be too coarse for caching. On the other hand, it's not clear how crate-as-a-unit would integrate between build systems and Cargo today.

So with all that I think we're faced with a few problems that may be thorny to solve:

Unfortunately I don't have a whole lot of solutions just yet; I'm personally still trying to grapple with the problem space. @luser does the above sound accurate for Gecko's Rust integration, at least at a high level? @jsgf could you detail some of the work you've done at a high level for Buck's Rust support?

I've found that each build system tends to have its own unique set of constraints for integration, but the more we know the easier we can accommodate everyone!

alexcrichton commented 7 years ago

Oh, one point I should also mention is that I personally think it's at least relatively important to try to lean on Cargo as much as possible for build system integration. Cargo is the bread and butter of building Rust, and avoiding Cargo leaves build systems with a massive number of features to reimplement. I'd much rather pursue avenues to add features and/or make Cargo more flexible to interoperate with existing build systems. For example I could imagine Cargo generating build files or working in a much more granular fashion, assuming another process manages inputs/outputs.

jsgf commented 7 years ago

The things that make Cargo awkward to use in our environment are:

What I think you're saying is that going directly to rustc is too low-level for your taste; you'd prefer to have a higher-level tool that's actually coordinating builds. But on the other hand, cargo is too high-level for our purposes. As a standalone tool I think it's excellent, but it tries to impose too many opinions to interact well with other build environments.

Perhaps there's some scope for a mid-level tool that provides cargo's mechanisms, but not the UI, and a higher-level tool that presents a nicer UI/user experience when it's used as the primary build mechanism. Or perhaps rustc itself is that interface, and it just needs to be designed accordingly?

Right now I'm handling all this by using cargo to download crates.io crates and manage all their dependencies, then prebuilding them and keeping all the build artifacts. All our internal builds are built with buck using its dependency management, and ultimately linked with the prebuilt crates.io code. That way cargo is a one-time operation rather than something that's involved with every build, while still taking advantage of it to manage all the code that's intended to be built with it.

Ericson2314 commented 7 years ago

@alexcrichton DAG-as-a-unit will always be too coarse-grained. That is basically what we have now, and it is not good enough. For crate-as-a-unit, though, I'd think build.rs would not be a problem because any dynamism from build.rs only affects the current crate, right? The dependency graph with crate granularity is still static.

So yeah, what needs to be done at a minimum is adding two new modes of operation for Cargo:

  1. Make a complete plan (a DAG with crate nodes); this is way, way more than a lockfile. Do impure things like downloading the crates.io index and crates.io package sources at this stage too.

  2. Build one crate/node from the pre-made plan. Assume every node gets its own $out directory, and paths to all (transitive) dependencies' $out directories will be passed to Cargo in this mode.


A cool follow-up to this would be writing a dependency-management library that allows either serializing plans so that external tools can drive the build, or executing the plan in the current process. This would avoid any code duplication in Cargo and, I'd guess, be useful for rustbuild too.

See https://github.com/haskell/cabal/issues/4174 for analogous effort with Cabal and Shake (though, unfortunately, Shake does not allow exporting static dependency graphs like this).

davidzchen commented 7 years ago

I also echo @jsgf's point about build.rs not being ideal. build.rs is used by Rust projects to compile code in other languages, such as C++ libraries using gcc-rs, but this should really be handled by the build system itself.
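For reference, the pattern in question is a build.rs along these lines: a minimal sketch, assuming the gcc crate's API of the time (since renamed to cc); the file names are invented.

// build.rs
extern crate gcc;

fn main() {
    // Compile a bundled C file and hand the resulting static library to rustc.
    // This is exactly the work a build system like Bazel would rather own itself.
    gcc::Config::new()
        .file("src/foo.c")
        .include("src")
        .compile("libfoo.a");
}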

For example, the Bazel rust rules support interop with C/C++ code, meaning that you can have the following:

cc_library(
    name = "foo",
    srcs = ["foo.cc"],
    hdrs = ["foo.h"],
)

rust_library(
    name = "bar",
    srcs = ["src/lib.rs"],
    deps = [":foo"],
)

Also, of note, while one mode of using Bazel is to vendor all dependencies (which is the practice followed by Google internally), Bazel also supports fetching external dependencies (which is all done prior to build time) and provides a simple API for writing repository rules. As mentioned above, a rule that fetches crates from crates.io is in the pipeline for users who prefer not to vendor all dependencies.

alexcrichton commented 7 years ago

Thanks for typing that up @jsgf!

It downloads things at build time

To clarify, I'm under the impression that this is a solved problem today. With multiple vendoring options available, that was at least the intention! Did you find, though, that the vendoring support wasn't suitable for Buck's use case?

It's awkward to have dependencies on C++ code.

To clarify, this is from a build system perspective, not a language perspective, right? If possible I'd like to focus this thread at least on just the build system aspect and we can perhaps continue the language discussion over at https://github.com/rust-lang/rust-roadmap/issues/14 :)

It definitely makes sense to me that it's difficult to depend on C++ code in a build-system sense. Some of this I think is the granularity of builds today (DAG vs crate), but in general I think it's just flat-out unergonomic and difficult to plug preexisting artifacts into a Cargo build.

We have an extensive distributed caching system for build artifacts which tries to use the cache to avoid as much build work as possible.

Definitely makes sense! I don't think it's out of the question for Cargo to support custom caching, though. In fact, with sccache we may get exactly this!

In general I'd like to keep an open mind to Cargo's current implementation today, and we can basically extend it in any way we see fit. For example Cargo's already got enough information to create a unique hash key for a crate and we could restructure it with custom caching to pull in artifacts on demand (or assume they're at a predetermined location) or something like that.

Not saying this is a silver bullet of course, but our options are still open!

Using workspaces allows those deps to be shared, but only with a very specific arrangement of directories. It doesn't scale to a large single source base containing thousands of distinct projects, some of which may be Rust, with an organizational scheme that's something other than their shared dependencies.

I'm not sure I quite understand this constraint, so I wonder if we could dig in a bit? I definitely agree that a workspace may not scale to thousands of crates and projects, but the idea of a Cargo.toml certainly should, right?

I guess I'm not fully understanding what's not scaling here. Are you thinking this is a fundamental compiler limitation? Or just something that needs working around in Cargo today? As with above, I'd like to keep in mind the possibility of changes to Cargo to make it more amenable to situations like this rather than assuming the functionality of today is impossible to change!

build.rs doesn't really work - building an executable then running it in the build infra is pretty awkward, and not well regarded

Yeah I can definitely understand how this may be nonstandard. I don't think this is something that can be sidestepped for too long, though, as a concept. Custom derive (macros 1.1) was stabilized in Rust 1.15, and that requires compiling a plugin at build time that then runs inside the compiler. I would describe such a practice as very similar to build.rs (in principle at least), and I'd expect that ergonomically using Rust will basically require using Serde in the near future (especially for communicating services).

In that sense, is it literally the build.rs with its inputs/outputs itself? Or is it the concept of running code at build time that may cause problems? I'd definitely argue that macros 1.1 is more principled than build.rs (defined set of inputs/outputs), but naively, to me at least, they don't seem fundamentally different.
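For comparison, a macros 1.1 plugin is an ordinary crate (marked as a proc-macro library in its Cargo.toml) whose entire declared interface is token stream in, token stream out. A minimal sketch, with an invented derive name:

extern crate proc_macro;

use proc_macro::TokenStream;

// Compiled for the host and loaded by rustc; its declared inputs and outputs
// are just token streams (the "defined set of inputs/outputs" mentioned above).
#[proc_macro_derive(MyTrait)]
pub fn derive_my_trait(input: TokenStream) -> TokenStream {
    // A real implementation would parse `input` and generate an impl block;
    // an empty stream is returned here to keep the sketch minimal.
    let _ = input;
    "".parse().unwrap()
}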

Perhaps there's some scope for a mid-level tool that provides cargo's mechanisms, but not the UI, and a higher-level tool that presents a nicer UI/user experience when it's used as the primary build mechanism.

I definitely agree! I do think that Cargo's too high level for Buck's use case today, and I personally feel that rustc will almost always be too "low level" to get real benefit. As we continue to add features to Rust, the compiler, and Cargo, the idiomatic and most ergonomic way to consume these features will be through Cargo. For example macros 1.1 might be an absolute nightmare if you had to manage all the builds yourself, especially when cross compiling.

I personally think that rustc sits at the right level of abstraction for Cargo to be calling, so we wouldn't want to soup it up too much. Taking Cargo down to be a bit lower level though I think is where there's a lot of benefit to be had. With that in place we'd then design new language features with such a tool in mind to ensure the experience is smooth for everyone, Cargo users and "this lower level Cargo" users alike.


@davidzchen thanks for the input about Bazel! I'm curious on your thoughts about my comments above related to build scripts as well. Do you agree that compiler plugins (e.g. macros 1.1) are along the same vein as build scripts, or is one much easier to support in Bazel than the other? Or are they both very difficult to support?

davidzchen commented 7 years ago

@alexcrichton Regarding compiler plugins and build scripts, the way I see it, the key difference between the two is that people write compiler plugins when they have a need to reuse functionality in the Rust compiler whereas people write build.rs to do anything they cannot do via the Cargo build system itself. As a result, compiler plugins are, in practice, used for much more niche use cases than build.rs.

One analogy that I can draw is that build.rs is similar to Bazel's genrule, which allows you to run any arbitrary shell command.

In any case, neither of these are difficult to support in Bazel:

A main concern that I have with relying too heavily on build.rs is that what it runs is arbitrary and is (often) potentially non-hermetic. As a result, for projects that have a mix of different languages with interop between Rust and other languages, it would be better to rely more on the build system for this and have more fine-grained build targets and limit the use of build.rs as much as possible.

jsgf commented 7 years ago

@alexcrichton:

To clarify, I'm under the impression that this is a solved problem today

Mechanically it's solved because --frozen will prevent any attempts to download, and in practice I handle the whole problem by prebuilding all the parts of crates.io that our code needs. Is there an option to reverse this, so that all downloads are prohibited by default, and only allowed if there's an explicit option or command? Might be useful if not.

To clarify, this is from a build system perspective, not a language perspective, right?

Yes. We have C++ libraries that have their own complex dependency graphs managed by Buck that I'd like to add Rust FFI bindings for, and make sure that everything gets rebuilt properly. I'd also like to be able to expose Rust libraries to C/C++ code (mostly for things like Python extensions), and again, make sure the build system knows all the deps. Trying to manage dependencies across build systems seems like it could be awkward.

I don't think it's out of the question for Cargo to support custom caching, though

Do you mean Cargo might be able to make use of Buck's cache? That poses lots of problems, not least because it's unclear how Cargo would be able to compute the correct key. Buck's cache is indexed both by the immediate dependency (the source file contents) and by the keys of all its dependencies, with the goal of being able to skip as much of the dependency graph as possible. Cargo wouldn't have access to the information needed to either look up or insert blobs into the cache.

Effectively Buck treats the compiler as a pure function of inputs -> output, and memoizes the result. If the build tool is more complex than that, then Buck can't cache its state well, and it complicates the interface to the build tool if it's doing its own caching/memoization.

I haven't looked at sccache in detail, but this is quite different from how something like ccache works; ccache caches the results of individual compiles, but doesn't take the dependency graph into account.

I'm not sure I quite understand this constraint, so I wonder if we could dig in a bit? I definitely agree that a workspace may not scale to thousands of crates and projects, but the idea of a Cargo.toml certainly should, right?

Yeah, I was being pretty unclear.

Without workspaces, every binary cargo package has a dependency graph on other packages with library crates, and building that executable builds them all as needed. If multiple binaries share some or all of the same crates, then they all get rebuilt regardless.

You can use workspaces to effectively share the dependencies between multiple binary crates so that the library crates only get built once. But to achieve this, all the crates - binary and library - must be in a single workspace.

There's a few issues with this:

I know there's been some discussion about loosening the constraints on workspaces to either allow nesting or have other relationships (esp. with path dependencies), but it's not clear to me they're the right way of modelling a complex dependency DAG in a way that minimizes rebuilding.

(Also I haven't really looked at workspaces in a while, so perhaps I'm completely out of date here, or just wrong.)

Yeah I can definitely understand how this may be nonstandard. I don't think this is something that can be sidestepped for too long, though, as a concept. Custom derive (macros 1.1) was stabilized in Rust 1.15, and that requires compiling a plugin at build time that then runs inside the compiler

I haven't looked at macros 1.1 yet, but are you saying that every crate that uses - say - serde will need to also build a compiler component, or is it just when serde itself is built? If it's the latter then I can handle that when I pre-build the crates.io crates, and it's basically no more difficult a constraint than object files being compiler-version dependent.

If they need to be rebuilt for every user, then yeah, that's trickier.

The more general problem with build.rs is building an executable then running it as part of the build process. It's awkward to manage because it has unconstrained inputs and outputs (it can read and write arbitrary files) which means that it's opaque to the build system/dependency management.

Of the use cases listed in the docs, "Building a bundled C library", "Finding a C library on the host system" and "Performing any platform-specific configuration needed for the crate" are all pretty horrifying from a build integrity/reproducibility perspective - they are strong antipatterns. The only one that makes any sense is "Generating a Rust module from a specification" (ie, generated source), but that could be done with a much more constrained interface (and perhaps macros 1.1 is that interface).
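As a concrete illustration of that last case, the generated-source pattern can already be kept to well-defined inputs and outputs: read a spec file committed in the crate, write Rust under OUT_DIR, and nothing else. A minimal sketch (the spec format and file names are invented; the OUT_DIR/include! pattern is the standard one):

// build.rs
use std::env;
use std::fs;
use std::path::Path;

fn main() {
    // Input: a spec file committed in the crate. Output: one file under OUT_DIR.
    let spec = fs::read_to_string("codegen.spec").unwrap();
    let out_dir = env::var("OUT_DIR").unwrap();
    let dest = Path::new(&out_dir).join("generated.rs");

    // Trivial "code generation": one constant per non-empty line of the spec.
    let mut code = String::new();
    for (i, line) in spec.lines().filter(|l| !l.is_empty()).enumerate() {
        code.push_str(&format!("pub const FIELD_{}: &'static str = {:?};\n", i, line));
    }
    fs::write(&dest, code).unwrap();

    // The crate then uses: include!(concat!(env!("OUT_DIR"), "/generated.rs"));
}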

There's also the general security problem of just running random binaries on a build host that can do arbitrary things. It can be managed, but the less it happens the better.

I personally feel that rustc will almost always be too "low level" to get real benefit

rustc is about the right level for Buck, since it's similar to gcc/javac/etc; certainly integrating at the rustc level (while not trivial) was more conceptually straightforward than trying to work out a conceptual mapping between buck and cargo.

What might be useful is:

Buck and Bazel are extremely similar in a lot of ways, so I expect that a solution that works for one will likely help with the other. As @davidzchen mentioned, the concept of a compiler plugin is very important for Java, so Buck can also deal with it (since Java/Android is one of its primary use cases); extending the concept to Rust is reasonably straightforward.

petrochenkov commented 7 years ago

This thread is a delight to my eyes: all the stuff about Cargo's scalability and integration I wanted to talk about, but didn't have enough factual evidence and practical experience for. @jsgf @davidzchen thanks a lot for the details! Even if rustc+stdlib themselves are not an especially large project, these issues show up at the rustbuild level already, which favors "leverage Cargo and Rust as much as possible" over build system best practices. I hope this discussion will benefit it as well in the end.

luser commented 7 years ago

If we go with DAG-as-a-unit, is this sufficient? Can Cargo hook into existing caching infrastructure adequately? I believe this is how projects like Gecko work today where the whole Rust DAG is a unit and cargo is used to build it. This may have problems, however, if there are multiple Rust projects to link together (e.g. stylo and spidermonkey in Gecko both independently having Rust code)

In Gecko we are effectively limiting things to a single crate per output binary in the build system. We haven't crossed the "Spidermonkey requires Rust" bridge yet, but when we do we will probably just have it behind the existing "building the JS engine standalone" flag, and otherwise have that code pulled in via the crate that gets linked into libxul.

We've discussed this in other forums when we first ran into this issue, I know. The core problem was that when outputting something other than an rlib rustc includes support code such as jemalloc, and you can only link one copy of that into a binary.

luser commented 7 years ago

Mechanically it's solved because --frozen will prevent any attempts to download, and in practice I handle the whole problem by prebuilding all the parts of crates.io that our code needs. Is there an option to reverse this, so that all downloads are prohibited by default, and only allowed if there's an explicit option or command? Might be useful if not.

We've discussed this before, but no, there's not. I have a cargo issue open for making our Gecko use case nicer. For Gecko we currently vendor all our crates.io dependencies with cargo vendor, and use a cargo config file to enable source replacement.

I haven't looked at sccache in detail, but this is quite different from how something like ccache works; ccache caches the results of individual compiles, but doesn't take the dependency graph into account.

This isn't implemented yet (I'm working on it this quarter), but we're planning on making sccache able to cache rust compilations at the crate level. There's a good writeup of the plan here. This is different from ccache, which operates at the object file level, but Rust compilation is fundamentally different from C compilation.

jsgf commented 7 years ago

This is different from ccache, which operates at the object file level, but Rust compilation is fundamentally different from C compilation

Yes and no - you can roughly model a Rust compilation as a single crate == a single object file, where lib.rs effectively #includes the rest of the sources (this breaks down when talking about dependencies on other crates, and probably incremental compilation).

eddyb commented 7 years ago

@jsgf That is, however, the intention: crates are compilation units, from the user's point of view.

Internally we already have "codegen units" which are multiple translation units per (crate) compilation, and it's plausible we might have a "fusion" mode where the translation units are all triggered by one compilation unit (the "final executable" in an app, for example), instead of by each dependency.

So the object file analogy is imperfect if you take .o files literally, because how those are split can be up to the compiler (based on heuristics that you can't arrive at in the build system). But crates are still the compilation units, and the #include analogy is fine if you use the --emit=dep-info make-style dependency set to know what the sources are, and you pass --extern for dependency crates so those are precise too.

@luser You might want to talk to @nikomatsakis and/or @michaelwoerister (if you haven't already) about incremental recompilation, there's potentially a scenario where sccache can track the internal rustc incremental state, which would let you piggyback on its object file reuse, but I'm not sure it's worth doing.

Ericson2314 commented 7 years ago

@eddyb If incremental recompilation can serialize the exact codegen-unit-level dependencies it needs for a codegen unit, and then be resumed against just that, incremental recompilation will work beautifully with (Nix's) import-from-derivation.

This is icing on the cake of everything I mentioned before.

codyps commented 7 years ago

The approach for integration taken in cargo-bitbake and used in meta-rust to handle including cargo packages within the bitbake/OpenEmbedded/Yocto build-system might be relevant to folks in this thread. I believe @cardoe also has written an integration for cargo packages in gentoo's portage build system.

Not quite the same as other build systems mentioned as these are both intended for generating pieces of linux distributions, but some of the goals (avoiding network access at build time & ensuring a stable build) are similar.

cardoe commented 7 years ago

Yes I have. In Gentoo we use cargo-build which behaves the same way that cargo-bitbake does.

We need the ability to fetch all the data necessary to perform the build before cargo executes since the package managers for both Yocto and Gentoo are responsible for verifying the integrity of the downloads and handling the downloads. The build process runs with dropped privileges and no network access.

sholsapp commented 7 years ago

I'm not sure if there is any interest in this or not, but I've been working on Gradle plugins that use Cargo under the hood to build Rust code. The Rust plugins are largely inspired by Python plugins that I and others wrote and open-sourced last year (see the engineering post and repository if you're interested in the design). TL;DR: Gradle rocks in the enterprise and makes plugging in new languages easy. We can do a good job using Gradle to just "orchestrate" the build, leaving all of the details to Cargo and Rust, thereby preserving idiomatic Rust development while integrating with large [Gradle] build systems.

If there is interest in seeing these Rust plugins, I can work on cleaning them up and open sourcing them. To date, they're mostly used for a few pet projects of mine.

stuhood commented 7 years ago

We need the ability to fetch all the data necessary to perform the build before cargo executes since the package managers for both Yocto and Gentoo are responsible for verifying the integrity of the downloads and handling the downloads. The build process runs with dropped privileges and no network access.

^ This is a particularly clear distillation of the requirements for integration into build systems with existing caches and reproducibility guarantees (including the one used at Twitter).

alexcrichton commented 7 years ago

@davidzchen

ok cool, thanks for the info! I agree that totally arbitrary build.rs probably can't be supported, but I could imagine that for "actual build system integration" (reasonable) restrictions are imposed on the build scripts that get pulled in, perhaps described by metadata in Cargo.toml or something like that.

Good to know though that nothing is fundamentally unsupported!


@jsgf

Is there an option to reverse this, so that all downloads are prohibited by default, and only allowed if there's an explicit option or command? Might be useful if not.

Unfortunately not right now, but we could add a .cargo/config option to do so perhaps.

Effectively Buck treats the compiler as a pure function of inputs -> output, and memoizes the result.

Makes sense to me! Remember though that rustc/cargo are the same thing :). All the tools here are effectively pure functions, so what I'd like to figure out is the best way for rustc/cargo to fit in this model because it should mesh well!

That's a good point, though, about hashes taking dependencies into account. Cargo could do all it needs to do for Rust code, but it wouldn't be able to understand hashes on external C/C++ libraries used by Rust.

Also yeah, the intent of sccache is that it will look at all crates on the command line and use those as input for hashing.

If multiple binaries share some or all of the same crates, then they all get rebuilt regardless.

Indeed! I think this is a very surmountable problem, though. The first solution would be a workspace, but I agree that thousands of members probably isn't great for a workspace. Other solutions could include a shared output directory or simply a better caching solution. For example the intention with sccache is that it's a drop in for rustc and will automatically pull in caches for everything.

In that sense I don't think there's a fundamental blocker here, but it sounds like integrating into existing caching solutions is a high priority. That way as long as our dependency resolution/granularity is accurate then we'll get cached builds for free (no matter where you are in a tree).

If they need to be rebuilt for every user, then yeah, that's trickier.

Oh no, for serde at least what will happen is that a compiler plugin will be compiled once, and then that same plugin is used for all crates. You wouldn't have to recompile the plugin for all downstream crates.

I also agree that completely unconstrained build scripts pose problems, but in practice this is never the case. I could imagine that a vendoring policy would require "well behaved" build scripts to be added to the repo only, for whatever definition is desired. This is sort of along the lines of what I mentioned with @davidzchen above.

I'm personally trying to push on including build.rs as much as possible because in many places it's quite integral to assembling crates. Losing out completely on build.rs I feel is untenable in terms of "acceptably being able to leverage crates.io", but imposing reasonable restrictions on build scripts (e.g. don't download things) seems totally plausible.

I agree that lots of the native library-geared build scripts probably don't need to be run, but they should all have proper escape hatches to use what's already on the system. Crates like openssl-sys need to run to detect what version of OpenSSL they're compiling against, and ignoring all crates on crates.io that use openssl-sys transitively would be a bummer!
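One shape such an escape hatch can take is a build script that trusts the environment when told to, and only probes or builds a bundled copy otherwise. A hedged sketch: the FOO_LIB_DIR variable name is invented for illustration, while the cargo: output directives are the standard build-script ones.

// build.rs
use std::env;

fn main() {
    // If the surrounding build system already knows where the native library
    // lives, just emit link directives and do no probing or building at all.
    if let Ok(dir) = env::var("FOO_LIB_DIR") {
        println!("cargo:rustc-link-search=native={}", dir);
        println!("cargo:rustc-link-lib=foo");
        return;
    }

    // Standalone fallback: detect a system copy or build the bundled sources.
    // (This is the part that integrators want to be able to skip entirely.)
}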

rustc is about the right level for Buck, since it's similar to gcc/javac/etc; certainly integrating at the rustc level (while not trivial) was more conceptually straightforward than trying to work out a conceptual mapping between buck and cargo.

Perhaps yeah, but my point is that rustc is way too low level for Rust and crates.io. I definitely agree that it's more difficult to map between buck and cargo, but I feel that the benefits are definitely worth it.

More generally, I originally commented:

First and foremost we've historically concluded that build system integration is not finished until you've got access to crates.io crates

Do you agree with this? Or do you think that the integration you've got today is suitable in terms of leveraging the existing Rust ecosystem?


@luser

The core problem was that when outputting something other than an rlib rustc includes support code such as jemalloc, and you can only link one copy of that into a binary.

I might rephrase the core problem as being libstd, but in general yeah it's because a staticlib includes all dependencies, and then dependencies duplicated across crate graphs will be included twice if two staticlibs are linked.


@cardoe

We need the ability to fetch all the data necessary to perform the build before cargo executes since the package managers for both Yocto and Gentoo are responsible for verifying the integrity of the downloads and handling the downloads.

Does this not work today? I'm under the impression that Cargo has enough features for this, but I'd just want to confirm that's the case!


@stuhood

This is a particularly clear distillation of the requirements for integration into build systems with existing caches and reproducibility guarantees

To reiterate some points from earlier:

To that end, are there active points of further integration needed?

jsgf commented 7 years ago

@alexcrichton -

Thanks for the detailed response!

Makes sense to me! Remember though that rustc/cargo are the same thing :). All the tools here are effectively pure functions, so what I'd like to figure out is the best way for rustc/cargo to fit in this model because it should mesh well!

If cargo were simply a build tool that transforms inputs -> output, then I could simply invoke it instead of rustc. But it isn't simply that - it's also doing dependency management, making its own decisions about when to build things, and resolving dependencies in Cargo.lock (even creating Cargo.lock!).

The traditional failures of build systems can mostly be summarized as "no single source of truth". For example, "make" works fine on simple projects, but as soon as you have significant amounts of procedural code in a build system, you end up with flakey builds because the Makefiles no longer fully describe the project's dependencies. Even if there's no explicit procedural code, nested Makefile/make invocations have the same problem, because the dependency information is scattered over multiple Makefiles.

Likewise, without build.rs, cargo is in a fairly good state as a stand-alone tool, but with build.rs it loses a large amount of control, as evidenced by its very conservative/heavy-handed treatment of build.rs (it rebuilds eagerly, or requires a periodic cargo clean to make sure build.rs does get rerun).

Build systems like Buck (and Blaze/Bazel, though I haven't used them) attempt to resolve these problems by constructing a single source-of-truth dependency graph which includes everything, and by avoiding procedural build steps as much as possible (unless they can be modelled in the dependency graph).

So if Cargo is going to continue doing the things as it does today, then embedding it in another build system is inherently tricky - it amounts to having two sources of truth for dependency information, with the associated risks of getting them out of sync. Either Buck has to (somehow) delegate parts of its dependency management to Cargo, or it has to maintain its own independent model of what Cargo is doing internally. That might be possible if the Cargo portion is on the edge of the graph, but it becomes much harder if it's embedded in the middle (ie, Cargo has dependencies on Buck-managed targets).

First and foremost we've historically concluded that build system integration is not finished until you've got access to crates.io crates

Do you agree with this? Or do you think that the integration you've got today is suitable in terms of leveraging the existing Rust ecosystem?

Having access to crates.io is essential to being productive in Rust, and the better integrated it is - esp. having low friction in using new crates - the more productive you can be.

But more broadly, that's also true of all the other open source in other language ecosystems. Their typical deployment is "unpack tar, run ./configure, build", which is completely incompatible with the Buck way of doing things. As a result, the options are 1) rewrite their build systems in Buck, or 2) special-case them by prebuilding with their preferred build system, then use that prebuilt artifact in Buck. 1) is impractical, so 2) is the answer.

Cargo - esp with the presence of build.rs - is effectively the same as "unpack and run configure", and so I'm using the same solution: run cargo in its own isolated environment with all that crate's dependencies vendored, then use the prebuilt .rlib (and soon, .so) in Buck. Unfortunately this process introduces quite a bit of friction, but it does have its upsides (if an upstream package like openssl gets updated, then others can automatically rebuild my Rust code to use it without me being involved).

The things that @davidzchen mentioned about having Bazel directly parse Cargo toml/lock files to download deps sound very interesting, but I can't see it working so long as a build script is involved; if the package depends on build.rs, it has to be special-cased.

It's not really practical to manually audit build.rs files to see if they're "well behaved", esp. since it would have to be redone every version, or they may depend on arbitrary complexity like invoking cmake/autoconf/etc. The only thing I can see working is if cargo has some way to sandbox build scripts so their actions can be well-defined in advance, and/or run in a way that shows exactly what their effects will be without making any changes - but I think that's out of scope for cargo.

<handwave> What I could imagine being possible is if build.rs stops being general purpose code, and instead becomes something that's higher-level and more declarative. Rather than having code which invokes autoconf or probes around for a library, it should simply emit an event/request along the lines of "need external dependency X". For standalone use, that would be paired with something that simply uses the copy of X that's packaged with the crate and just build it in place, as happens now. But if cargo is integrated with something else, then that can be turned into something like "emit buck dependency on third-party package X" (and ideally this could be done without compiling and running arbitrary code).

Perhaps this could be done as an extension of the Cargo.toml [dependencies] section, so you can express dependencies on non-Rust code there, and express a locally packaged version via a workspace. </handwave>

I'm personally trying to push on including build.rs as much as possible because in many places it's quite integral to assembling crates. Losing out completely on build.rs I feel is untenable in terms of "acceptably being able to leverage crates.io", but imposing reasonable restrictions on build scripts (e.g. don't download things) seems totally plausible.

As I've mentioned before, I think most of the uses of build.rs are incompatible with a large build system. The limitations need to be more like:

Of its current set of roles, really the only acceptable thing a build script can do is generate sources from well-defined inputs, and that could be done with a much narrower interface.

I think another way of phrasing my handwave above is that a cargo package can request dependencies, but it must use whatever it's given as a result of that request. A possible implementation of that request might be to probe and build a local version (ie, the current mode of operation), but other implementations must be possible.

In that sense I don't think there's a fundamental blocker here, but it sounds like integrating into existing caching solutions is a high priority. That way as long as our dependency resolution/granularity is accurate then we'll get cached builds for free (no matter where you are in a tree).

I'm curious to know your thoughts about what this interface would look like in a bit more detail.

jsgf commented 7 years ago

<more handwave> More generally, you could consider splitting cargo into several distinct parts:

  • crate dependency management
  • a build planner
  • a build execution engine

The planner would build up a graph of actions ("I need to build X from A, B, C because Y needs it"). In the normal (current) mode of operation the build execution engine would walk the graph and perform each action, possibly relying on cached state.

However, that action graph could also be turned into a set of rules for another build system (Buck, Bazel, etc) to perform the execution, including managing its own cache.

Crate dependency management spans both to some extent - if you have a dependency on a crate, then you can take an action like download it or check vendored sources, then embed that crate's action graph into this one (sharing any common subgraphs).

Dependencies on non-Rust code could also be handled in the action graph, where the execution engine is responsible for resolving things like "I need openssl". In the current standalone cargo mode, this would still be "invoke autoconf from build.rs", but it could be implemented as "depend on the standard 3rd party openssl".
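Purely as illustration (this is not an existing Cargo interface), a serialized node in such an action graph might carry roughly the following, which an external executor could translate into its own rules or which Cargo's own execution engine could walk:

// Hypothetical shape of one node in an exported build plan.
#[derive(Debug)]
#[allow(dead_code)]
pub struct PlanNode {
    /// Package and version this action builds, e.g. "libc 0.2.x".
    pub package: String,
    /// Exact compiler invocation the planner settled on.
    pub program: String,
    pub args: Vec<String>,
    pub env: Vec<(String, String)>,
    /// Nodes whose outputs must exist before this one runs.
    pub deps: Vec<usize>,
    /// Files this node is expected to produce (e.g. an .rlib path).
    pub outputs: Vec<String>,
}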

(There's a ton of missing detail here.)

sfackler commented 7 years ago

How does this system work with other build systems like Maven, Gradle, CMake, NPM, etc?

Ericson2314 commented 7 years ago

Note it's not necessary for Cargo to learn to spit out the graph/plan in every format. Just give us something that Cargo will rigorously follow (no leaky state), and we'll happily convert it to buck/bazel/nix/whatever.

Crate dependency management spans both to some extent

Eh, it's not that bad. As long as Cargo spits out what needs to be downloaded, and can be made to fail rather than redownload if it's not fed what it wants, we're good.

Dependencies on non-Rust code could also be handled in the action graph

There was a proposal for declarative pkg-config I think? It would be easy to just forward this stuff into the plan and other tools to do what they will with it.

(There's a ton of missing detail here.)

I don't think so actually. It's a lot of implementation work, but pretty straightforward conceptually.

Ericson2314 commented 7 years ago

@sfackler Well if all the language-specific ones could spit out a plan and stick to it, it would be very easy to splice them together for any "dumb graph executor". Unfortunately, none of them do ATM.

jsgf commented 7 years ago

Note it's not necessary for Cargo to learn to spit out the graph/plan in every format. Just give us something that Cargo will rigorously follow (no leaky state), and we'll happily convert it to buck/bazel/nix/whatever.

Yes, exactly - cargo can already be convinced to emit a fair amount of metadata as json blobs; I was thinking of this as a continuation of that practice.

Eh, it's not that bad. As long as Cargo spits out what needs to be downloaded, and can be made to fail rather than redownload if it's not fed what it wants, we're good.

So that means it's the execution engine that's composing the action graphs, rather than relying on cargo to do it?

Ericson2314 commented 7 years ago

@jsgf sure, I'm perfectly happy to take on the burden of transforming the plan --- much better than doing a partial reimplementation of the whole tool as us Nixers have done for Haskell's cabal-install!

sjmackenzie commented 7 years ago

@jsgf has some really good signal

So if Cargo is going to continue doing the things as it does today, then embedding it in another build system is inherently tricky - it amounts to having two sources of truth for dependency information, with the associated risks of getting them out of sync.

This is the reason why we dropped cargo and built nixcrates. Nix is declarative and is able to track sources beyond just crates, down to the OS level, and hence has a single system view or state snapshot. Though every time we see crates with funky bash scripts or Rust calling executables, those crates definitely fail.

In my opinion, the right mid level is just a simple program that, given one or more crate names as input, returns the list of transitive deps along with the download URL, the checksum, etc. Other tools will be able to leverage that very well.

Ericson2314 commented 7 years ago

@sjmackenzie Well, build plan solving is crucial. We Nixers tend to lock things down anyway, but it's great to be able to automatically bump the plan when we want something new rather than fiddle with exact versions manually.

sjmackenzie commented 7 years ago

@Ericson2314 oh sure mate (you're preaching to the choir). Since we moved to nixcrates, most of our problems went away (but were replaced with new problems :-) ). Now the challenge is to whittle down and massage those crates that don't build properly for whatever reason. From this line downwards, https://github.com/fractalide/nixcrates/blob/master/default.nix#L112, everything in buildInputs isn't building, for different reasons. This list was compiled by manually building the top 2500 most-downloaded crates. These are first-order failures, so many more dependent crates don't build due to these crates failing.

jsgf commented 7 years ago

@sfackler -

How does this system work with other build systems like Maven, Gradle, CMake, NPM, etc?

CMake is the only one of those I have any experience in, and it's also a "rule generator" like I'm proposing rather than something that actually executes builds. I suppose you could come up with some way of passing the cargo action graph through to the underlying build engine, but I think a preferred course of action is to burn cmake to the ground, bury the remains in a pit and cover it all in concrete and holy water.

aturon commented 7 years ago

@jsgf

More generally, you could consider splitting cargo into several distinct parts:

  • crate dependency management
  • a build planner
  • a build execution engine

I think you've cracked the nut!

I was talking with @alexcrichton about build system integration this morning, and we came to the exact same conclusion. The important role of Cargo is the first two bullets; that's the way that it shapes the Rust ecosystem. Actually executing the build, OTOH, is something that should be easily delegatable to other systems.

I agree wholeheartedly with your proposed strategy of using Cargo for build planning, and then exporting the plan for consumption by other tools. I also think the idea of representing things like C dependencies in a more "first-class" way (rather than encoding them through build scripts) will likely make everyone's life better.

As you say, there are a lot of details to be fleshed out here, but I think this is a very promising avenue for tackling this roadmap item!

I did want to raise one other question, though. While we clearly want and need to use Cargo for incorporating the crates.io ecosystem, does it have a role to play when working on internal projects? With your breakdown, we can ask more specifically whether there is useful dependency management and build planning work to be done for internal projects.

I know a lot of organizations use mono-repos internally, and have everyone working with a single version of all libraries -- internal or external. In that world, resolving Cargo.toml files into a dependency graph is perhaps not so useful, since there's only one possible version to use. (It might still be more convenient than writing the Buck rules directly, though). But there is at least one major downside to this approach: for the crates.io ecosystem, it means manually finding a subset of the ecosystem containing all the dependencies you want, while agreeing across the board on a single version of each. It's quite plausible that such a solution doesn't even exist!

OTOH, if you wanted to (say) allow multiple major versions of crates coming from crates.io, then there's a pretty strong reason to use Cargo even for internal projects as a dependency resolver and build planner. The Buck/whatever rules could then be auto-generated.

Using Cargo everywhere as the build planner for Rust gives you a Rust experience that's consistent with what you see in the docs and across the wider ecosystem. OTOH, people coming to Rust code in your organization who are used to writing Buck files directly may find it annoying to have to learn another tool to generate those files.

So in short: given the above, with the rough plan you sketched, do you imagine using Cargo for internal projects and exporting to Buck? How do you see the tradeoffs?

jsgf commented 7 years ago

Using Cargo everywhere as the build planner for Rust gives you a Rust experience that's consistent with what you see in the docs and across the wider ecosystem. OTOH, people coming to Rust code in your organization who are used to writing Buck files directly may find it annoying to have to learn another tool to generate those files. So in short: given the above, with the rough plan you sketched, do you imagine using Cargo for internal projects and exporting to Buck? How do you see the tradeoffs?

TBH, I don't think that's likely at all. I'm envisaging using cargo's version resolution engine, but no Cargo.toml files committed to the main source base - instead, dependencies on crates.io (or cargo-managed packages in general) would be encoded as Buck rules. Cargo.{toml,lock} files would only manifest as user-visible artifacts when doing things like exporting to open source projects.

Version management is tricky, and coming up with a unification between the Cargo model and a mono-version monorepo is not completely clear. Right now I'm prebuilding all the "supported" crates.io packages, and doing a single cargo version resolution over all of them to try and end up with a minimal number of versions for each package. However, only an explicitly enumerated subset of those packages are exported to the internal codebase, and there's only ever a single version of those. In general I use "*" as a version specifier unless a package has some breaking API change that's too hard to fix up on the fly.

With tighter integration between buck and cargo, I'm imagining a rust_cargo_library rule, but I'm not clear exactly how it would work yet (ie, how to specify versions, where the resolution from a version spec to a specific version happens, where downloading happens, etc).