mozilla / sccache

Sccache is a ccache-like tool. It is used as a compiler wrapper and avoids compilation when possible. Sccache has the capability to utilize caching in remote storage environments, including various cloud storage options, or alternatively, in local storage.
Apache License 2.0

Add support for ARM chips (e.g. raspberry pi) at least as scheduler (allow it to compile sccache-dist) #656

Open dholbert opened 4 years ago

dholbert commented 4 years ago

tl;dr: I'd like to be able to use a raspberry pi as my sccache-dist scheduler node, and I have beefy machines which may come online or disappear as build nodes (and I don't want to depend on any particular beefy node having to serve double-duty as the scheduler, ideally).

However, I can't do this right now: the sccache-dist binary won't build or install on my raspberry pi, because the sccache-dist "main" function is guarded to be x86_64-only:

// Only supported on x86_64 Linux machines
#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
fn main() {

https://github.com/mozilla/sccache/blob/e6c78c98baedbf67392d882a56065f3f415328d9/src/bin/sccache-dist/main.rs#L95

Is there a reason we're placing this restriction? I can imagine it being a useful restriction to place for build nodes, but it doesn't seem necessary for the scheduler (intuitively at least -- maybe there are technical reasons I'm not aware of).

MORE DETAILS: My raspberry pi has "ARMv7 Processor rev 4 (v7l)" according to /proc/cpuinfo, and I'm attempting to build (and failing) with this command:

cargo install -vv --bins --features="dist-server"  -f \
  --git https://github.com/mozilla/sccache.git \
  --rev e59cc28cf37254e182ede79070384f8166b52f8e

And the error message is:

error[E0601]: `main` function not found in crate `sccache_dist`
   --> /home/pi/.cargo/git/checkouts/sccache-88b18c6d184a8bdc/e59cc28/src/bin/sccache-dist/main.rs:1:1

...which makes sense, given the x86_64 guard on fn main quoted above.

dholbert commented 4 years ago

Note:

luser commented 4 years ago

Probably the only reason this is written this way is that the scheduler and the build server get built into the same binary, and the build server is only written to run on x86-64 Linux.

They're both behind the same dist-server feature flag currently: https://github.com/mozilla/sccache/blob/24f4306fad03ee77091d3cdd1520b78e9365076a/Cargo.toml#L140

I suppose you could split that out into like dist-scheduler and dist-builder features (and have dist-server = ["dist-scheduler", "dist-builder"] to preserve the current state of affairs if desired) and compile them each in individually.
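A minimal sketch of how such a split could look in code, assuming the two hypothetical features dist-scheduler and dist-builder suggested above (neither exists in sccache today, and compiled_roles is an illustrative helper, not an sccache function). Only the builder keeps the x86_64 Linux guard:

```rust
// Hypothetical sketch: `dist-scheduler` and `dist-builder` are the feature
// names proposed in this thread; they are not real sccache features.

/// Roles this binary was compiled with, decided at compile time.
fn compiled_roles() -> Vec<&'static str> {
    let mut roles = Vec::new();
    // The scheduler carries no architecture requirement in this sketch.
    if cfg!(feature = "dist-scheduler") {
        roles.push("scheduler");
    }
    // The builder keeps the existing x86_64 Linux restriction.
    if cfg!(all(
        feature = "dist-builder",
        target_os = "linux",
        target_arch = "x86_64"
    )) {
        roles.push("builder");
    }
    roles
}

fn main() {
    let roles = compiled_roles();
    if roles.is_empty() {
        println!("no dist roles compiled in");
    } else {
        println!("compiled roles: {}", roles.join(", "));
    }
}
```

Built without either feature, this reports that no roles are compiled in. With only dist-scheduler enabled, it would report the scheduler role on any architecture, including ARM.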

All that said, technically there's no real reason the build server can't run on other Linux architectures; it's just that in practice the code makes a lot of assumptions about what kinds of binaries it can run.

froydnj commented 4 years ago

I would not object to separating out the scheduler and the builder code into different binaries. I think that makes more sense than adding more features to the #[cfg]-laden source we have now.

devyn commented 3 years ago

Personally I would also like to run the build servers on ARM. I was surprised that sccache-dist is x86_64 only.

Perhaps it would make sense to:

luser commented 3 years ago

Personally I would also like to run the build servers on ARM

During the initial design discussions around the distributed compilation work that I had with @aidanhs, I remember expressing a desire to eventually include support for Rust cross-compilation in a way that was simpler than the current support for overriding toolchains. I don't recall exactly what we discussed, but I think I suggested that making sccache aware of rustup would provide a straightforward way to do this. If a client is using a standard Rust toolchain via rustup then build servers could simply rustup install the same toolchain + whatever target the client is using. With that in place, an x86-64 client ought to be able to distribute compilation to an aarch64 build server and have it produce the same binaries by cross-compiling.
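As a rough illustration of the rustup idea, the sketch below only builds the two rustup command lines a build server might run. The function name rustup_setup_commands is hypothetical, and how the server would learn the client's toolchain name and target is left open here, as it is in the discussion:

```rust
/// Convert a list of string slices into owned arguments.
fn to_args(parts: &[&str]) -> Vec<String> {
    parts.iter().map(|p| p.to_string()).collect()
}

/// Hypothetical helper: the rustup invocations a build server could run to
/// mirror a client's toolchain and compilation target.
fn rustup_setup_commands(toolchain: &str, target: &str) -> Vec<Vec<String>> {
    vec![
        // Install the same toolchain the client uses.
        to_args(&["rustup", "toolchain", "install", toolchain]),
        // Add the client's target to that toolchain.
        to_args(&["rustup", "target", "add", "--toolchain", toolchain, target]),
    ]
}

fn main() {
    // Example values only; a real client would report its own.
    for cmd in rustup_setup_commands("stable", "x86_64-unknown-linux-gnu") {
        println!("{}", cmd.join(" "));
    }
}
```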

(Obviously supporting C++ compilation transparently like that would be a different ball of wax.)

kov commented 2 years ago

Why is it that build servers are supposed to be used only with x86-64? Is there any low level limitation or is it just the fact that the architecture is hardcoded in the triplets / paths? I was looking forward to using sccache-dist on my M1 laptop and use an M1 Mini to offload some of the build to.

luser commented 2 years ago

Why is it that build servers are supposed to be used only with x86-64? Is there any low level limitation or is it just the fact that the architecture is hardcoded in the triplets / paths?

I don't recall the exact reasoning for only supporting x86-64, except that making everything architecture-aware would be additional work and we probably just deemed it out of scope at the time. Currently the protocol has no notion of the CPU or OS of the build servers or clients, so at the very least it would need some way to keep track of that so it would only allocate build jobs to servers with matching architectures.
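To make the "matching architectures" idea concrete, here is a minimal sketch. The Platform and BuildServer types and the pick_server function are hypothetical and not part of the sccache protocol. The only real API used is std::env::consts, which gives the architecture and OS the binary was compiled for:

```rust
use std::env::consts::{ARCH, OS};

/// Hypothetical platform tag that a client or build server could advertise.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Platform {
    arch: String,
    os: String,
}

impl Platform {
    fn new(arch: &str, os: &str) -> Self {
        Platform { arch: arch.to_string(), os: os.to_string() }
    }

    /// Platform this binary was compiled for.
    fn native() -> Self {
        Platform::new(ARCH, OS)
    }
}

/// Hypothetical build server record as a scheduler might store it.
struct BuildServer {
    name: String,
    platform: Platform,
}

/// Return the first server whose platform equals the client's, if any.
fn pick_server<'a>(servers: &'a [BuildServer], client: &Platform) -> Option<&'a BuildServer> {
    servers.iter().find(|s| &s.platform == client)
}

fn main() {
    let servers = vec![
        BuildServer { name: "big-x86".to_string(), platform: Platform::new("x86_64", "linux") },
        BuildServer { name: "mini-arm".to_string(), platform: Platform::new("aarch64", "linux") },
    ];
    let client = Platform::new("aarch64", "linux");
    match pick_server(&servers, &client) {
        Some(server) => println!("job goes to {}", server.name),
        None => println!("no matching build server"),
    }
    println!("this machine would advertise {:?}", Platform::native());
}
```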

I was looking forward to using sccache-dist on my M1 laptop and use an M1 Mini to offload some of the build to.

Running a build server on macOS is additionally out of scope here because the build servers run all jobs in a sandbox, and the two choices for sandboxing in the code currently are bubblewrap or Docker, neither of which works natively on macOS. A non-sandboxed build mode is not something we'd be interested in having, since it's too big of a gaping security hole.

kov commented 2 years ago

@luser I want to run those on Linux, actually. I have Linux running natively on the Mini (with the work done by Asahi), and on a VM on the laptop. But good to know, it's a matter of adding architecture awareness to the protocol then (and I guess I can just patch the code locally as a quick fix). Thank you!

gyscos commented 7 months ago

@luser

so it would only allocate build jobs to servers with matching architectures.

Is that even a desired limitation? Could the build farm not cross-compile for whatever target triplet is requested, so that both x86_64 and aarch64 (and maybe others?) could compile for any target (including armv7, like the raspberry pi OP wanted to use as scheduler)? It would greatly benefit low-powered ARM devices to be able to rely on a large farm (possibly x86_64). This seems to be what you mentioned in the post just before.

Or maybe by "with matching architectures" you mean "with toolchain installed for the requested architecture"?

luser commented 7 months ago

Is that even a desired limitation?

The distributed compilation functionality of sccache is modeled directly on that of icecream, but with some additional security features. Toolchains are provided by the client requesting compilation, but sccache doesn't have any knowledge of the capabilities of the toolchain. The interest of sccache is producing the result of a compilation as if the compiler execution it is wrapping were run directly. Cross-compilation is a complex topic. Making sccache aware of the targets a toolchain can compile is one thing, but making it able to take a local toolchain, e.g. x86_64-unknown-linux-gnu compiler binaries targeting x86_64-unknown-linux-gnu outputs and map that to another toolchain, e.g. aarch64-unknown-linux-gnu compiler binaries targeting x86_64-unknown-linux-gnu outputs, with some degree of confidence is tricky.

sccache does currently support very simple custom toolchain functionality, designed to allow distributing compilation from macOS and Windows clients by using manually-created cross-compiler packages that are substituted using simple matching on the client-side path to the compiler. One could imagine extending this to allow noting the host architecture of a custom toolchain. If the servers were extended so that build servers could report their native architecture, the dist server maintained pools of build servers separated by architecture, clients included their native architecture, and the dist server allocated build jobs to matching builders, then this could plausibly work.
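The per-architecture pools described above could be sketched roughly as follows. The Scheduler type, its methods, and the round-robin policy are all assumptions for illustration, and none of this reflects the existing sccache scheduler:

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical scheduler state: one queue of build servers per architecture.
#[derive(Default)]
struct Scheduler {
    pools: HashMap<String, VecDeque<String>>,
}

impl Scheduler {
    /// A build server reports its native architecture when it registers.
    fn register(&mut self, server: &str, arch: &str) {
        self.pools
            .entry(arch.to_string())
            .or_default()
            .push_back(server.to_string());
    }

    /// Pick a builder for a job whose toolchain runs on `host_arch`,
    /// rotating through that pool so jobs are spread round-robin.
    fn allocate(&mut self, host_arch: &str) -> Option<String> {
        let pool = self.pools.get_mut(host_arch)?;
        let server = pool.pop_front()?;
        pool.push_back(server.clone());
        Some(server)
    }
}

fn main() {
    let mut scheduler = Scheduler::default();
    scheduler.register("x1", "x86_64");
    scheduler.register("a1", "aarch64");
    scheduler.register("a2", "aarch64");

    // Each job carries the host architecture of its toolchain.
    for arch in ["aarch64", "aarch64", "x86_64", "riscv64"] {
        match scheduler.allocate(arch) {
            Some(server) => println!("{arch} job -> {server}"),
            None => println!("{arch} job -> no builder available"),
        }
    }
}
```

A job whose toolchain host architecture has no registered builder gets no allocation, which is the case where a client would presumably fall back to compiling locally.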

As you may be able to guess from the length of the preceding paragraph, this is a nontrivial amount of work and will require changes to the protocols that sccache uses to communicate between the various components in distributed compilation mode, as well as changes to every component involved.

Prior to any of that work being done I would want to see a concrete list of desired use cases and at least a high-level overview of the proposed design and how it would serve those use cases. It's complicated enough that I don't feel confident that any given design will meet the needs of the folks that are asking for it unless proven otherwise.