rust-lang / cargo

The Rust package manager
https://doc.rust-lang.org/cargo
Apache License 2.0

cargo build during a docker build takes forever and blows up memory on "Updating crates.io index" #10781

Closed lexicalunit closed 2 years ago

lexicalunit commented 2 years ago

Problem

In my Dockerfile I have this:

FROM python:3.9

RUN echo 'APT::Install-Recommends "false";' > /etc/apt/apt.conf.d/99no-install-recommends
RUN apt-get update \
    && apt-get install -y --no-install-recommends git curl \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
RUN git clone https://github.com/twilight-rs/http-proxy.git
RUN cd http-proxy \
    && . $HOME/.cargo/env \
    && cargo build --release

This alone is enough to make the image build blow up at the cargo build --release step. The process appears to hang on the "Updating crates.io index" step until the Docker container runs out of memory and is killed with exit code 137.

I had to raise my Docker VM's memory to 20 GB just to get the command to complete. Building this library outside of Docker works fine, uses almost no memory, and finishes in seconds.

Steps

docker buildx build --platform linux/amd64 .

Possible Solution(s)

There must be some kind of networking configuration or bypass that can be done here? Why on earth does this work fine on my local machine, but blow up inside of docker build?

Notes

No response

Version

cargo 1.61.0 (a028ae4 2022-04-29)
release: 1.61.0
commit-hash: a028ae42fc1376571de836be702e840ca8e060c2
commit-date: 2022-04-29
host: aarch64-unknown-linux-gnu
libgit2: 1.4.2 (sys:0.14.2 vendored)
libcurl: 7.80.0-DEV (sys:0.4.51+curl-7.80.0 vendored ssl:OpenSSL/1.1.1m)
os: OracleLinux 11.0.0 [64-bit]

lexicalunit commented 2 years ago

Note that I also see a bajillion log lines in the build log saying:

<jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
<jemalloc>: (This is the expected behaviour if you are running under QEMU)

keenanjohnson commented 2 years ago

I have seen this same behavior on arm64 docker image builds using docker buildx and qemu. Not sure what a valid workaround is.

keenanjohnson commented 2 years ago

https://github.com/keenanjohnson/ros2_rust_workspace/issues/21

lexicalunit commented 2 years ago

Yep, definitely seems like a very similar, if not the same issue! In my case I'm wondering if a workaround could be pre-building/vendoring the cargo library and just copying it into the docker image 🤷🏻‍♀️

epage commented 2 years ago

"Updating crates.io index"

On your machine, it is incrementally updating the registry repo. On a docker image, it is doing that from scratch. If it is the act of pulling the registry that is causing things to blow up, you might be interested in https://blog.rust-lang.org/2022/06/22/sparse-registry-testing.html which causes us to only download exactly what is needed.

lexicalunit commented 2 years ago

@epage You are an absolute gem!

Tested and working flawlessly. Diff:

-RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
+RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain nightly
 RUN git clone https://github.com/twilight-rs/http-proxy.git
 RUN cd http-proxy \
     && . $HOME/.cargo/env \
-    && cargo build --release
+    && cargo +nightly build --release -Z sparse-registry

I also noticed an improvement in the build process by doing:

DOCKER_BUILDKIT=0 docker buildx build --ulimit nofile=1024000:1024000 --platform linux/amd64 .

lexicalunit commented 2 years ago

I do still get the jemalloc warnings and the actual build process is pretty slow (maybe because of these warnings?)... but at least that's totally resolved the memory explosion issue.

keenanjohnson commented 2 years ago

Thanks for the great new feature! I'll try out that sparse nightly update and see if it resolves my issue as well!

keenanjohnson commented 2 years ago

Sorry I'm a bit new to the nightly builds of cargo, but when I try the docker commands below, I get a /bin/sh: 1: cargo: not found error

RUN RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain nightly
ENV PATH=/root/.cargo/bin:$PATH

# Install the cargo-ament-build plugin
RUN cargo +nightly install -Z sparse-registry --debug cargo-ament-build 

What obvious thing am I missing? Thank you for helping a newbie.

keenanjohnson commented 2 years ago

Nvm I was able to figure it out via the following:

# Install Rust
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- --default-toolchain 1.61.0 -y
ENV PATH=/root/.cargo/bin:$PATH

# Install the nightly toolchain (selected below with +nightly)
RUN rustup toolchain install nightly

# Install the cargo-ament-build plugin
RUN cargo +nightly install -Z sparse-registry --debug cargo-ament-build 

Note that this feature is awesome and has fixed a long-standing problem with cargo on arm64 for me! I'll leave some feedback in the forum thread, but I hope to see this land in stable, @epage!

Sorry for hijacking the issue here, @lexicalunit!

ehuss commented 2 years ago

I'm a bit confused about the setup, can you clarify a few things?

You mention running on macOS? I'm guessing this is an aarch64 (apple silicon) machine? If so, is there a particular reason you are running your docker image with amd64? My understanding is that the qemu emulation with Docker in that setup is likely not to work very well. Is it possible to use an aarch64 linux image instead?

lexicalunit commented 2 years ago

I'm deploying to a linux/amd64 system, but my personal laptop, where I'm deploying from, is an M1 Mac.

ehuss commented 2 years ago

If you are just using docker to build something that is then deployed elsewhere, then I might suggest using an aarch64 linux image, and cross-compiling to x86_64. If I understand how Docker on macOS M1 works, that will not run under emulation. I believe it shouldn't be too difficult to install all the requisite cross-toolchain stuff in Debian, though I'm not sure.

lexicalunit commented 2 years ago

@ehuss I don't understand what you mean. I'm already selecting the platform with --platform linux/amd64. My base image is python:3.9 (see https://hub.docker.com/_/python). Are you suggesting I modify my FROM python:3.9 to FROM --platform=linux/amd64 python:3.9? Does that even do anything at all? My assumption would be that this has the same effect as including --platform linux/amd64 in the command line.

I also tested this suggestion and it seems to have no effect.

ehuss commented 2 years ago

I'm not familiar with Docker on M1, but I would guess that using --platform arm64v8 or FROM arm64v8/python:3 would fetch the aarch64 image. I don't know if those will run natively, but I would give it a try. Then, inside that arm image, use cargo build --target x86_64-unknown-linux-gnu to cross-compile to amd64 (after installing the requisite cross toolchain stuff).

lexicalunit commented 2 years ago

So on macOS the Docker VM is already running aarch64 Linux AFAIK:

$ nc -U ~/Library/Containers/com.docker.docker/Data/debug-shell.sock .
/ # uname -a
Linux docker-desktop 5.10.76-linuxkit #1 SMP PREEMPT Mon Nov 8 11:22:26 UTC 2021 aarch64 Linux

The only way I know of to specify the platform in the FROM statement of a Dockerfile is by adding --platform=xxx to it.

It's just that I'm running the docker CLI from an M1 terminal, so I need to specify linux/amd64 since that is no longer the default in that context.

I wouldn't want to do --platform arm64v8 because then I'd be building an image with Python built for arm64v8. I need to deploy to an amd64 machine.

ehuss commented 2 years ago

I wouldn't want to do --platform arm64v8 because then I'd be building an image with Python built for arm64v8. I need to deploy to an amd64 machine.

My suggestion is to use the --target flag to target x86_64 using cross-compilation. Most Linux package managers support installing packages to help with cross-compiling. It's not clear if you have other requirements that make that difficult. Overall I would suggest avoiding qemu if at all possible.
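
A minimal sketch of what that could look like from the shell, assuming a Debian-based arm64 image. Cargo's --target flag takes a full target triple such as x86_64-unknown-linux-gnu rather than a bare architecture name. The rust_target_for helper is hypothetical, and the cross-linker package and linker variable shown in the comments are assumptions about a typical Debian setup, not something confirmed in this thread.

```shell
# Hypothetical helper: map a Docker platform string to the Rust target
# triple that cargo expects. Only the two platforms discussed here are covered.
rust_target_for() {
    case "$1" in
        linux/amd64) echo "x86_64-unknown-linux-gnu" ;;
        linux/arm64) echo "aarch64-unknown-linux-gnu" ;;
        *) echo "unsupported platform: $1" >&2; return 1 ;;
    esac
}

# The deployment platform from this issue.
target=$(rust_target_for linux/amd64)
echo "$target"

# Inside the arm64 image, the build steps would then look roughly like this
# (package and linker names are assumptions for Debian-based images):
#   apt-get install -y gcc-x86-64-linux-gnu
#   rustup target add "$target"
#   CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_LINKER=x86_64-linux-gnu-gcc \
#       cargo build --release --target "$target"
```

The resulting binary would land under target/x86_64-unknown-linux-gnu/release/ and could be copied into an amd64 image in a later stage, so that only the copy step, not the compile, touches the emulated platform.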

I believe this is a duplicate of #10583, so closing in favor of that.

lexicalunit commented 2 years ago

I believe the --target option is to target a build stage, not as a pass-thru to compiler flags:

$ docker build --help

Usage:  docker build [OPTIONS] PATH | URL | -

Build an image from a Dockerfile

Options:
      [...]
      --platform string         Set platform if server is multi-platform capable
      [...]
      --target string           Set the target build stage to build.

Unless you mean that I should pass it thru to the cargo build command, like so: cargo +nightly build --target x86_64 [...]? In that case it causes an unrecoverable exception to occur:

 => ERROR [ 6/17] RUN cd http-proxy     && . $HOME/.cargo/env     && cargo +nightly build --target x86_64 --release -Z sparse-registry  2.3s
------
 > [ 6/17] RUN cd http-proxy     && . $HOME/.cargo/env     && cargo +nightly build --target x86_64 --release -Z sparse-registry:
#10 2.242 error: failed to run `rustc` to learn about target-specific information
#10 2.242
#10 2.242 Caused by:
#10 2.243   process didn't exit successfully: `rustc - --crate-name ___ --print=file-names --target x86_64 --crate-type bin --crate-type rlib --crate-type dylib --crate-type cdylib --crate-type staticlib --crate-type proc-macro --print=sysroot --print=cfg` (exit status: 1)
#10 2.243   --- stderr
#10 2.243   <jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
#10 2.243   <jemalloc>: (This is the expected behaviour if you are running under QEMU)
#10 2.243   error: Error loading target specification: Could not find specification for target "x86_64". Run `rustc --print target-list` for a list of built-in targets
#10 2.243
------
error: failed to solve: executor failed running [/bin/sh -c cd http-proxy     && . $HOME/.cargo/env     && cargo +nightly build --target x86_64 --release -Z sparse-registry]: exit code: 101

Jasperav commented 1 year ago

I do still get the jemalloc warnings and the actual build process is pretty slow (maybe because of these warnings?)... but at least that's totally resolved the memory explosion issue.

@lexicalunit did you solve this? The build takes an extremely long time.

lexicalunit commented 1 year ago

@Jasperav I think the big issue I was encountering was that cross-compilation from ARM to x64 was just slower than building for x64 on an x64 system. I now just do my docker builds and deployments from an x64 Mac and it's much faster.

Jasperav commented 1 year ago

@lexicalunit thanks for the heads-up. I get all kinds of different errors while cross-compiling :( Well, I guess I need to use my Windows system for deployments.

simgt commented 1 year ago

In case someone stumbles on this issue and is not satisfied with the suggestion of using nightly, this answer solved it for me: https://users.rust-lang.org/t/cargo-uses-too-much-memory-being-run-in-qemu/76531/2

➡️ Add --config net.git-fetch-with-cli=true to your cargo build command.
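
For context, this setting makes cargo fetch with the git executable instead of its built-in git library, so git has to be installed in the image. A rough sketch of a guard, with the same setting expressed through the CARGO_NET_GIT_FETCH_WITH_CLI environment variable; the require_tool helper is hypothetical:

```shell
# Hypothetical helper: succeed only if the named executable is on PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1 not found on PATH; install it first" >&2
    return 1
}

# Only switch cargo over to the git CLI when git is actually available.
if require_tool git; then
    export CARGO_NET_GIT_FETCH_WITH_CLI=true
fi

# Then build as usual:
#   cargo build --release
```

Setting the environment variable should have the same effect as passing --config net.git-fetch-with-cli=true on every cargo invocation, which may be more convenient when several build steps run cargo.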

marcellodesales commented 1 year ago

@simgt Just be aware that if you are using a base image without git, the command will fail:

 => ERROR [builder  9/11] RUN cargo build --config net.git-fetch-with-cli=true --target x86_64-unknown-linux-musl --release  2.3s
------
 > [builder  9/11] RUN cargo build --config net.git-fetch-with-cli=true --target x86_64-unknown-linux-musl --release:
#0 1.806     Updating crates.io index
#0 2.291 error: Unable to update registry `crates-io`
#0 2.291
#0 2.291 Caused by:
#0 2.291   failed to fetch `https://github.com/rust-lang/crates.io-index`
#0 2.291
#0 2.291 Caused by:
#0 2.291   could not execute process `git fetch --force --update-head-ok 'https://github.com/rust-lang/crates.io-index' '+HEAD:refs/remotes/origin/HEAD'` (never executed)
#0 2.291
#0 2.291 Caused by:
#0 2.292   No such file or directory (os error 2)
------
WARNING: No output specified for docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load
Dockerfile:22
--------------------
  20 |     # This is a dummy build to get the dependencies cached.
  21 |     #
  22 | >>> RUN cargo build --config net.git-fetch-with-cli=true --target x86_64-unknown-linux-musl --release
  23 |
  24 |     # Now copy in the rest of the sources
--------------------
error: failed to solve: process "/bin/sh -c cargo build --config net.git-fetch-with-cli=true --target x86_64-unknown-linux-musl --release" did not complete successfully: exit code: 101

$ docker run -ti rustlang/rust:nightly-buster-slim git --version
Unable to find image 'rustlang/rust:nightly-buster-slim' locally
nightly-buster-slim: Pulling from rustlang/rust
ebcd4e3db076: Pull complete
c63476507cda: Pull complete
Digest: sha256:609c65daad3c69f9a37717e45d794e2eab99ad488dc5d492b8fc85c97c1df531
Status: Downloaded newer image for rustlang/rust:nightly-buster-slim
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "git": executable file not found in $PATH: unknown.
ERRO[0044] error waiting for container: context canceled

Fix in this case

RUN apt-get update && apt-get install -y git
RUN cargo build --config net.git-fetch-with-cli=true --target x86_64-unknown-linux-musl --release

seungha-kim commented 1 year ago

As of Rust 1.68, the stable toolchain supports the sparse protocol, so you don't need to use nightly.

https://blog.rust-lang.org/inside-rust/2023/01/30/cargo-sparse-protocol.html
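
As a small sketch of what opting in could look like on stable, assuming cargo 1.68 or newer and the CARGO_REGISTRIES_CRATES_IO_PROTOCOL variable described in the stabilization announcement:

```shell
# Opt in to the sparse crates.io protocol on stable cargo 1.68 or newer.
export CARGO_REGISTRIES_CRATES_IO_PROTOCOL=sparse

# Show the value cargo will see, then build as usual with no -Z flag.
echo "protocol=$CARGO_REGISTRIES_CRATES_IO_PROTOCOL"
# cargo build --release
```

In a Dockerfile, the same variable could be set with an ENV instruction before the cargo build step.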