rust-lang / rustup

The Rust toolchain installer
https://rust-lang.github.io/rustup/
Apache License 2.0

`rustup-init.sh` fails to detect platform correctly under `docker buildx` which lacks `/proc` #2700

Closed: miigotu closed this issue 4 months ago

miigotu commented 3 years ago

rustup-init.sh installs the wrong rustc and other binaries because it fails to detect the architecture. Problems:

/proc/self/exe does not exist during docker build, so i686/386 etc. is incorrectly detected as x86_64 because of the failure on line 153. mips64 likely suffers from the same issue because it also uses get_bitness.

`grep '^Features' /proc/cpuinfo | grep -q -v neon` fails, so ARMv6 is incorrectly detected as armv7 on line 367. Running the downloaded binary then fails with `/lib/ld-linux-armhf.so.3: No such file or directory`, because the userland isn't armhf: it is armel, whose loader is /lib/ld-linux.so.3.
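One way to tell an armel userland from an armhf one without /proc/cpuinfo is to inspect the ELF header of a userland binary. The sketch below is a hypothetical helper (`arm_float_abi` and `read_byte` are not part of rustup-init.sh). It assumes a 32-bit little-endian ARM ELF file, where the 4-byte `e_flags` field starts at offset 36, and relies on the `EF_ARM_ABI_FLOAT_HARD` (0x400) and `EF_ARM_ABI_FLOAT_SOFT` (0x200) bits from `<elf.h>`.

```shell
#!/bin/sh
# Hypothetical helper (not part of rustup-init.sh): report whether a 32-bit
# little-endian ARM ELF binary uses the hard-float or the soft-float ABI.
# e_flags starts at offset 36; its second byte (offset 37) carries
# EF_ARM_ABI_FLOAT_HARD (0x400 -> 0x04) and EF_ARM_ABI_FLOAT_SOFT (0x200 -> 0x02).

read_byte() {
    # Print the unsigned decimal value of the byte at offset $2 in file $1
    # (empty output if the file is unreadable or too short).
    od -An -t u1 -j "$2" -N 1 "$1" 2>/dev/null | tr -d ' \n'
}

arm_float_abi() {
    _file="$1"
    # ELF magic: 0x7f 'E' 'L' 'F'
    [ "$(read_byte "$_file" 0)" = "127" ] || return 1
    [ "$(read_byte "$_file" 1)" = "69" ] || return 1
    [ "$(read_byte "$_file" 2)" = "76" ] || return 1
    [ "$(read_byte "$_file" 3)" = "70" ] || return 1
    # EI_CLASS must be 1 (32-bit) and EI_DATA must be 1 (little-endian).
    [ "$(read_byte "$_file" 4)" = "1" ] || return 1
    [ "$(read_byte "$_file" 5)" = "1" ] || return 1
    # e_machine low byte at offset 18 must be 40 (EM_ARM).
    [ "$(read_byte "$_file" 18)" = "40" ] || return 1
    _b=$(read_byte "$_file" 37)
    [ -n "$_b" ] || return 1
    if [ $((_b & 4)) -ne 0 ]; then
        echo hard      # armhf: loader /lib/ld-linux-armhf.so.3
    elif [ $((_b & 2)) -ne 0 ]; then
        echo soft      # armel: loader /lib/ld-linux.so.3
    else
        echo unknown   # older binaries may set neither flag
    fi
}
```

For example, `arm_float_abi /bin/sh` should print `soft` inside an armel container and `hard` inside an armhf one, whatever the kernel reports.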

Logs and example code to produce the container and error: https://gist.github.com/miigotu/2a0b80677420d806c96d8e792ae6652e

Note: gcc inside the container reports the correct target, while `uname -m` reports the kernel's architecture:

root@386c88edfbc5:/# uname -m
x86_64
root@386c88edfbc5:/# gcc -dumpmachine | sed "s/-/-$(uname -p)-/"
i686-unknown-linux-gnu

and

root@dac44b74e4c0:/# uname -m
armv7l
root@dac44b74e4c0:/# gcc -dumpmachine | sed "s/-/-$(uname -p)-/"
arm-unknown-linux-gnueabi
kinnison commented 3 years ago

We use /proc/self/exe because it tends to tell us the userland host type, whereas uname -m tells us the kernel architecture. Yes, it's possible that the kernel architecture is the correct answer, but it's also possible for it to be wrong: for example, some aarch64 systems can run 32-bit userlands, some armhf kernels can run armel userlands, etc.

It sounds like this is a limitation of docker buildx somehow not providing /proc, which is unfortunate.

Any fix would need to be fallback code in rustup-init.sh that, if it cannot use /proc/self for some reason, tries alternatives and emits suitable warnings.
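A minimal sketch of such a fallback, assuming that checking the ELF class byte of some userland binary is an acceptable stand-in when /proc/self/exe cannot be read. `elf_bitness` and `detect_bitness` are hypothetical names, not rustup's actual functions; the candidate order ($SHELL, then /bin/sh) follows the suggestions made later in this thread.

```shell
#!/bin/sh
# Hypothetical sketch of a bitness fallback (not rustup's code).
# Reads EI_CLASS (byte offset 4) of an ELF file: 1 = 32-bit, 2 = 64-bit.
elf_bitness() {
    _magic=$(od -An -t x1 -N 4 "$1" 2>/dev/null | tr -d ' \n')
    [ "$_magic" = "7f454c46" ] || return 1    # not an ELF file (or unreadable)
    _class=$(od -An -t u1 -j 4 -N 1 "$1" 2>/dev/null | tr -d ' \n')
    case "$_class" in
        1) echo 32 ;;
        2) echo 64 ;;
        *) return 1 ;;
    esac
}

# Try /proc/self/exe first (when od opens it, it resolves to the od binary,
# i.e. a userland executable), then $SHELL, then /bin/sh, warning on each fallback.
detect_bitness() {
    for _candidate in /proc/self/exe "${SHELL:-}" /bin/sh; do
        [ -n "$_candidate" ] || continue
        if _bits=$(elf_bitness "$_candidate"); then
            echo "$_bits"
            return 0
        fi
        echo "warning: cannot read ELF header of '$_candidate'; trying next candidate" >&2
    done
    echo "error: unknown platform bitness" >&2
    return 1
}
```

On a normal Linux system `detect_bitness` should answer from /proc/self/exe without warnings; where reading it fails, as under buildx here, it should warn and fall back to the shell binary instead of guessing.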

If someone wants to work on this, please talk to us on the Rust discord in #wg-rustup because it will need some careful discussion.

miigotu commented 3 years ago

This is my temporary solution that works (building python cryptography):

# The rustup installer needs to be patched to get the correct binaries for ARMv6 and i686
RUN sed -i -e's/ main/ main contrib non-free/gm' /etc/apt/sources.list
RUN apt-get update -q && \
 apt-get install -yq build-essential curl git libssl-dev libffi-dev libxml2 libxml2-dev libxslt1.1 libxslt-dev libz-dev mediainfo python3-dev unrar nano && \
 pip install -U pip wheel && \
 curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs > rustup-init.sh && \
 sed -i 's#/proc/self/exe#$(which head)#g' rustup-init.sh && \
 sed -i 's#/proc/cpuinfo#/proc/cpuinfo 2> /dev/null || echo ''#g' rustup-init.sh && \
 sed -i 's#get_architecture || return 1#RETVAL=$(gcc -dumpmachine | sed "s/-/-unknown-/") #g' rustup-init.sh && \
 sh -x rustup-init.sh -y --default-host=$(gcc -dumpmachine | sed 's/-/-unknown-/') && \
 rm rustup-init.sh && \
 PATH=$PATH:$HOME/.cargo/bin pip install --no-cache-dir --no-input -Ur requirements.txt && \
 PATH=$PATH:$HOME/.cargo/bin rustup self uninstall -y && \
 apt-get purge -yq --autoremove build-essential libssl-dev libffi-dev libxml2-dev libxslt-dev libz-dev python3-dev && \
 apt-get clean -yq && rm -rf /var/lib/apt/lists/*
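The `--default-host` value above comes from rewriting gcc's triple with `sed 's/-/-unknown-/'`. That works for Debian triples such as `i686-linux-gnu`, but a triple that already carries a vendor field (for example `x86_64-pc-linux-gnu`) would become `x86_64-unknown-pc-linux-gnu`. A slightly more defensive version of the same mapping is sketched below as a hypothetical helper that only handles GNU/Linux triples.

```shell
#!/bin/sh
# Hypothetical helper: turn a GNU/Linux triple from `gcc -dumpmachine`
# into a Rust host triple by forcing the vendor field to "unknown".
#   i686-linux-gnu       -> i686-unknown-linux-gnu
#   arm-linux-gnueabi    -> arm-unknown-linux-gnueabi
#   x86_64-pc-linux-gnu  -> x86_64-unknown-linux-gnu
gcc_to_rust_host() {
    _triple="$1"
    case "$_triple" in
        *-linux-gnu*) ;;
        *) return 1 ;;   # non-GNU/Linux triples are out of scope for this sketch
    esac
    _arch=${_triple%%-*}        # first field, e.g. "i686"
    _abi=${_triple##*-linux-}   # text after the last "-linux-", e.g. "gnueabi"
    printf '%s-unknown-linux-%s\n' "$_arch" "$_abi"
}
```

Usage would be along the lines of `sh rustup-init.sh -y --default-host "$(gcc_to_rust_host "$(gcc -dumpmachine)")"`. Note that Debian armhf reports `arm-linux-gnueabihf`, which maps to the ARMv6 hard-float target `arm-unknown-linux-gnueabihf` rather than `armv7-unknown-linux-gnueabihf`.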
workingjubilee commented 3 years ago

I do not believe all BSDs support /proc, so I am labeling this as a BSD issue until it is confirmed otherwise. @rustbot label:+O-bsd

miigotu commented 3 years ago

This is debian buster

workingjubilee commented 3 years ago

True, but the relevant high-order bit there seemed to be @rustbot label: +O-containers

vadixidav commented 2 years ago

I am getting this on macOS Monterey:

vscode ➜ ~ $ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
/usr/bin/head: error reading '/proc/self/exe': Bad file descriptor
/usr/bin/head: failed to close '/proc/self/exe': Bad file descriptor
rustup: unknown platform bitness
/bin/sh: 358: [: Illegal number: 
info: downloading installer

I am inside a container built FROM --platform=linux/amd64 mcr.microsoft.com/vscode/devcontainers/cpp:0-debian-11, running linux/amd64 under Docker's built-in QEMU emulation. The container build process is fairly involved, so the emulation clearly works in general, and I can also run other x86_64 applications in the container.

Cargo does install correctly, though, so it doesn't break anything. I originally thought this was a problem, but it still installs as expected. I figured it was still worth reporting here as another example of this occurring.

workingjubilee commented 2 years ago

Maybe I am misreading, but I can't quite tell: What CPU architecture is the host? AArch64? AMD64? PowerPC?

vadixidav commented 2 years ago

aarch64-apple-darwin

kinnison commented 2 years ago

Perhaps we could switch from reading /proc/self/exe to reading $SHELL - would there be any situations we can think of where that wouldn't work?

miigotu commented 2 years ago

This is how I have understood it; I could be entirely wrong.

If you specify FROM --platform=$TARGETPLATFORM image:tag in the Dockerfile, you aren't cross compiling inside the container, and the default target for Cargo/Rust is all you need: buildx downloads the image from the manifest entry that matches the target arch and runs that image with qemu.

In the docker buildx --platform linux/amd64,linux/arm64 ... case, you are using qemu to boot and build the Dockerfile AS that target platform, so --platform=$*PLATFORM shouldn't need to be added at all: buildx has already booted the correct image with qemu. As I understand it, the --platform given to buildx is effectively injected, exactly as if you had put --platform=$TARGETPLATFORM in the Dockerfile.

You need to add a target when your Dockerfile has FROM --platform=$BUILDPLATFORM image:tag (your host arch) and, inside that image, you build for architectures other than the host with a cross compiler (that is, when it's a different image than the one buildx would otherwise have booted).

By definition, cross compiling means building a binary for a different architecture than the OS that is currently running. But two systems are involved: the Docker image and the host OS. With one --platform argument (the $TARGETPLATFORM case above), inside the container you are just compiling natively, not cross compiling, while outside you are building the Dockerfile for another architecture. With the other --platform argument (the $BUILDPLATFORM case), you build the Dockerfile natively and cross compile inside it.
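In the $BUILDPLATFORM case the Rust target has to be chosen explicitly from the platform being targeted, which buildx exposes as the TARGETPLATFORM build argument (for example `linux/arm/v7`). The mapping below is a hypothetical sketch: it covers only a few common platforms and assumes glibc-based images; musl-based images such as Alpine would need the `*-musl*` targets instead.

```shell
#!/bin/sh
# Hypothetical mapping from a Docker platform string (e.g. the value of the
# TARGETPLATFORM build arg) to a Rust target triple, assuming a glibc userland.
docker_platform_to_rust_target() {
    case "$1" in
        linux/amd64)                  echo x86_64-unknown-linux-gnu ;;
        linux/386)                    echo i686-unknown-linux-gnu ;;
        linux/arm64 | linux/arm64/v8) echo aarch64-unknown-linux-gnu ;;
        linux/arm/v7)                 echo armv7-unknown-linux-gnueabihf ;;
        # Assumes a hard-float ARMv6 userland; an armel userland would
        # need arm-unknown-linux-gnueabi instead.
        linux/arm/v6)                 echo arm-unknown-linux-gnueabihf ;;
        linux/arm/v5)                 echo armv5te-unknown-linux-gnueabi ;;
        *) echo "unsupported platform: $1" >&2; return 1 ;;
    esac
}
```

In a stage that starts FROM --platform=$BUILDPLATFORM, one would declare `ARG TARGETPLATFORM` and then run something like `rustup target add "$(docker_platform_to_rust_target "$TARGETPLATFORM")"` before cross compiling.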

The inside-vs-outside distinction for cross builds is confusing with buildx right now.

This has been some help understanding the confusion, but not enough: https://github.com/BretFisher/multi-platform-docker-build

I'm currently having this exact issue again, without --platform in my dockerfile, using a base image of python3.10-slim (debian bullseye base)

I'm going to try my previous hack again later, perhaps slightly modified. Super annoying.

miigotu commented 2 years ago

So, I have made some progress on this. Buildkit runs in a security context that prevents the build from accessing /proc and other mounts.

I am testing right now by providing these changes to my workflow:

You have to enable an entitlement in buildkitd and then request it in the Dockerfile.

Pass --allow-insecure-entitlement security.insecure to buildkitd in one of three ways:

docker buildx create --buildkitd-flags '--allow-insecure-entitlement security.insecure'

when creating the builder, with: --allow-insecure-entitlement security.insecure

or in the buildkitd.toml config file:

insecure-entitlements = [ "security.insecure" ]

Then in your Dockerfile: RUN --security=insecure curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
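Put together, the builder setup and the build invocation might look like the commands below. This is a sketch: the builder name and image tag are placeholders. Besides enabling the entitlement on buildkitd, the build itself has to request it with `--allow security.insecure`, and since RUN --security has been available only in the labs Dockerfile syntax, the Dockerfile may also need `# syntax=docker/dockerfile:1-labs` as its first line.

```shell
#!/bin/sh
set -eu
# Create (and select) a builder whose buildkitd accepts the insecure entitlement.
# "insecure-builder" is a placeholder name.
docker buildx create --name insecure-builder --use \
    --buildkitd-flags '--allow-insecure-entitlement security.insecure'

# Grant the entitlement to this build; each Dockerfile step still has to
# opt in with RUN --security=insecure.
docker buildx build --allow security.insecure \
    --platform linux/amd64,linux/arm64 \
    -t example/app:latest .
```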

Test in buildkit shows exactly this: https://github.com/moby/buildkit/pull/1081/files#diff-d7f92add99ec729fffc073a432807fecbabd9fe2bb0dc35608b1eeef1fba69dbR29

Now that we know we can't read /proc/self/exe out of the box when using DOCKER_BUILDKIT=1 docker ..., the question is whether this should just be documented, or whether we should build in a fallback (or a check for running under BuildKit) that uses a different method. Otherwise, people who don't know what's happening will have to make a few configuration changes.

miigotu commented 2 years ago

I found the solution! Demo and explanation incoming. I think it just needs some documentation; there is nothing broken in rustup.

I spoke too soon. I successfully got security.insecure to work on GitHub Actions, and the script downloads the correct binaries for host/target, but the error still happens when trying to read /proc/self/exe. IIRC from when I opened this issue, it is a specially protected file in BuildKit/Docker, to prevent exploiting vulnerabilities to escape the container.

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sed 's#/proc/self/exe#$SHELL#g' | sh -s -- -y still works around this problem. The armv6 vs armv7 misdetection is a bigger problem if someone needs armv6 support, but since most official images no longer support armv6, I guess that's not such a big deal.

I'll continue to look at it.

Here is my POC: https://github.com/miigotu/actions-security-insecure-demo (workflow runs: https://github.com/miigotu/actions-security-insecure-demo/actions)

miigotu commented 2 years ago

The POC seems to be able to read /proc/self/exe, but rustup still can't... Same command lol

kerberjg commented 1 year ago

@miigotu I actually managed to get it running! I tried your command from the previous comment, but it seems that the SHELL env var was undeclared, so I just replaced it with "/bin/sh" and voilà!

RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sed 's#/proc/self/exe#\/bin\/sh#g' | sh -s -- -y
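Because $SHELL may be unset in a build container, the path substituted into the script could be picked at run time instead of hard-coded. The helper below is a hypothetical sketch (`pick_exe_probe` is not part of rustup): it prints the first of /proc/self/exe, $SHELL and /bin/sh that can actually be read, so the patch should be a no-op on systems where /proc works.

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate binary that can actually be
# read. /proc/self/exe here refers to the od process performing the read.
pick_exe_probe() {
    for _f in /proc/self/exe "${SHELL:-}" /bin/sh; do
        [ -n "$_f" ] || continue
        if od -An -N 1 "$_f" >/dev/null 2>&1; then
            printf '%s\n' "$_f"
            return 0
        fi
    done
    return 1
}
```

It could then be used as `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sed "s#/proc/self/exe#$(pick_exe_probe)#g" | sh -s -- -y`.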

For transparency: I was running the build through compose, with the following adjustments:

bmarwell commented 10 months ago

It's been a year. Running the current rustup with FROM ubuntu:16.04 results in:

error: could not read metadata for file: '/tmp/tmp.JlK1p9xHCy/rustup-init': Function not implemented (os error 38)
cschwan commented 6 months ago

@bmarwell do you still see the problem? I've also encountered it with the manylinux2014_x86_64 container (Containerfile). I traced it with strace:

[pid  9489] statx(24, "usr/local/cargo/bin", AT_STATX_DONT_SYNC|AT_SYMLINK_NOFOLLOW, STATX_TYPE|STATX_MODE, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|0755, stx_size=4096, ...}) = 0
[pid  9784] statx(AT_FDCWD, "/usr/local/cargo/bin/rustup", AT_STATX_SYNC_AS_STAT, STATX_ALL,  <unfinished ...>
[pid  9489] statx(24, "usr/local/cargo", AT_STATX_DONT_SYNC|AT_SYMLINK_NOFOLLOW, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|0755, stx_size=4096, ...}) = 0
[pid  9784] <... statx resumed>0x7ffcfcd8b040) = -1 ENOENT (No such file or directory)
[pid  9784] statx(AT_FDCWD, "/tmp/tmp.3p2dR3GpcZ/rustup-init", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_ALL, 0x7ffcfcd8ad50) = -1 ENOSYS (Function not implemented)
error: could not read metadata for file: '/tmp/tmp.3p2dR3GpcZ/rustup-init': Function not implemented (os error 38)

This problem seems unrelated to the original issue, but I'm writing here because this is the only place I found a report of the same behaviour. Could it be caused by a missing kernel configuration option? I found a patch for a kernel bug: https://lore.kernel.org/lkml/20240414003434.2659-1-danny@orbstack.dev/, but the scenario it describes doesn't quite match: in the output above, the failing statx syscall isn't the first statx call.

bmarwell commented 6 months ago

Uh, I gave up on compiling on 16.04, sorry.

cschwan commented 6 months ago

In my case it turned out that a podman system reset (deletes all containers, images, etc.) did the trick; possibly the container was bugged and a new version fixed the problem.