swiftlang / swift

The Swift Programming Language
https://swift.org
Apache License 2.0

[SR-13795] Make -static-stdlib binaries portable to Docker Scratch multistage images #56192

Open swift-ci opened 4 years ago

swift-ci commented 4 years ago
Previous ID SR-13795
Radar rdar://problem/70892595
Original Reporter marvin.hansen (JIRA User)
Type New Feature


Additional Detail from JIRA

| | |
|------------------|-----------------|
|Votes | 0 |
|Component/s | |
|Labels | New Feature |
|Assignee | None |
|Priority | Medium |

md5: 0d0cb7f368f23b06e9c28a7ac144f91f

Issue Description:

Static linking of a simple hello-world Swift file seems to work, judging by the ldd output and the resulting file size, but the binary still needs some Linux libraries or a runtime in order to run in a Docker container. At this point, it is not clear to me what exactly is missing or whether a different compiler flag would produce a portable binary. Also, I understand that static linking on Linux is relatively new in Swift.

Version:

Related issues:

Background:

For Go and C/C++, one can use Docker multistage builds: compile and build in one stage, copy the resulting statically linked binary into a Scratch container in the second stage, drop the first container, and then execute the binary in a minimal container that contains nothing but the fat binary. Details here:

https://medium.com/@adriaandejonge/simplify-the-smallest-possible-docker-image-62c0e0d342ef

Setup:

When trying docker multistage build with Swift, the first stage works as expected, the statically linked binary gets created and then copied to the second, minimal container.

Compiler command:

swiftc -emit-executable -static-stdlib -O -o helloworld helloworld.swift

Expected behavior:

The resulting statically linked binary (aka fat binary) should run in any environment that can execute a fat binary i.e. Scratch, Alpine, Debian Buster Docker images etc.
For Golang, Scratch images are usually preferred as the actual container size is roughly equal to the binary size.

Actual behavior

The static binary executes in a Swift-Slim container and in a Debian Buster-Slim container. The latter was almost expected, as the Swift container is based on Ubuntu, which is derived from Debian. However, the static binary does not execute in a Scratch or Alpine container.

Instead, a Scratch or Alpine container throws the following error

> standard_init_linux.go:211: exec user process caused "no such file or directory"

I'm not exactly sure what causes the issue, but I was simply unable to run the static binary in Scratch or Alpine images.

Reproducing

The attached Swift code and Dockerfile showcase the issue:

Steps to reproduce:

Just comment & uncomment the different images in the second build stage to trigger the error with Scratch & Alpine.

Background:

Small containers, especially scratch containers with fat binaries, are strongly preferred in Kubernetes deployments. As Swift gradually moves into the server space, deploying to Kubernetes will certainly become more common and, while the Buster-Slim images are already reasonably small (~100 MB), they still don't compare to common Go scratch images, which usually hover around 5 to 15 MB.

swift-ci commented 4 years ago

Comment by Marvin Hansen (JIRA)

Just as an update, when running the binary in an old docker container (i.e. debian jessie), the following error occurs:

> ./helloworld: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.27' not found (required by ./helloworld)
> ./helloworld: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by ./helloworld)
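
The errors above name the glibc symbol versions the binary was linked against. As a rough sketch (the function name `glibc_versions` is hypothetical, not part of any tool), the following shell helper pulls the distinct `GLIBC_x.y` version tags out of any text fed to it, whether that is a loader error like the one above or the output of `objdump -T`:

```shell
#!/bin/sh
# Hypothetical helper: print each distinct GLIBC_<version> tag found on stdin.
# GLIBCXX_ tags (libstdc++) are intentionally not matched by this pattern.
glibc_versions() {
    grep -o 'GLIBC_[0-9][0-9.]*' | sort -u
}

# Usage (assumes binutils is installed and ./helloworld exists):
#   objdump -T ./helloworld | glibc_versions
```

Any version listed that is newer than the glibc in the target image would be expected to trigger a "version not found" error of this kind.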

typesanitizer commented 4 years ago

@swift-ci create

spevans commented 3 years ago

The -static-stdlib flag doesn't actually produce a statically linked executable, where all of the dependent libraries are included in the binary, but a dynamic executable with the Swift libraries supplied with the toolchain statically linked in.

Using the ldd command shows the libraries that are still dynamically linked and required on disk:

 docker run  -it --rm  test:latest ldd ./helloworld
    linux-vdso.so.1 (0x00007fffb09e0000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f52bf13e000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f52bef1f000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f52beb96000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f52be7f8000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f52be5e0000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f52be1ef000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f52c1361000)

I would guess that the scratch base image doesn't actually have some of these installed on the filesystem.
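
To make that guess checkable, here is a minimal sketch of a shell helper (the name `ldd_paths` is hypothetical) that reduces `ldd` output to the list of files that must exist on disk in the target image. It only parses text, so it makes no claim about which of those files a given base image actually ships:

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print the on-disk path
# of every shared object it names. linux-vdso is skipped because the kernel
# maps it into the process and it has no file on disk.
ldd_paths() {
    awk '
        # lines of the form: libc.so.6 => /lib/.../libc.so.6 (0x...)
        /=>/       { if ($3 ~ /^\//) print $3; next }
        # the dynamic loader line: /lib64/ld-linux-x86-64.so.2 (0x...)
        $1 ~ /^\// { print $1 }
    '
}

# Usage (run wherever the binary was built):
#   ldd ./helloworld | ldd_paths
```

For a fully static binary, `ldd` prints "not a dynamic executable" and the helper prints nothing. Note that the last path in the dynamic case is the ELF interpreter; when that file is missing, the kernel reports "no such file or directory" for the executable itself, which matches the error seen on Scratch.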

Note that it is possible to create a fully statically linked executable, but support is very limited, e.g. Foundation won't work with it at present. Use the -static-executable flag, e.g.:

RUN swiftc -emit-executable -static-executable -O -o helloworld helloworld.swift 

And ldd shows:

$ docker run  -it --rm  test:latest ldd ./helloworld
    not a dynamic executable 

This executable did then run using the scratch base image:

$ docker run  -it --rm  test:latest ./helloworld
Hello, SWIFT BINARY in Docker! 

Note that the large file size of these binaries is mostly due to the ICU library, used for Unicode normalisation and localisation, being linked in, as it is built and shipped with the Swift toolchain.
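
Putting the pieces together, a two-stage build for the fully static variant might look like the sketch below. It is written as a shell function that generates the Dockerfile so it can be pasted in one step. The `swift:5.3` builder tag, the `/build` directory and the function name `write_dockerfile` are assumptions made for illustration; only the `swiftc` command line and the `helloworld` names come from this thread.

```shell
#!/bin/sh
# Hypothetical helper: write a two-stage Dockerfile to the given path
# (default: ./Dockerfile). Stage 1 compiles with -static-executable,
# stage 2 copies only the resulting binary into an empty scratch image.
# ASSUMPTIONS: builder tag swift:5.3 and the /build directory are illustrative.
write_dockerfile() {
    cat > "${1:-Dockerfile}" <<'EOF'
FROM swift:5.3 AS builder
WORKDIR /build
COPY helloworld.swift .
RUN swiftc -emit-executable -static-executable -O -o helloworld helloworld.swift

FROM scratch
COPY --from=builder /build/helloworld /helloworld
ENTRYPOINT ["/helloworld"]
EOF
}

# Usage:
#   write_dockerfile && docker build -t test:latest . && docker run --rm test:latest
```

The exec-form `ENTRYPOINT` matters here because a scratch image contains no shell to interpret a shell-form command.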

swift-ci commented 3 years ago

Comment by Marvin Hansen (JIRA)

Thank you so much for the detailed elaboration @spevans

Yes, that's true, and your suggested compiler flag produces the statically linked binary I was looking for. A bit of background: I am building microservices with gRPC on Kubernetes and, so far, Swift is doing surprisingly well given the relatively early stage of the server-side project. Reducing containers further to scratch images really is the last touch to complete this project.

Out of interest, when I have a larger project that uses the usual structure created by

swift package init --type executable

and that has dependencies defined in the usual Package.swift file, is there an equivalent build flag for static compile?

I tried:

swift build -Xswiftc -emit-executable -static-executable

The -Xswiftc flag is supposed to pass through Swift compiler flags, but for whatever reason that syntax wasn't right, and I have not found the correct way yet. In theory, the docs say one can pass through -static-stdlib, but that isn't exactly working, for all the reasons mentioned before.

swift build --configuration release -Xswiftc -static-stdlib

UPDATE:

Swift build can't generate static executables, as outlined in SR-648.

https://bugs.swift.org/browse/SR-648

I will keep this issue open for the time being. The main findings so far:

1) Static compile works with the Swift compiler:

swiftc -emit-executable -static-executable -O -o helloworld helloworld.swift 

2) Static compilation with swift build is limited at best, and only by passing some compiler flags:

swift build --configuration release -Xswiftc -static-executable

3) No source code that uses the Foundation library can be statically compiled. That's one area where improvement would make quite a meaningful difference.

spevans commented 3 years ago

The latest swift.org builds will compile -static-executable for Foundation but not FoundationNetworking, because the latter uses libcurl and there aren't .a libraries for all of curl's dependencies provided by Ubuntu etc.

Also note that glibc will complain about being statically linked into a binary, as it uses dynamic loading for some functionality, notably NSS (the Name Service Switch) for looking up users. So with a recent nightly you can do:

$ cat hello.swift 
import Foundation

print(URL(string: "http://swift.org") as Any)

$ ~/swift-DEVELOPMENT-SNAPSHOT-2020-11-30-a-ubuntu20.04/usr/bin/swiftc -O -static-executable -o test hello.swift 
/home/spse/swift-DEVELOPMENT-SNAPSHOT-2020-11-30-a-ubuntu20.04/usr/lib/swift_static/linux/libFoundation.a(FileManager.swift.o):FileManager.swift.o:function $s10Foundation11FileManagerC17_attributesOfItem6atPath26includingPrivateAttributesSDyAA0B12AttributeKeyVypGSS_SbtKF: warning: Using 'getpwuid' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/home/spse/swift-DEVELOPMENT-SNAPSHOT-2020-11-30-a-ubuntu20.04/usr/lib/swift_static/linux/libFoundation.a(FileManager.swift.o):FileManager.swift.o:function $s10Foundation11FileManagerC17_attributesOfItem6atPath26includingPrivateAttributesSDyAA0B12AttributeKeyVypGSS_SbtKF: warning: Using 'getgrgid' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/home/spse/swift-DEVELOPMENT-SNAPSHOT-2020-11-30-a-ubuntu20.04/usr/lib/swift_static/linux/libFoundation.a(NSPathUtilities.swift.o):NSPathUtilities.swift.o:function $s10Foundation22_NSCreateTemporaryFileys5Int32V_SStSSKF: warning: the use of `mktemp' is dangerous, better use `mkstemp' or `mkdtemp'
/home/spse/swift-DEVELOPMENT-SNAPSHOT-2020-11-30-a-ubuntu20.04/usr/lib/swift_static/linux/libFoundation.a(Host.swift.o):Host.swift.o:function $s10Foundation4HostC8_resolveyyF: warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/home/spse/swift-DEVELOPMENT-SNAPSHOT-2020-11-30-a-ubuntu20.04/usr/lib/swift_static/linux/libCoreFoundation.a(CFPlatform.c.o):CFPlatform.c:function CFCopyUserName: warning: Using 'getpwuid' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/home/spse/swift-DEVELOPMENT-SNAPSHOT-2020-11-30-a-ubuntu20.04/usr/lib/swift_static/linux/libCoreFoundation.a(CFPlatform.c.o):CFPlatform.c:function CFCopyUserName: warning: Using 'getpwuid' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/home/spse/swift-DEVELOPMENT-SNAPSHOT-2020-11-30-a-ubuntu20.04/usr/lib/swift_static/linux/libCoreFoundation.a(CFPlatform.c.o):CFPlatform.c:function CFCopyFullUserName: warning: Using 'getpwuid' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/home/spse/swift-DEVELOPMENT-SNAPSHOT-2020-11-30-a-ubuntu20.04/usr/lib/swift_static/linux/libCoreFoundation.a(CFPlatform.c.o):CFPlatform.c:function CFCopyFullUserName: warning: Using 'getpwuid' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/home/spse/swift-DEVELOPMENT-SNAPSHOT-2020-11-30-a-ubuntu20.04/usr/lib/swift_static/linux/libCoreFoundation.a(CFPlatform.c.o):CFPlatform.c:function _CFCopyHomeDirURLForUser: warning: Using 'getpwnam' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/home/spse/swift-DEVELOPMENT-SNAPSHOT-2020-11-30-a-ubuntu20.04/usr/lib/swift_static/linux/libCoreFoundation.a(CFPlatform.c.o):CFPlatform.c:function _CFCopyHomeDirURLForUser: warning: Using 'getpwuid' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/home/spse/swift-DEVELOPMENT-SNAPSHOT-2020-11-30-a-ubuntu20.04/usr/lib/swift_static/linux/libCoreFoundation.a(CFBundle_Binary.c.o):CFBundle_Binary.c:function _CFBundleDlfcnCheckLoaded: warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/home/spse/swift-DEVELOPMENT-SNAPSHOT-2020-11-30-a-ubuntu20.04/usr/lib/swift_static/linux/libCoreFoundation.a(CFBundle_Binary.c.o):CFBundle_Binary.c:function _CFBundleDlfcnLoadBundle: warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/home/spse/swift-DEVELOPMENT-SNAPSHOT-2020-11-30-a-ubuntu20.04/usr/lib/swift_static/linux/libCoreFoundation.a(CFBundle_Binary.c.o):CFBundle_Binary.c:function _CFBundleDlfcnLoadFramework: warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/home/spse/swift-DEVELOPMENT-SNAPSHOT-2020-11-30-a-ubuntu20.04/usr/lib/swift_static/linux/libCoreFoundation.a(CFBundle_Main.c.o):CFBundle_Main.c:function CFBundleGetMainBundle: warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking

$ ./test 
Optional(http://swift.org)

ubuntu2004:~$ ldd test 
        not a dynamic executable
ubuntu2004:~$ ls -l test 
-rwxrwxr-x 1 si si 51242512 Dec  1 11:05 test

But this may have issues running on a scratch build if glibc needs to access certain files that it expects to be on disk.

It's probably a bit early to enable this functionality in SwiftPM given that some parts don't work. I believe Go doesn't use a libc at all, so glibc isn't a problem there; for Swift it may well require getting it to work with musl or some other libc that doesn't have a problem with being statically linked in.

swift-ci commented 3 years ago

Comment by Marvin Hansen (JIRA)

Thank you for the update, @spevans

In my case, I have already accepted that static linking won't be there anytime soon, because I rely on swift-gRPC, which depends on NIO and FoundationNetworking. That said, even with regular dynamic linkage, my swift-slim Docker containers are reasonably small (~200MB) and performance is overall quite good thanks to gRPC & protobuf.

Speaking of static linking in Go, one lesser-known detail is that the Go team actually re-implemented the C standard library functionality in Go, leaving only OS syscalls as external dependencies. However, there is a catch-22 related to networking (DNS) and user lookup, which still rely on some C code by default, most likely for historical reasons. When building static binaries for Docker Scratch images, it is highly recommended to disable the C binding altogether during the build using the CGO_ENABLED environment variable:

CGO_ENABLED=0 go build

Back to the catch-22: when you do that, Go has to recompile all dependencies that have not previously been compiled with the C binding disabled, so the build process may take quite a bit longer, depending on the number of dependencies. That's virtually every external library, because cgo is actually enabled by default. However, when switching the legacy C bindings off, the Go compiler then uses alternative calls written in Go that mimic the network / DNS calls and user lookup. As a byproduct, this leads to substantially lower memory consumption over time, as discussed in the SO post(s) below. As a result, statically linked binaries are fairly trivial in Go, with or without networking. I believe Google designed it that way early on because, if you operate popular web services 24/7, operational security is a very big deal, and so hardened static binaries are pretty much everywhere at Google.

For Kubernetes, disabling the legacy C bindings and, in particular, removing glibc as a dependency altogether is a substantial security improvement, and it actually somewhat improves OS portability, as you can fall back to generic Go standard library network calls when there is no OS, as in the case of scratch Docker images. Looking at the long list of CVEs in glibc, this actually was a good decision, if not one of the decision criteria back then.

Beyond security, there is a much more nuanced benefit to re-implementing the C standard library in the programming language itself instead of relying on an external library such as glibc (or any other pre-existing implementation), and that is full and detailed control over hardware-specific binaries. Google, like any other hyperscaler, has been procuring custom-designed ARM CPUs in bulk for quite some time. When compiling Go code for Linux/ARM, Google can simply optimize the resulting binaries for exactly the ARM architecture they are using internally by optimizing the Go compiler and the Go standard library. It's as close as you can get to hardware-software integration without actually designing the CPU itself. Considering the size of the compiler team at Google, I am pretty confident that is exactly what they do when not working on MLIR.

For standard mainstream ARM, CPU optimization is perhaps not the most interesting task, but with modern ARM architectures such as Neoverse N1 (up to 128 cores) or the recently released Apple M1 silicon (unified memory), glibc's initial design principles are all of a sudden increasingly at odds with contemporary hardware.

And then again, compiling the custom standard C library for Windows, Linux, or even the Raspberry Pi becomes somewhat easier, because there are fallback network calls that allow code portability without relying on the underlying OS. Obviously, this would be a substantial relief for those engineers working on porting Swift to other platforms, especially w.r.t. maintenance.

Considering the many other areas Swift currently needs to improve (async, concurrency, etc.), I don't see the prospect of a custom libc in Swift anytime soon, if ever, especially because doing so would require a fair amount of engineering work, which I believe Apple, as the main sponsor, might not want to commit to.

That said, I certainly understand why system engineers prefer Go over everything else. If Swift aims at serious server-side engineering, it faces a steep uphill battle in operations, predominantly because of DevOps, tooling, and security.

https://stackoverflow.com/questions/54456100/why-does-cgo-enable-make-a-such-impact-on-virtual-memory

https://stackoverflow.com/questions/47714278/why-is-compiling-with-cgo-enabled-0-slower

https://lukeeckley.com/post/useful-go-build-flags/

https://www.cvedetails.com/vulnerability-list/vendor_id-72/product_id-767/GNU-Glibc.html

https://github.com/golang/go/wiki/GoArm