Generate multi-architecture docker containers

mithro commented 3 years ago

It would be good to generate docker containers which support multiple architectures. As things seem to already be using buildx / buildkit style builds, it might be as simple as following the instructions under "With QEMU".

mithro commented 3 years ago

@carlosdp

carlosedp commented 3 years ago

With the current hardware availability, I think at first we could have ARM64 and PPC64le images in addition to AMD64. All with manifests so they are seamlessly pulled.

In the future we can add RISC-V too but this depends on having the upstream distro images available (for example debian already has riscv64 image but only on sid tag).

mithro commented 3 years ago

umarcor commented 3 years ago

Short answer: yes, it is in the scope but we don't have human resources to implement it. As explained below, I have a good background about what needs to be done; however, implementing and testing everything would take more time than I can devote. Other people need to take care of subsets of the collections if we are to keep increasing the list of tools, collections and architectures. I'm off the limit of a single-man project, which is not my income.

Longer answer:

It would be good to generate docker containers which support multiple architectures.

This is in fact not achieved by having an image/container which supports multiple architectures. It is done by creating one different image for each target architecture; then, creating a "manifest" which behaves as an index of "a group of images which are supposed to be the same". AFAIAA nothing prevents from having "distinct" images presented to the user under the same manifest.

That is relevant because the idea of having a single dockerfile which can be built on multiple platforms is a utopy. As it happens with any "multi-platform packaging" approach, it does work most of the time, but there are some corner cases that do require specific steps/stages/commands on some platforms only. This is just a trivial example: https://github.com/dbhi/containers/blob/main/ubuntu-bionic/cosim.dockerfile#L76-L97.

The examples I've seen which use buildx/buildkit are of the "simple" type, where a single Golang/Python/JavaScript service/tool can be containerised easily. I did not yet see how are the challenging cases expected to be managed. As it happens with containers in general, attention and hypes are very focused on web (micro)services. Therefore, we need to pick announcements and features with a grain of salt when targeting EDA and on-desktop usage. I am anyway willing to learn about it, because manually keeping track of the sets of images that compose each manifest is rather painful and error prone.

Since images are independent in fact, there is no need to build all of them on the same host. Should there be users/developers/companies with PPC, S390X or RISCV hosts willing to contribute in this project/repo, we might setup those as self-hosted runners, so that we can build those images natively. Alternatively, they might fork this repo, maintain a variant for those other platforms and build/push images on their own. We can mirror them and create the manifests here regardless.

it might be as simple as following the instructions under "With QEMU".

As a disclaimer, I am the maintainer of dbhi/qus, which is unwantedly the most starred repository of all the ones I created/authored. I wish those capabilities were provided by some third-party so I could reduce my cognitive load. I did try communicating with docker maintainers/developers, with little success. See (https://dbhi.github.io/qus/#linuxkitbinfmt-dockerbinfmt):

Although not well documented, since version 1.13.0 (2017-01-19), QEMU is installed by default with Docker Desktop and a tool written in golang (named binfmt) is used to register interpreters in the kernel. The upstream project is linuxkit/linuxkit.

For further details about similarities/differences, see:

The project from docker (docker/binfmt) was deprecated in favour of github.com/linuxkit/linuxkit/tree/master/pkg/binfmt.

docker/binfmt#17

linuxkit/linuxkit#3401

moby/qemu

At DockerCon San Francisco 2019, a partnership with Arm was announced: Building Multi-Arch Images for Arm and x86 with Docker Desktop.

Note the feature differences explained in docker/binfmt#17:

In dbhi/qus:

Images for multiple host architectures are provided.

Additional target architectures are provided.

It is possible to optionally limit/select the list of QEMU binaries to be registered on the host.

It is possible to remove/reset all the registered interpreters.

In docker/binfmt:

The logic is embedded in a static binary written in golang.

NOTE: dbhi/qus is based on an enhanced qemu-binfmt-conf.sh and a companion register.sh script. As a result, a shell is required and images are based on busybox instead of scratch.

Docker's interest in QEMU seems to be focused on allowing Windows and Mac OS users to execute ARM containers transparently. It's coming from a partnership between the companies involved and I believe it is strongly driven by smartphones and IoT. I feel there is little to no interest in IBM's platforms or RISCV. Maybe IBM and/or the OpenPower/RISCV lobbies should step in (if they did not already).

That is very relevant because qemu-user is the little/ugly brother/sister of qemu-system and KVM. The most profitable usage of QEMU in big companies is virtual machine creation without involving foreign platforms. Probably the most obvious example is katacontainers.io, with architecture committee members from Apple, Intel, Red Hat, etc. (see https://github.com/kata-containers/community#architecture-committee). As a result, despite qemu-user (the one based on Dynamic Binary Modification) being the initial implementation of QEMU, it does not receive as much love as the KVM variant. In full honestly, it was improved a lot during these last years. When I first used it "seriously" in 2018 there were some decade old bugs on ARM targets, and that is no longer the case.

On the other hand, I also spent long time trying to have some minor enhancements (https://github.com/umarcor/qemu/commits/series-qemu-binfmt-conf) upstreamed to QEMU, with no success.

I believe the original research/tests for using Docker and QEMU came from Balena and multiarch/qemu-user-static. That repo does have a wider scope than Docker/linuxkit, as it supports many foreign platforms as targets, but it's for x64 hosts only. As said above, I wish linuxkit's/docker's binfmt, the multiarch repo, or any other project could replace dbhi/qus. As you might guess, I built some frustration around this topic during the last years. I'm waiting for the world to catch up, since that's not a fight I can take longer.

Don't take me wrong: I am an strong advocate of using Dynamic Binary Modification (DBM) (say qemu-user) as an alternative to cross-compilation. So, in my research work, I use arm32v7 containers on my x64 workstation for testing and building the go/python tools which I then deploy to Zynq devices. By the same token, I use arm64v8 containers to work with the apps that are to be tested on SBCs or ARM servers. That's why dbhi/qus and dbhi/containers exist.

DBM is so useful and an exciting area of computer science, as proven by Apple's Rosetta. That is in fact one of the interests of the Advanced Processor Technologies research group at the University of Manchester. They have a very strong background on DBM and FPGAs, which is the motivation for one of the outcomes of my PhD to be named Dynamic Binary Hardware Injection (DBHI). See the publication references in https://github.com/umarcor/umarcor/blob/main/references/apt.bib.

Incidentally:

I met @mithro at FPGA 2020, where I attended to present https://dbhi.github.io/pdf/FPGA2020_poster.pdf.
There is a reference to EFCAD at the bottom of my poster, which is part of FPGA-Research-Manchester. The group of Dirk Koch is the room next to the group of Mikel Luján (my advisor there).
DBHI does have a conceptual and functional overlap with the CFU Playground which Tim Callahan (@tcal-x) is working on.

To put it simply, we (shall I say they at Manchester) support custom opcodes on an existing CPU by hooking them in the DBM "engine". Most of the practical implementations are done with MAMBO (/ccing @GuillermoCallaghan here, who has good knowledge about ARM ISAs and interest in RISCV). They implement "accelerators" on FPGA and use MAMBO for hooking opcodes, either existing or custom. I quoted accelerators because the kind of HDL accelerators they use are mostly for architectural simulation (performance counters).

Nevertheless, during my stay at the University of Manchester in 2018, I did evaluate DynamoRIO and Pin as well. One of the outcomes of DBHI is that a MAMBO plugin or a DynamoRIO tool (different terms, same concept) would allow to use CFUs on CPUs without native support. We tested half a dozen devices (workstation, server, laptop, SBC...). My main research interest is abstracting the layer(s) between software and hardware during the development, implementation and verification of custom accelerators. I use VHDL, but at Manchester they use Verilog and BlueSpec SV as well, so there is synergy with the Verilator and Renode stuff that Antmicro is integrating into the CFU playground. I hope I can meet @tcal-x face-to-face in the not distant future and we can have some drink while sharing ideas.

Back from the detour, it does not really matter how we install/setup QEMU. We are creating Linux containers only. GitHub Actions supports containers on Ubuntu jobs only. Therefore, apt install qemu-user-static works. The interpreters are statically built, so it is irrelevant where you get them from. The difference between linuxkit/docker binfmt, multiarch/qemu-user-static, dbhi/qus, Actions, etc. is just how to load one interpreter (or many) in "permanent" mode (in-memory). The challenge is using qemu-user itself:

Performance is 5 to 10 times slower. MAMBO and DynamoRIO do have a much smaller penalisation (5-20%), because they are DBM tools which do not translate (they handle branching/jumps). QEMU user needs to translate instructions. There is litttle we can do about it except support the development of QEMU. We would need to setup some machine in the Google Cloud for this project, to avoid timeouts in GitHub Actions. We might almost guarantee that such machine would be active 100% of the time.
qemu-user is not isolating/emulating the system (unlike qemu-system). Therefore, some signals and lower level features do crash.

As a result, we can try it for curiosity and to find the limits in the technology/tools, but I don't think that's something we can rely on for providing the whole collections. As commented above, we want people with powerful and/or spare ARM, PPC, S390X or RISCV hardware to contribute.

Using copr might be an option for having ARM, PPC and maybe RISCV machines. However, I wouldn't feel comfortable "abusing" Fedora's infrastructure as much as I am with GitHub (say Microsoft) because scales are very different and we know that Google is willing to provide processing power, should it be needed (i.e. should GitHub/Microsoft complain). If we were to go the copr way, I believe there should be some better communication with Fedora as a project or, at least, with copr administrators.

Nonetheless, I will try prototyping the structure and CI plumbing for handling multiple architectures and manifests here by using QEMU. As commented in twitter, it should be relatively easy for the non-challenging cases. That will let us see the performance and complexity.

umarcor commented 3 years ago

I added support for multiple architectures. The new image name format is REGISTRY/[ARCHITECTURE/][COLLECTION/]IMAGE. Both ARCHITECTURE and COLLECTION are optional. Note that we are not publishing manifests yet. Therefore, users on non-amd64 hosts need to use the image name including the architecture explicitly.

Tools openFPGALoader, gtkwave, verilator, magic, netgen, icestorm and arachne-pnr in collection Debian Bullseye are available for the following architectures already: arm64v8, ppc64le and s390x. See execution times:

It would be helpful if users of those architectures could test these new images and confirm that they work.

https://hub.docker.com/r/riscv64/debian exists, but buster-slim and bullseye-slim are not available; only sid-slim is available for now. Hence, users of riscv64 need to use QEMU at runtime, or wait until bullseye is supported.

With regard to CentOS, neither https://hub.docker.com/r/s390x/centos nor https://hub.docker.com/r/riscv64/centos do exist. I tried enabling arm64v8 and ppc64le, but it fails: https://github.com/hdl/containers/actions/runs/1148193717

Ref: https://www.docker.com/blog/multi-arch-build-and-images-the-simple-way

umarcor commented 3 years ago

@carlosedp do you know whether/when might riscv64/centos and riscv64/debian:bullseye-slim be supported? Similarly, do you expect riscv32 to be available in https://github.com/docker-library/official-images#architectures-other-than-amd64 in the "near" future?

hdl / containers

Generate multi-architecture docker containers #40