Closed astefanutti closed 7 months ago
ARM chip support is an important item for the KubeRay community in the rest of Q4. @tedhtchang is willing to take this issue.
Thanks. I will take a look and see if there are other requirements.
Could someone follow the commands below and try the multi-arch image on an arm64 device to see if it works?
kind create cluster
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator --set image.repository=quay.io/tedchang/ray-operator --set image.tag=v1.1.1.rc.1
kubectl logs deploy/kuberay-operator
@tedhtchang I've tested it on a Jetson Orin (Arm A78 64-bit CPU) and it works.
However, it seems two extra container images with unknown architecture were added to the multi-architecture manifest; they should not be there:
$ docker manifest inspect quay.io/tedchang/multiarch-ray-operator
{
"schemaVersion": 2,
"mediaType": "application/vnd.oci.image.index.v1+json",
"manifests": [
{
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"size": 675,
"digest": "sha256:14b2d97f464abe7fd0767c42084e1ce98d916d9356668454a19f19d02f70e89a",
"platform": {
"architecture": "arm64",
"os": "linux"
}
},
{
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"size": 675,
"digest": "sha256:849742204d70a4c9851c0b5d43698be9e7db75959db0e1e243e8587311f7b09a",
"platform": {
"architecture": "amd64",
"os": "linux"
}
},
{
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"size": 566,
"digest": "sha256:f7a79778a8491e4ced8c35d8fbbd6fee7e5b029de4dfc940b3a47e9a249b586c",
"platform": {
"architecture": "unknown",
"os": "unknown"
}
},
{
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"size": 566,
"digest": "sha256:af9a52838eeed4a46628e495f43f0514d198ff90911c9a37d2dcc50a40fa929e",
"platform": {
"architecture": "unknown",
"os": "unknown"
}
}
]
}
Or:
$ docker buildx imagetools inspect quay.io/tedchang/multiarch-ray-operator
Name: quay.io/tedchang/multiarch-ray-operator:latest
MediaType: application/vnd.oci.image.index.v1+json
Digest: sha256:e5c9c5bedb3dc844327f7a36aab3c960abecb27023a0de5110bf7982da322453
Manifests:
Name: quay.io/tedchang/multiarch-ray-operator:latest@sha256:14b2d97f464abe7fd0767c42084e1ce98d916d9356668454a19f19d02f70e89a
MediaType: application/vnd.oci.image.manifest.v1+json
Platform: linux/arm64
Name: quay.io/tedchang/multiarch-ray-operator:latest@sha256:849742204d70a4c9851c0b5d43698be9e7db75959db0e1e243e8587311f7b09a
MediaType: application/vnd.oci.image.manifest.v1+json
Platform: linux/amd64
Name: quay.io/tedchang/multiarch-ray-operator:latest@sha256:f7a79778a8491e4ced8c35d8fbbd6fee7e5b029de4dfc940b3a47e9a249b586c
MediaType: application/vnd.oci.image.manifest.v1+json
Platform: unknown/unknown
Annotations:
vnd.docker.reference.digest: sha256:14b2d97f464abe7fd0767c42084e1ce98d916d9356668454a19f19d02f70e89a
vnd.docker.reference.type: attestation-manifest
Name: quay.io/tedchang/multiarch-ray-operator:latest@sha256:af9a52838eeed4a46628e495f43f0514d198ff90911c9a37d2dcc50a40fa929e
MediaType: application/vnd.oci.image.manifest.v1+json
Platform: unknown/unknown
Annotations:
vnd.docker.reference.digest: sha256:849742204d70a4c9851c0b5d43698be9e7db75959db0e1e243e8587311f7b09a
vnd.docker.reference.type: attestation-manifest
The Quay.io web interface also gets confused.
Adding --provenance=false to the buildx command fixed the problem:
docker buildx build --push --tag quay.io/tedchang/multiarch-ray-operator:latest --platform linux/arm64,linux/amd64 --provenance=false .
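To confirm the attestation entries are gone after rebuilding, you can count the unknown/unknown platform entries in the inspect output. A minimal sketch, using a saved copy of the index shown above as sample input (in practice you would pipe in a live `docker manifest inspect` call instead):

```shell
# Count platform entries marked architecture "unknown" in an image index.
# The here-string below stands in for:
#   docker manifest inspect quay.io/tedchang/multiarch-ray-operator
index='
{
  "manifests": [
    {"platform": {"architecture": "arm64", "os": "linux"}},
    {"platform": {"architecture": "amd64", "os": "linux"}},
    {"platform": {"architecture": "unknown", "os": "unknown"}},
    {"platform": {"architecture": "unknown", "os": "unknown"}}
  ]
}'
unknown=$(printf '%s' "$index" | grep -c '"architecture": "unknown"')
# After rebuilding with --provenance=false this count should be 0.
echo "unknown-architecture entries: $unknown"
```

With the sample index above (taken from the output earlier in this thread) the count is 2, matching the two attestation manifests.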
Do we plan to have a GitHub workflow to build multi-arch images?
Do we plan to have a GitHub workflow to build multi-arch images?
We will build the multi-arch images in the KubeRay CI, and it's not necessary to include everything in one PR. You can start by creating a PR for the Dockerfile alone, and then open subsequent PRs to update the CI pipeline. If you don’t have time to work on the CI pipeline, I can take over that part.
I don't think I've changed the Dockerfile yet. The base images, like the go-toolset, are already multi-arch. Could you point me to the CI pipeline that builds and pushes the operator to docker.io and quay.io?
Hey guys, I experimented with building the multi-arch images using the docker/build-push-action@v5
action in my own GitHub repo and registries. The action does the exact same thing as the docker buildx build --push --tag ...
command, which builds Docker images in QEMU emulators. An example of the job output:
The Build MultiArch images step alone took 12+ minutes, a known problem of building container images in an emulator. Therefore it's too heavy to run with the Go-build-and-test workflow for each PR.
Alternatively, I am trying to build the operator binaries directly in the Ubuntu runner VM, for example. This is fast, but CGO_ENABLED=1 in
CGO_ENABLED=1 GOOS=linux GOARCH=arm64 go build -tags strictfipsruntime -a -o manager-${GOARCH} main.go
gives errors:
gcc_arm64.S: Assembler messages:
gcc_arm64.S:30: Error: no such instruction: `stp x29,x30,[sp,'
gcc_arm64.S:34: Error: too many memory references for `mov'
gcc_arm64.S:36: Error: no such instruction: `stp x19,x20,[sp,'
gcc_arm64.S:39: Error: no such instruction: `stp x21,x22,[sp,'
gcc_arm64.S:42: Error: no such instruction: `stp x23,x24,[sp,'
gcc_arm64.S:45: Error: no such instruction: `stp x25,x26,[sp,'
gcc_arm64.S:48: Error: no such instruction: `stp x27,x28,[sp,'
gcc_arm64.S:52: Error: too many memory references for `mov'
gcc_arm64.S:53: Error: too many memory references for `mov'
gcc_arm64.S:54: Error: too many memory references for `mov'
gcc_arm64.S:56: Error: no such instruction: `blr x20'
gcc_arm64.S:57: Error: no such instruction: `blr x19'
gcc_arm64.S:59: Error: no such instruction: `ldp x27,x28,[sp,'
gcc_arm64.S:62: Error: no such instruction: `ldp x25,x26,[sp,'
gcc_arm64.S:65: Error: no such instruction: `ldp x23,x24,[sp,'
gcc_arm64.S:68: Error: no such instruction: `ldp x21,x22,[sp,'
gcc_arm64.S:71: Error: no such instruction: `ldp x19,x20,[sp,'
gcc_arm64.S:74: Error: no such instruction: `ldp x29,x30,[sp],'
Error: Process completed with exit code 1.
I will look into building on multiple runners https://docs.docker.com/build/ci/github-actions/multi-platform/#distribute-build-across-multiple-runners
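The multi-runner approach could look roughly like the following GitHub Actions sketch: each architecture is built natively on its own runner, then the per-arch images are merged into one index. Repository names, tags, and the Arm runner label here are hypothetical, not the actual KubeRay CI:

```yaml
# Hypothetical sketch: native per-arch builds, then a merged multi-arch index.
name: multiarch-image
on: [push]
jobs:
  build:
    strategy:
      matrix:
        include:
          - { runner: ubuntu-latest, arch: amd64 }
          - { runner: ubuntu-24.04-arm, arch: arm64 }  # Arm-hosted runner label is illustrative
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v5
        with:
          platforms: linux/${{ matrix.arch }}
          push: true
          provenance: false   # avoid the unknown/unknown attestation entries
          tags: quay.io/example/ray-operator:${{ github.sha }}-${{ matrix.arch }}
  merge:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - name: Create multi-arch manifest
        run: |
          docker buildx imagetools create \
            --tag quay.io/example/ray-operator:latest \
            quay.io/example/ray-operator:${{ github.sha }}-amd64 \
            quay.io/example/ray-operator:${{ github.sha }}-arm64
```

Since each build runs on native hardware, no QEMU emulation is involved and the per-PR cost stays close to a single-arch build.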
@tedhtchang thanks, that's great progress!
The Build MultiArch images step alone took 12mins+, a known problem of building container images in an emulator. Therefore it's too heavy to run with the Go-build-and-test workflow for each PR
Right, the performance with QEMU is terrible and I agree this should be avoided for PR checks.
Alternatively, I am trying to build operator binaries directly in the Ubuntu runner VM, for example. This is fast but CGO_ENABLED=1 in
CGO_ENABLED=1 GOOS=linux GOARCH=arm64 go build -tags strictfipsruntime -a -o manager-${GOARCH} main.go
gives errors. gcc_arm64.S: Assembler messages: gcc_arm64.S:30: Error: no such instruction: `stp x29,x30,[sp,' […] Error: Process completed with exit code 1.
This likely comes from the C toolchain used by default for CGO, which is still the host toolchain while it should be the target one.
I will look into building on multiple runners https://docs.docker.com/build/ci/github-actions/multi-platform/#distribute-build-across-multiple-runners
Another option could be to use a cross compiler, from the host to the target architecture, e.g., for arm64:
$ apt-get install gcc-aarch64-linux-gnu libc6-dev-arm64-cross
$ CC=aarch64-linux-gnu-gcc CGO_ENABLED=1 GOOS=linux GOARCH=arm64 go build -tags strictfipsruntime -a -o manager-${GOARCH} main.go
As KubeRay does not use C code / dependencies directly, I would expect it to be enough.
Thanks, @tedhtchang and @astefanutti! Just a follow-up: is there any progress on this issue? Thanks!
I have tried different approaches to optimize the image build time, since this runs with every PR. Cross-compiling the Go binaries on the Ubuntu runner and then COPYing them into the Docker image was the quickest approach. I will create a PR today for review.
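That approach (cross-compile on the host, then copy the matching binary into each per-arch image) can be sketched roughly as below. The base image and file names are illustrative, not the actual KubeRay Dockerfile:

```dockerfile
# Illustrative only: assumes manager-amd64 and manager-arm64 were cross-compiled
# on the runner beforehand, e.g.
#   CC=aarch64-linux-gnu-gcc CGO_ENABLED=1 GOOS=linux GOARCH=arm64 \
#     go build -o manager-arm64 main.go
FROM gcr.io/distroless/static:nonroot
# TARGETARCH is set automatically by buildx for each --platform entry.
ARG TARGETARCH
COPY manager-${TARGETARCH} /manager
ENTRYPOINT ["/manager"]
```

Because compilation happens outside the image build, a `docker buildx build --platform linux/amd64,linux/arm64` against this Dockerfile only runs a COPY under emulation, which keeps the build fast.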
@kevin85421 @tedhtchang I've created ray-project/ray#41727 for Ray to have multi-architecture support end-to-end.
Search before asking
Description
The container images pushed to DockerHub and Quay are only for the linux/amd64 architecture. While it is possible to build container images for other architectures, these are not published, nor are the multi-architecture manifests.
Use case
No response
Related issues
No response
Are you willing to submit a PR?