opendatahub-io / kserve

Standardized Serverless ML Inference Platform on Kubernetes
https://kserve.github.io/website/
Apache License 2.0

Multi-arch images are not being built and pushed #389

Closed npanpaliya closed 1 month ago

npanpaliya commented 1 month ago

Bug: Multi-arch images for various kserve components are not being built and pushed to quay.io. Images exist up to version v0.12.1.1, but not for any later tag.

What steps did you take and what happened: On quay.io, no images for ppc64le or s390x are seen for newer tags of kserve.

What did you expect to happen: On quay.io, all the tags of kserve-controller, kserve-qpext, kserve-router, kserve-agent should have multi-arch images.


Anything else you would like to add: I checked and found that the build and push jobs started failing after https://github.com/opendatahub-io/kserve/pull/365/files was merged.

mkumatag commented 1 month ago

I checked and found that the build and push jobs started failing after https://github.com/opendatahub-io/kserve/pull/365/files was merged.

Can you please point out which jobs we are talking about, where they are located, and what error they are failing with?

npanpaliya commented 1 month ago

Almost all the jobs that build and push Docker images are failing. One example of a failing job is https://github.com/opendatahub-io/kserve/actions/runs/9308295238/job/25621451106 and the error is shown below:

 > [linux/arm/v7 internal] load metadata for registry.access.redhat.com/ubi8/go-toolset:1.21:
------
agent.Dockerfile:2
--------------------
   1 |     # Build the inference-agent binary
   2 | >>> FROM registry.access.redhat.com/ubi8/go-toolset:1.21 as builder
   3 |     
   4 |     # Copy in the go src
--------------------
ERROR: failed to solve: registry.access.redhat.com/ubi8/go-toolset:1.21: failed to resolve source metadata for registry.access.redhat.com/ubi8/go-toolset:1.21: no match for platform in manifest: not found
Error: buildx failed with: ERROR: failed to solve: registry.access.redhat.com/ubi8/go-toolset:1.21: failed to resolve source metadata for registry.access.redhat.com/ubi8/go-toolset:1.21: no match for platform in manifest: not found

mkumatag commented 1 month ago

I guess this is because the fat manifest of the base image does not contain all the architectures requested in the buildx command:

/usr/bin/docker buildx build --file agent.Dockerfile --iidfile /tmp/docker-build-push-c0SyC5/iidfile --platform linux/amd64,linux/arm/v7,linux/arm64/v8,linux/ppc64le,linux/s390x --tag quay.io/opendatahub/kserve-agent:v0.12.1-latest --metadata-file /tmp/docker-build-push-c0SyC5/metadata-file --push .

and the image manifest:

% manifest-tool inspect registry.access.redhat.com/ubi8/go-toolset:1.21
Name:   registry.access.redhat.com/ubi8/go-toolset:1.21 (Type: application/vnd.docker.distribution.manifest.list.v2+json)
Digest: sha256:00a64493787c0839c221d320dbed90593cba5da93d1b8d16ca132de33b7cc1c6
 * Contains 4 manifest references:
[1]     Type: application/vnd.docker.distribution.manifest.v2+json
[1]   Digest: sha256:14f0efb6fd0638b4bec1d4826238d88c4714fb5807c4fe03c47a7ceff4963a49
[1]   Length: 927
[1] Platform:
[1]    -      OS: linux
[1]    -    Arch: amd64
[1] # Layers: 4
     layer 01: digest = sha256:8694db102e5bd27fa30106f87d5a0f0c5ccccac0e5cc38ba56080d7559377096
     layer 02: digest = sha256:7027f4e4058bde8aaa497e47562e962c293039ba16f5fbfd07ff43a0d1dbd5a2
     layer 03: digest = sha256:be575238ea985ef824635fbeaf7b33eaed98ef6ba1db1822ac0714f509304d17
     layer 04: digest = sha256:b6a366ac05e5458bcf8b40dd8802b77ffbbe792e074a976b85ea818a92607e05

[2]     Type: application/vnd.docker.distribution.manifest.v2+json
[2]   Digest: sha256:3649b24cfee29a640c79579b0de878c193ef728f0765ba32ba0ca6f9c336e581
[2]   Length: 927
[2] Platform:
[2]    -      OS: linux
[2]    -    Arch: arm64
[2] # Layers: 4
     layer 01: digest = sha256:376a7503a7e3f4f0174eab5f3cc8b99d957df631e9b39007ff8d3c38b15fb498
     layer 02: digest = sha256:d8955d3e8e04900a695f484cedfd2fb469e4965201c5b6e7dc70afbe15a612e8
     layer 03: digest = sha256:39b8358648e5bedce52187a85e7b2c939b771a6efec95f523050e8188f64af3e
     layer 04: digest = sha256:5634b3f003a3a0e1ff790830da7709ae12c3a0fd153e9f4a98297b73440a4985

[3]     Type: application/vnd.docker.distribution.manifest.v2+json
[3]   Digest: sha256:8d8cea334bdfbb9666c500da245c498329606002d448bccb65bef0b37306325a
[3]   Length: 927
[3] Platform:
[3]    -      OS: linux
[3]    -    Arch: ppc64le
[3] # Layers: 4
     layer 01: digest = sha256:48ee1a30d8dff876ecc8e1d58796d23d8b05781be18219c563a9ebae70b451d2
     layer 02: digest = sha256:4ffb1d6e5ff4532e29ccdc30ba42aaca9030bef924d14253b80968d9b884dbf9
     layer 03: digest = sha256:86bf16578e3b8f082704519dc77f29d1f9b54ecf8aa66f3665b7c7923e3f8500
     layer 04: digest = sha256:f47f1a5e243ae10356fd0c173cc630e14c72911647bc0867024e5e7ac33c43d3

[4]     Type: application/vnd.docker.distribution.manifest.v2+json
[4]   Digest: sha256:b4ab339708e91f0958629644c5266374ab9bcc2b9672b6a6b36e4a5b3d8fc6d2
[4]   Length: 927
[4] Platform:
[4]    -      OS: linux
[4]    -    Arch: s390x
[4] # Layers: 4
     layer 01: digest = sha256:b7932689f67308f14942eb003a8468c377e71e2ca5d4f7e2a61fccb7e5a5be0e
     layer 02: digest = sha256:46b6a613d0289207035e9cfc32097a66b073a23789702a771ab248f7d7afdc6c
     layer 03: digest = sha256:a1711110fe60d7489f2d1ba3e99c774e0809c4eaf8ef5c6833f1662c67599a2e
     layer 04: digest = sha256:5ff9dcde348e947755138f1a62577b614c70f30f6827c0857362e5182fd5caa4

Can we reduce the list to linux/amd64,linux/ppc64le,linux/s390x and test the flow?
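
To make the mismatch explicit, the comparison between the platforms passed to `--platform` and the entries in the manifest list can be sketched in Python. This is only an illustration under stated assumptions: `parse_platform` and `missing_platforms` are hypothetical helper names, the `available` set is copied by hand from the manifest-tool output above, and matching uses OS and architecture only (the variant, such as `v8` in `linux/arm64/v8`, is ignored here).

```python
# Platforms requested in the buildx command quoted in this thread.
requested = "linux/amd64,linux/arm/v7,linux/arm64/v8,linux/ppc64le,linux/s390x".split(",")

# (os, arch) pairs listed by manifest-tool for ubi8/go-toolset:1.21 above.
available = {
    ("linux", "amd64"),
    ("linux", "arm64"),
    ("linux", "ppc64le"),
    ("linux", "s390x"),
}


def parse_platform(platform):
    """Split 'os/arch[/variant]' and return (os, arch).

    Assumption: the variant is dropped, so 'linux/arm64/v8' is treated
    as 'linux/arm64' for the purpose of this comparison.
    """
    parts = platform.split("/")
    return parts[0], parts[1]


def missing_platforms(requested, available):
    """Return the requested platforms that have no entry in the manifest list."""
    return [p for p in requested if parse_platform(p) not in available]


missing = missing_platforms(requested, available)
print(missing)  # ['linux/arm/v7']
```

Under these assumptions only `linux/arm/v7` has no matching entry, which is consistent with the `[linux/arm/v7 internal]` step in the error log.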

npanpaliya commented 1 month ago

Yes, I've already tried removing linux/arm/v7 from the list of platforms and build part of the job has passed for it. https://github.com/npanpaliya/kserve/actions/runs/10058984626/job/27803311430.

mkumatag commented 1 month ago

Yes, I've already tried removing linux/arm/v7 from the list of platforms and build part of the job has passed for it. https://github.com/npanpaliya/kserve/actions/runs/10058984626/job/27803311430.

Perfect, but I believe it is failing with some other error.

Ignore that; the error occurs while pushing the image, which is expected.