k3d-io / k3d

Little helper to run CNCF's k3s in Docker
https://k3d.io/
MIT License

Image import doesn't work on macOS with Apple M1 #954

Closed: ThomasVitale closed this issue 2 years ago

ThomasVitale commented 2 years ago

What did you do

What did you expect to happen

I expected the image to be loaded correctly, but it wasn't. Initially, I considered whether it was a problem with the containerd CLI when loading arm64 images (similar to https://github.com/kubernetes-sigs/kind/issues/2549), but it also fails consistently with images built specifically for amd64.

Screenshots or terminal output

The same error is thrown when running any of the previous commands.

INFO[0000] Importing image(s) into cluster 'mycluster'  
INFO[0000] Auto-detected a remote docker daemon, using tools node for loading images 
INFO[0000] Starting new tools node...                   
INFO[0000] Starting Node 'k3d-mycluster-tools'          
INFO[0000] Saving 1 image(s) from runtime...            
INFO[0012] Importing images into nodes...               
INFO[0012] Importing images from tarball '/k3d/images/k3d-mycluster-images-20220201175708.tar' into node 'k3d-mycluster-server-0'... 
ERRO[0014] failed to import images in node 'k3d-mycluster-server-0': Exec process in node 'k3d-mycluster-server-0' failed with exit code '1' 
INFO[0014] Removing the tarball(s) from image volume... 
INFO[0015] Removing k3d-tools node...                   
INFO[0015] Successfully imported image(s)               
INFO[0015] Successfully imported 1 image(s) into 1 cluster(s) 

The final message says the image has been imported correctly, even though an error is thrown. Therefore, I tried running the image as a Pod, and it failed as follows.

Error: failed to create containerd container: error unpacking image: failed to resolve rootfs: content digest sha256:86512c94ca5f80c9406083a8d44baa1fa5578fbdb5e0a80e3d07454de1065486: not found

Which OS & Architecture

macOS with Apple M1 (darwin/arm64)

Which version of k3d

$ k3d version
k3d version v5.2.2
k3s version v1.21.7-k3s1 (default)

Which version of docker

$ docker version
Client:
 Cloud integration: v1.0.22
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.16.12
 Git commit:        e91ed57
 Built:             Mon Dec 13 11:46:56 2021
 OS/Arch:           darwin/arm64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.12
  Git commit:       459d0df
  Built:            Mon Dec 13 11:43:07 2021
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.4.12
  GitCommit:        7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

iwilltry42 commented 2 years ago

Hi @ThomasVitale, thanks for opening this issue! Sorry for getting back to it so late. Can you please paste the logs from the same command here, but with --trace appended, so we can see the full trace log?

iwilltry42 commented 2 years ago

@ThomasVitale , any update on this? FWIW, I switched the default mode back to the original --mode tools-node, since the new direct mode caused some issues, which may be related to yours.

iwilltry42 commented 2 years ago

Feel free to reopen if it's still present in the upcoming v5.4.0 release :+1:

ethanttbui commented 2 years ago

@iwilltry42 I'm facing the same problem as mentioned above on macOS (Apple M1) and k3d v5.4.1

CodingCanuck commented 2 years ago

@iwilltry42 I'm seeing this issue on Ubuntu when upgrading from k3d 5.3.0 to 5.4.1.

I'm seeing output like:

INFO[0000] Importing image(s) into cluster '$CLUSTER_NAME' 
INFO[0000] Starting new tools node...                   
INFO[0000] Starting Node 'k3d-$CLUSTER_NAME-tools' 
INFO[0000] Saving 3 tarball(s) to shared image volume... 
INFO[0000] Importing images into nodes...               
INFO[0000] Importing images from tarball '/k3d/images/k3d-$CLUSTER_NAME-images-$IMAGE1.tar' into node 'k3d-$CLUSTER_NAME-server-0'... 
INFO[0000] Importing images from tarball '/k3d/images/k3d-$CLUSTER_NAME-images-$IMAGE2.tar' into node 'k3d-$CLUSTER_NAME-server-0'... 
INFO[0000] Importing images from tarball '/k3d/images/k3d-$CLUSTER_NAME-images-$IMAGE3.tar' into node 'k3d-$CLUSTER_NAME-server-0'... 
ERRO[0001] failed to import images in node 'k3d-$CLUSTER_NAME-server-0': Exec process in node 'k3d-$CLUSTER_NAME-server-0' failed with exit code '1' 
ERRO[0001] failed to import images in node 'k3d-$CLUSTER_NAME-server-0': Exec process in node 'k3d-$CLUSTER_NAME-server-0' failed with exit code '1' 
ERRO[0001] failed to import images in node 'k3d-$CLUSTER_NAME-server-0': Exec process in node 'k3d-$CLUSTER_NAME-server-0' failed with exit code '1' 
INFO[0001] Removing the tarball(s) from image volume... 
INFO[0002] Removing k3d-tools node...                   
INFO[0003] Successfully imported image(s)               
INFO[0003] Successfully imported 3 image(s) into 1 cluster(s) 

One thing that jumps out: the code in importWithToolsNode() that logs these failures when copying images swallows errors without returning them to the caller: https://github.com/k3d-io/k3d/blob/852df7786ab5a98b9ecd95e1b215d593cf9201d8/pkg/client/tools.go#L125-L127 which allows the "successfully imported images" result to get printed.

Compare that to this other k3d code to import images into clusters (rather than nodes) which fails if any individual image fails to import: https://github.com/k3d-io/k3d/blob/7b1b416c2298f1aa30950eaae1d2847140ee285a/cmd/image/imageImport.go#L75-L86

Or even this code in the same importWithToolsNode() method that returns errors if any image fails to save: https://github.com/k3d-io/k3d/blob/852df7786ab5a98b9ecd95e1b215d593cf9201d8/pkg/client/tools.go#L114-L116

Should importWithToolsNode() return an error whenever any image installation encounters an error? That would at least mean that image installation errors are reported as overall import errors: it wouldn't fix the underlying failures, but it seems more appropriate than treating them as successes.
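
For illustration, here's a minimal, self-contained sketch of that error-aggregation pattern. The names are hypothetical (importTarballIntoNode is a stand-in for the real exec-in-node call), so this is just the shape of the fix, not the actual k3d code:

// errjoin_sketch.go - minimal sketch (hypothetical names, not the real k3d
// code) of returning an aggregate error instead of only logging per-image
// failures, so the caller can't report success after a failed import.
package main

import (
	"errors"
	"fmt"
)

// importTarballIntoNode stands in for the real exec-in-node call in tools.go;
// here it always fails, to mimic the behavior in this issue.
func importTarballIntoNode(nodeName, tarball string) error {
	return fmt.Errorf("exec process in node '%s' failed with exit code '1'", nodeName)
}

func importTarballs(nodeName string, tarballs []string) error {
	var errs []error
	for _, tarball := range tarballs {
		if err := importTarballIntoNode(nodeName, tarball); err != nil {
			// Keep going so the remaining tarballs are still attempted,
			// but remember every failure instead of swallowing it.
			errs = append(errs, fmt.Errorf("importing '%s': %w", tarball, err))
		}
	}
	return errors.Join(errs...) // nil only if every import succeeded (Go 1.20+)
}

func main() {
	if err := importTarballs("k3d-mycluster-server-0", []string{"a.tar", "b.tar"}); err != nil {
		fmt.Println("failed to import images:", err)
		return
	}
	fmt.Println("Successfully imported image(s)") // now printed only on real success
}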

heesuk-ahn commented 2 years ago

I am also using an M1 Mac and k3d image import fails. :(

My k3d version is 5.4.1.

$ k3d image import foo-my:latest -c local

.
.
.
TRAC[0002] Exec process '[./k3d-tools save-image -d /k3d/images/k3d-local-images-20220504000554.tar foo-my:c82a74353357d2f11f2d0a0543cbdd9367fcd0dd9f78b03cf6fa70cf11bbc3e2]' still running in node 'k3d-local-tools'.. sleeping for 1 second... 

TRAC[0003] Exec process '[./k3d-tools save-image -d /k3d/images/k3d-local-images-20220504000554.tar foo-my:c82a74353357d2f11f2d0a0543cbdd9367fcd0dd9f78b03cf6fa70cf11bbc3e2]' still running in node 'k3d-local-tools'.. sleeping for 1 second... 

.
.
.

DEBU[0029] Exec process in node 'k3d-local-tools' exited with '0' 
INFO[0029] Importing images from tarball '/k3d/images/k3d-local-images-20220504000554.tar' into node 'k3d-local-server-0'... 
DEBU[0029] Executing command '[ctr image import /k3d/images/k3d-local-images-20220504000554.tar]' in node 'k3d-local-server-0' 
INFO[0029] Importing images from tarball '/k3d/images/k3d-local-images-20220504000554.tar' into node 'k3d-local-agent-0'... 
.
.
.
TRAC[0029] Exec process '[ctr image import /k3d/images/k3d-local-images-20220504000554.tar]' still running in node 'k3d-local-server-0'.. sleeping for 1 second... 
TRAC[0029] Exec process '[ctr image import /k3d/images/k3d-local-images-20220504000554.tar]' still running in node 'k3d-local-agent-0'.. sleeping for 1 second... 
TRAC[0030] Exec process '[ctr image import /k3d/images/k3d-local-images-20220504000554.tar]' still running in node 'k3d-local-server-0'.. sleeping for 1 second... 
TRAC[0030] Exec process '[ctr image import /k3d/images/k3d-local-images-20220504000554.tar]' still running in node 'k3d-local-agent-0'.. sleeping for 1 second... 
TRAC[0031] Exec process '[ctr image import /k3d/images/k3d-local-images-20220504000554.tar]' still running in node 'k3d-local-server-0'.. sleeping for 1 second... 
TRAC[0031] Exec process '[ctr image import /k3d/images/k3d-local-images-20220504000554.tar]' still running in node 'k3d-local-agent-0'.. sleeping for 1 second... 
TRAC[0032] Exec process '[ctr image import /k3d/images/k3d-local-images-20220504000554.tar]' still running in node 'k3d-local-agent-0'.. sleeping for 1 second... 
TRAC[0032] Exec process '[ctr image import /k3d/images/k3d-local-images-20220504000554.tar]' still running in node 'k3d-local-server-0'.. sleeping for 1 second... 
TRAC[0033] Exec process '[ctr image import /k3d/images/k3d-local-images-20220504000554.tar]' still running in node 'k3d-local-agent-0'.. sleeping for 1 second... 
TRAC[0033] Exec process '[ctr image import /k3d/images/k3d-local-images-20220504000554.tar]' still running in node 'k3d-local-server-0'.. sleeping for 1 second... 
TRAC[0034] Exec process '[ctr image import /k3d/images/k3d-local-images-20220504000554.tar]' still running in node 'k3d-local-server-0'.. sleeping for 1 second... 
TRAC[0034] Exec process '[ctr image import /k3d/images/k3d-local-images-20220504000554.tar]' still running in node 'k3d-local-agent-0'.. sleeping for 1 second... 
.
.
.
ERRO[0035] failed to import images in node 'k3d-local-agent-0': Exec process in node 'k3d-local-agent-0' failed with exit code '1' 
ERRO[0035] failed to import images in node 'k3d-local-server-0': Exec process in node 'k3d-local-server-0' failed with exit code '1' 
INFO[0035] Removing the tarball(s) from image volume... 
DEBU[0035] Executing command '[rm -f /k3d/images/k3d-local-images-20220504000554.tar]' in node 'k3d-local-tools' 
TRAC[0035] Exec process '[rm -f /k3d/images/k3d-local-images-20220504000554.tar]' still running in node 'k3d-local-tools'.. sleeping for 1 second... 
DEBU[0036] Exec process in node 'k3d-local-tools' exited with '0' 
INFO[0036] Removing k3d-tools node...                   
DEBU[0036] Deleting node k3d-local-tools ...            
TRAC[0036] [Docker] Deleted Container k3d-local-tools   
INFO[0036] Successfully imported image(s)               
INFO[0036] Successfully imported 1 image(s) into 1 cluster(s) 

And then, when I deployed this image to the k3d cluster, I got the error Error: failed to create containerd container: error unpacking image: failed to resolve rootfs: content digest sha256:c82a74353357d2f11f2d0a0543cbdd9367fcd0dd9f78b03cf6fa70cf11bbc3e2: not found

Also, my Docker image's architecture is "Architecture": "amd64", built with Paketo (buildpacks).

It runs fine in Docker Desktop; it just doesn't work in the k3d cluster.

iwilltry42 commented 2 years ago

@ethanttbui , @CodingCanuck & @heesuk-ahn , please follow along in the new issue #1072

yoca94 commented 2 years ago

I have the same problem... Any update on this topic?

henriquevcosta commented 1 year ago

To add some info, I have this happening to me and executed the following (some stuff altered for privacy):

The output I got was

unpacking my-registry.me.com/app/app:2.8.3 (sha256:9379e04f6e56bf94db2d35f429dbf98cdcf8150a719b98face34981cec3ec23b)...ctr: content digest sha256:889bf72e765011f62f49d586fa4e24d42a865b11a676e612b162c24e9448181b: not found

In my mind this at least points to the tarball not being produced correctly, so I decided to compare the output of k3d-tools save with something else.

Saving the image directly on my Mac and then copying it into the volume and manually importing it (ctr image import) failed with the same error. Then I tried saving the image on my Mac again, but this time using the hash reference (e.g. docker save -o output.tar 2d13c582c25d), and that was imported successfully.

When I inspected the hash-saved image tar, ran sha1sum on all its contents, and compared it to the one saved by k3d, I found only 2 differences:

  1. The tar from my Mac did NOT have a "repositories" file
  2. The tar from my Mac had "RepoTags":null in manifest.json, while the one saved by k3d had "RepoTags":["my-registry.me.com/app/app:2.8.3"]

I then exec-ed into the tools node again, made those 2 modifications (deleted repositories and set RepoTags to null), and placed a tar of the outcome in /k3d/images/test1.tar.

Finally, I did docker exec -it k3d-local-server-0 ctr image import /k3d/images/test1.tar and that succeeded.

Now, I don't have enough knowledge about the image format to understand why this causes the failure, but it at least seems to be the cause. Happy to provide more info or run more tests; this happens quite frequently on my end.
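
In case it helps anyone reproduce this, here is a rough Go sketch of those 2 modifications (the file names are just examples, and it assumes the standard docker-save layout where manifest.json is a top-level JSON array):

// fixtar_sketch.go - rough sketch of the two tarball edits described above:
// drop the legacy "repositories" file and set "RepoTags" to null in
// manifest.json, copying everything else through unchanged.
package main

import (
	"archive/tar"
	"encoding/json"
	"io"
	"log"
	"os"
)

func main() {
	in, err := os.Open("original.tar") // tar produced by k3d-tools save / docker save
	if err != nil {
		log.Fatal(err)
	}
	defer in.Close()

	out, err := os.Create("test1.tar") // patched tar to feed to 'ctr image import'
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	tr := tar.NewReader(in)
	tw := tar.NewWriter(out)
	defer tw.Close()

	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		switch hdr.Name {
		case "repositories":
			// Difference 1: drop the legacy "repositories" file entirely.
			continue
		case "manifest.json":
			// Difference 2: null out RepoTags in every manifest entry.
			var entries []map[string]json.RawMessage
			if err := json.NewDecoder(tr).Decode(&entries); err != nil {
				log.Fatal(err)
			}
			for _, e := range entries {
				e["RepoTags"] = json.RawMessage("null")
			}
			patched, err := json.Marshal(entries)
			if err != nil {
				log.Fatal(err)
			}
			hdr.Size = int64(len(patched)) // the size changed, so fix the header
			if err := tw.WriteHeader(hdr); err != nil {
				log.Fatal(err)
			}
			if _, err := tw.Write(patched); err != nil {
				log.Fatal(err)
			}
		default:
			// Copy every other entry through unchanged.
			if err := tw.WriteHeader(hdr); err != nil {
				log.Fatal(err)
			}
			if _, err := io.Copy(tw, tr); err != nil {
				log.Fatal(err)
			}
		}
	}
}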

Edit:

Although ctr imports the image, when I run ctr i list inside the server it prints an error message at the top:

ERRO[0000] failed resolving platform for image sha256:2d13c582c25d67b636cd6289ec75b79e746c73c08c6d96d2deff17a4c55ea492  error="content digest sha256:2d13c582c25d67b636cd6289ec75b79e746c73c08c6d96d2deff17a4c55ea492: not found"

The image is amd64.

henriquevcosta commented 1 year ago

@iwilltry42 how do we reopen this? Still observing this on 5.4.6. Thanks

kathleenfrench commented 1 year ago

ran into this as well, thought i'd share some more findings.

context

a very relevant fact here is that i'm running this on an M1 mac.

k3d version v5.4.4
k3s version v1.23.8-k3s1 (default)

/ # ctr version
Client:
  Version:  v1.5.13-k3s1
  Revision:
  Go version: go1.17.5

Server:
  Version:  v1.5.13-k3s1
  Revision:
  UUID: dc205011-e667-416a-9c8e-c7ba88eb82c8

problem

fwiw i suspect this issue is related: https://github.com/containerd/containerd/issues/6441, particularly the in-depth explanation here (https://github.com/containerd/containerd/issues/6441#issuecomment-1098609359).

So import is matching both linux/amd64 and linux/386, but since the image was pulled and exported for only the linux/amd64 platform, import cannot find the necessary content for linux/386 platform.

Since there seems to be inconsistency between ctr import and ctr export as far as platforms goes, which would be correct? In this case, the image is exported for just linux/amd64 but import expects linux/amd64 and linux/386.

fixes were identified for this and merged to containerd here (https://github.com/containerd/containerd/pull/6906) at the end of august and the beginning of november (https://github.com/containerd/containerd/pull/7615).

i tried updating k3d to see if i could get a release of ctr that includes the fixes -

k3d version v5.4.6
k3s version v1.24.4-k3s1 (default)

/ # ctr version
Client:
  Version:  v1.6.6-k3s1
  Revision:
  Go version: go1.18.1

Server:
  Version:  v1.6.6-k3s1
  Revision:
  UUID: 5afae041-3b63-41d7-8dc3-c011fcc6390d

unfortunately i still got the same errors after updating and following the steps outlined below:

{&ContainerStateWaiting{Reason:CreateContainerError,Message:failed to create containerd container: error unpacking image: failed to resolve rootfs: content digest sha256:767c499cb2f8d13b940afeb98edd0fe91505b0d6993a4820b0a1fc9a58a11cb2: not found,} nil nil} 

it looks like containerd 1.6.6 was released in june, which would predate the above fixes so that makes sense then.

update: bumped the rancher/k3s image version up in my cluster config file to https://github.com/k3s-io/k3s/releases/tag/v1.24.7%2Bk3s1, which includes containerd v1.6.8-k3s1. i think the PRs including the fix for this would be covered by that, but it's possible i've misread that... either way, still no luck - k3d image import doesn't work for this version either, and when importing manually the digests still seem to be subject to the manifest mismatch failed to create containerd container: error unpacking image: failed to resolve rootfs 😢.
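
(for reference, this is roughly what pinning the image looks like in a k3d simple config file - v1alpha4 schema as of k3d 5.4.x, and the cluster name is just a placeholder:)

apiVersion: k3d.io/v1alpha4
kind: Simple
metadata:
  name: local
image: rancher/k3s:v1.24.7-k3s1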

findings

i was getting the same error messages as explained above. k3d image imports would fail and/or attempts to deploy would return an image pull error

ERRO[0030] failed to import images in node 'k3d-local-agent-2': Exec process in node 'k3d-local-agent-2' failed with exit code '1'
INFO[0030] Removing k3d-tools node...
INFO[0030] Successfully imported image(s)
INFO[0030] Successfully imported 1 image(s) into 1 cluster(s)

manual import approach

given the aforementioned issue, i wanted to confirm whether the problem was in fact stemming from discrepancies in underlying architecture. first i ran an image import to ensure the tarball persists locally:

k3d image import --cluster local-cluster-name --trace gcr.io/private-registry/app:2.8.3 --keep-tarball

i made a note of the trace logs showing the full ctr cmd being run to import the image, i.e.

TRAC[0028] Exec process '[ctr image import /k3d/images/local-cluster-name-xxx-images-20221202162620.tar]' still running in node 'local-cluster-name-agent-2'.. sleeping for 1 second...

i then ran docker exec to get a shell in the 'master node' container and cd'd into /k3d/images

here, i re-ran the ctr image import command but made sure to include the --all-platforms flag.

from the help text: --all-platforms imports content for all platforms, false by default

/k3d/images # ctr image import --all-platforms local-cluster-name-images-20221202162620.tar
unpacking gcr.io/xxxx/xxxx:(sha256:xxxxx)...done

unlike before, now it unpacks - little victories 😄...

i then ran ctr images list | grep 'your-image-name' to confirm it unpacked. that, unfortunately, is where the success ends with this route. though you can kick off a deployment and get past the prior error(s) about the image not being pullable, you will then hit an error similar to the following:

 - dev:pod/app-6dc6fc866-lm2w6: container xxx in error: &ContainerStateWaiting{Reason:CreateContainerError,Message:failed to create containerd container: error unpacking image: failed to resolve rootfs: content digest sha256:767c499cb2f8d13b940afeb98edd0fe91505b0d6993a4820b0a1fc9a58a11cb2: not found,}

when running ctr images list to get the digest for the same problematic image, as you may have guessed the digests are different:

gcr.io/xxxx/xxx      application/vnd.docker.distribution.manifest.v2+json      sha256:19cbc744b63eb4b1447401c775dc33a09f141f09b9a0c29632e008ead05c8e43 493.0 MiB linux/amd64   

the above error (failed to resolve rootfs: content digest) is referenced in the following issues: https://github.com/containerd/containerd/issues/1498 and https://github.com/containerd/containerd/pull/1506, which concern a feature long since released in containerd for multi-arch unpacking. the problem here doesn't seem to be a lack of support for unpacking multi-arch builds, though, but rather that the digests aren't matching.

i noticed ctr images import has a --digests flag:

--digests            whether to create digest images (default: false)

so that seemed like the next thing to try...

docker save approach

since k3d image import doesn't offer a way to include all-platforms or digests, i searched for whether there was a workaround for that. (credit to https://github.com/kubernetes-sigs/kind/issues/2402#issuecomment-1056734295 for pointing me in this direction)

before doing this it's necessary to delete the old attempts in the local registry if you tried manually unpacking it via ctr on the node, etc.

to import an image, run the following instead of k3d image import

docker save gcr.io/example-repo/image | docker exec --privileged -i k3d-local-server-node-example ctr --namespace=k8s.io images import --all-platforms --digests --snapshotter=overlayfs -

this is almost the same approach as above, except i'm skipping the k3d image import step, using docker save instead, and adding the --digests and --snapshotter flags.

in theory, this ought to make your image(s) available and i suspect the above could even function as a workaround for some, but that assumes the source image has multi-platform builds available to be saved in the first place.

unfortunately, this was not the case for me as the image(s) in question don't currently have builds compatible with the underlying architecture of the k3d 'nodes'. if this is you as well, you can expect to get a back-to-square-one error of:

DEBU[0046] marking resource failed due to error code STATUSCHECK_IMAGE_PULL_ERR  subtask=-1 task=Deploy
 - dev:deployment/xxx: container xx is waiting to start: gcr.io/example-repo/x can't be pulled

the workaround to the workaround might be rebuilding the affected image(s) with the --platform flag, i.e. docker build --platform <whatever the output of $(uname -m) is on a k3d node>, but i've not had an opportunity to try this yet.