k3d-io / k3d

Little helper to run CNCF's k3s in Docker
https://k3d.io/
MIT License

[Bug] Cannot delete cluster: `tried to kill container, but did not receive an exit event` #982

Open iwilltry42 opened 2 years ago

iwilltry42 commented 2 years ago

@iwilltry42 I think the log excerpt I pasted was too narrow. My apologies! I am trying to delete a cluster with 2 pods, nfs-server and nfs-client:

INFO[0000] Deleting cluster 'buildsignerworker-63'      
ERRO[0012] docker failed to remove the container 'k3d-buildsignerworker-63-server-0': Error response from daemon: Could not kill running container 774281fb7dbc92f4da5f766d1d2d0dffea02face10eb1a79e95b7067cfeccf2c, cannot remove - tried to kill container, but did not receive an exit event 
INFO[0012] Deleting cluster network 'k3d-buildsignerworker-63' 
INFO[0012] Deleting 2 attached volumes...               
WARN[0012] Failed to delete volume 'k3d-buildsignerworker-63-images' of cluster 'docker failed to delete volume 'k3d-buildsignerworker-63-images': Error response from daemon: remove k3d-buildsignerworker-63-images: volume is in use - [774281fb7dbc92f4da5f766d1d2d0dffea02face10eb1a79e95b7067cfeccf2c]': buildsignerworker-63 -> Try to delete it manually 
WARN[0012] Failed to delete volume 'k3d-buildsignerworker-63-images' of cluster 'docker failed to delete volume 'k3d-buildsignerworker-63-images': Error response from daemon: remove k3d-buildsignerworker-63-images: volume is in use - [774281fb7dbc92f4da5f766d1d2d0dffea02face10eb1a79e95b7067cfeccf2c]': buildsignerworker-63 -> Try to delete it manually 
INFO[0012] Removing cluster details from default kubeconfig... 
INFO[0012] Removing standalone kubeconfig file (if there is one)... 
INFO[0012] Successfully deleted cluster buildsignerworker-63! 
FATA[0000] Failed to create cluster 'buildsignerworker-63' because a cluster with that name already exists 
make: *** [Makefile:174: ci-mr-test] Error 1

Originally posted by @huberts90 in https://github.com/k3d-io/k3d/issues/932#issuecomment-1044037322

Also:

https://github.com/k3d-io/k3d/issues/932#issuecomment-1044074432:

It is on my machine, but I am going to use this setup in CI too. Here are the details:

$ k3d version
k3d version v5.3.0
k3s version v1.22.6-k3s1 (default)
$ docker version
Client: Docker Engine - Community
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.16.12
 Git commit:        e91ed57
 Built:             Mon Dec 13 11:45:33 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.12
  Git commit:       459d0df
  Built:            Mon Dec 13 11:43:42 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.12
  GitCommit:        7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

OS

Ubuntu 20.04.1
iwilltry42 commented 2 years ago

@huberts90, it doesn't look like a bug in k3d, but rather like something fishy with either Docker or your system setup. Does this happen every time you try to delete a cluster? Does a Docker service restart help? Can you kill that container manually (`docker rm -f k3d-buildsignerworker-63-server-0`)?
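
Spelled out, those checks look roughly like this (the container name is taken from the log above and will differ for other cluster names):

# Restart the Docker daemon (on systemd-based hosts)
sudo systemctl restart docker

# Try to force-remove the stuck node container directly
docker rm -f k3d-buildsignerworker-63-server-0

# Verify whether it is still listed afterwards
docker ps -a --filter name=k3d-buildsignerworker-63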

huberts90 commented 2 years ago

I cannot delete this container manually. I get the same error as in the cluster deletion log:

ERRO[0012] docker failed to remove the container 'k3d-buildsignerworker-63-server-0': Error response from daemon: Could not kill running container 774281fb7dbc92f4da5f766d1d2d0dffea02face10eb1a79e95b7067cfeccf2c, cannot remove - tried to kill container, but did not receive an exit event 

This issue happens only if the setup contains an NFS server pod (I am using erichough/nfs-server:2.2.1) and the client pod mounts an NFS volume, e.g.:

  containers:
    - name: nfs-client
      image: alpine:3.14
      imagePullPolicy: IfNotPresent
      securityContext:
        privileged: true
      volumeMounts:
      - name: nfs
        mountPath: "/mnt"
  volumes:
  - name: nfs
    persistentVolumeClaim:
      claimName: nfs-claim
      readOnly: false
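
For context, the nfs-claim referenced above would typically be backed by an NFS PersistentVolume/PersistentVolumeClaim pair along these lines (a generic sketch only; the server address, export path, and sizes are placeholders, and the actual manifests are in the repository linked further down):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  nfs:
    # Placeholders: point these at the nfs-server Service's ClusterIP and export path
    server: 10.43.0.100
    path: /exports
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 1Gi
EOF
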
iwilltry42 commented 2 years ago

@huberts90 , so it's not a k3d issue then. Interesting nonetheless, so let me try to help you there. Can you please paste both manifests that you have for the NFS server and client so I can try to replicate your issue? Really weird, since killing the node container shouldn't be influenced in any way by what is running inside the cluster (which itself runs inside the container) :thinking: Can you see any logs from the container?
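
A few places to look for those logs (using the node container name from the log above; adjust for other cluster names):

# k3s/containerd output of the node container itself
docker logs --tail 100 k3d-buildsignerworker-63-server-0

# What Docker currently thinks the container's state is
docker inspect --format '{{.State.Status}} (pid {{.State.Pid}})' k3d-buildsignerworker-63-server-0

# Docker daemon logs around the failed kill (on systemd-based hosts)
sudo journalctl -u docker --since "15 minutes ago"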

huberts90 commented 2 years ago

@iwilltry42 I hope this repository will be enough: https://github.com/huberts90/go-k3d-nfs/tree/main/k8s

iwilltry42 commented 2 years ago

@huberts90 , that setup requires some host-level configuration which I didn't want to do for a quick test. However, the "official" NFS server and provisioner https://github.com/kubernetes-sigs/nfs-ganesha-server-and-external-provisioner/tree/master/deploy/kubernetes worked just fine for me (I chose to install the pod variant, as that directory offers pod/statefulset/deployment manifests). NFS worked as expected and the cluster was deleted without any issues as well :+1:
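
For anyone wanting to reproduce that working setup, the rough shape is the following (the exact file names under deploy/kubernetes are an assumption; check the repository for the current layout):

# Grab the official provisioner manifests
git clone https://github.com/kubernetes-sigs/nfs-ganesha-server-and-external-provisioner.git
cd nfs-ganesha-server-and-external-provisioner/deploy/kubernetes

# Install the server/provisioner as a plain pod (statefulset.yaml or deployment.yaml
# being the alternatives mentioned above), plus whatever RBAC/StorageClass manifests
# the directory ships alongside it
kubectl apply -f pod.yaml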