k3d-io / k3d

Little helper to run CNCF's k3s in Docker
https://k3d.io/
MIT License
5.46k stars, 461 forks

[BUG] Create/start-start-stop-start result in failure #1006

Open gourvy opened 2 years ago

gourvy commented 2 years ago

What did you do

What did you expect to happen

There should be no error. Also, I believe that the -tools container should have been deleted after every start.

Screenshots or terminal output

root@ubuntu:~# k3d cluster create toto
INFO[0000] Prep: Network                                
INFO[0000] Created network 'k3d-toto'                   
INFO[0000] Created image volume k3d-toto-images         
INFO[0000] Starting new tools node...                   
INFO[0000] Starting Node 'k3d-toto-tools'               
INFO[0001] Creating node 'k3d-toto-server-0'            
INFO[0001] Creating LoadBalancer 'k3d-toto-serverlb'    
INFO[0001] Using the k3d-tools node to gather environment information 
INFO[0001] HostIP: using network gateway 172.19.0.1 address 
INFO[0001] Starting cluster 'toto'                      
INFO[0001] Starting servers...                          
INFO[0001] Starting Node 'k3d-toto-server-0'            
INFO[0007] All agents already running.                  
INFO[0007] Starting helpers...                          
INFO[0007] Starting Node 'k3d-toto-serverlb'            
INFO[0014] Injecting records for hostAliases (incl. host.k3d.internal) and for 2 network members into CoreDNS configmap... 
INFO[0016] Cluster 'toto' created successfully!         
INFO[0016] You can now use it like this:                
kubectl cluster-info
root@ubuntu:~# k3d cluster start toto
INFO[0001] Using the k3d-tools node to gather environment information 
INFO[0001] Starting new tools node...                   
INFO[0002] Starting Node 'k3d-toto-tools'               
INFO[0003] HostIP: using network gateway 172.19.0.1 address 
INFO[0003] Starting cluster 'toto'                      
INFO[0003] All servers already running.                 
INFO[0003] All agents already running.                  
INFO[0003] All helpers already running.                 
INFO[0003] Started cluster 'toto' 
root@ubuntu:~# docker container ps -a
CONTAINER ID   IMAGE                      COMMAND                  CREATED          STATUS          PORTS                                                                                               NAMES
031322f4042f   rancher/k3d-tools:5.3.0    "/app/k3d-tools noop"    7 seconds ago    Up 4 seconds                                                                                                        k3d-toto-tools
fdaced82c246   rancher/k3d-proxy:5.3.0    "/bin/sh -c nginx-pr…"   51 seconds ago   Up 44 seconds   80/tcp, 0.0.0.0:35215->6443/tcp                                                                     k3d-toto-serverlb
80218eba68ee   rancher/k3s:v1.22.6-k3s1   "/bin/k3s server --t…"   51 seconds ago   Up 50 seconds                                                                                                       k3d-toto-server-0
root@ubuntu:~# k3d cluster stop toto
INFO[0000] Stopping cluster 'toto'                      
INFO[0011] Stopped cluster 'toto'
root@ubuntu:~# k3d cluster start toto
INFO[0000] Using the k3d-tools node to gather environment information 
INFO[0000] Starting existing tools node k3d-toto-tools... 
INFO[0000] Starting Node 'k3d-toto-tools'               
INFO[0000] HostIP: using network gateway 172.19.0.1 address 
INFO[0000] Starting cluster 'toto'                      
INFO[0000] Starting servers...                          
INFO[0000] Starting Node 'k3d-toto-server-0'            
INFO[0005] All agents already running.                  
INFO[0005] Starting helpers...                          
FATA[0005] Failed to add one or more helper nodes: runtime failed to start node 'k3d-toto-tools': failed to get container for node 'k3d-toto-tools': Didn't find container for node 'k3d-toto-tools'

Which OS & Architecture

Which version of k3d

Which version of docker

iwilltry42 commented 2 years ago

Hi @gourvy, thanks for opening this issue!

Also, I believe that the -tools container should have been deleted after every start.

This is 100% true, but it seems that the main function returns before the goroutine that deletes the tools node has finished, so the node is still there afterwards. The error at the end happens because, in the failing run, that goroutine does get enough time to delete the node, so it is no longer present when the helper nodes are being started :exploding_head:

Anyway, I just made sure that the tools node gets deleted properly, as the time required to do so is pretty negligible :+1:
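
To illustrate the kind of race described above, here is a minimal Go sketch. It is not k3d's actual code: `nodeStore`, `startCluster`, `toolsNode`, and the 200 ms delay are hypothetical names and values made up for this example, and the fix shown (waiting on a `sync.WaitGroup` before returning) is only one plausible way to make sure the cleanup goroutine finishes.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// toolsNode is a made-up node name used only in this sketch.
const toolsNode = "k3d-demo-tools"

// nodeStore is a hypothetical stand-in for the container runtime:
// it only tracks which node names currently exist.
type nodeStore struct {
	mu    sync.Mutex
	nodes map[string]bool
}

func newNodeStore() *nodeStore {
	return &nodeStore{nodes: make(map[string]bool)}
}

func (s *nodeStore) add(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.nodes[name] = true
}

func (s *nodeStore) remove(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.nodes, name)
}

func (s *nodeStore) exists(name string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.nodes[name]
}

// startCluster creates a temporary tools node and deletes it in a
// background goroutine. With waitForCleanup=false it returns right away,
// so the caller can still observe the node (the situation from the issue).
// With waitForCleanup=true it blocks until the deletion has finished.
func startCluster(s *nodeStore, waitForCleanup bool) {
	s.add(toolsNode)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		time.Sleep(200 * time.Millisecond) // simulated deletion latency
		s.remove(toolsNode)
	}()

	if waitForCleanup {
		wg.Wait()
	}
}

func main() {
	racy := newNodeStore()
	startCluster(racy, false)
	fmt.Println("no wait, tools node left behind:", racy.exists(toolsNode))

	fixed := newNodeStore()
	startCluster(fixed, true)
	fmt.Println("with wait, tools node left behind:", fixed.exists(toolsNode))
}
```

In the non-waiting variant, whether the node is still present depends purely on timing, which would match the behaviour in the logs: a quick start leaves the tools container behind, while a slower start gives the goroutine time to remove it mid-run.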

tekumara commented 2 years ago

I ran into this on k3d v5.4.1:

.venv ❯ k3d cluster start orion  
INFO[0000] Using the k3d-tools node to gather environment information 
INFO[0000] Starting existing tools node k3d-orion-tools... 
INFO[0000] Starting Node 'k3d-orion-tools'              
INFO[0001] Starting new tools node...                   
INFO[0001] Starting Node 'k3d-orion-tools'              
INFO[0003] Starting cluster 'orion'                     
INFO[0003] Starting servers...                          
INFO[0003] Starting Node 'k3d-orion-server-0'           
INFO[0014] All agents already running.                  
INFO[0014] Starting helpers...                          
INFO[0014] Starting Node 'k3d-orion-serverlb'           
INFO[0014] Starting Node 'orion-registry'               
FATA[0014] Failed to add one or more helper nodes: runtime failed to start node 'k3d-orion-tools': failed to get container for node 'k3d-orion-tools': Didn't find container for node 'k3d-orion-tools' 

tekumara commented 2 years ago

$ docker ps
CONTAINER ID   IMAGE                            COMMAND                  CREATED         STATUS          PORTS                                                                             NAMES
7684c5ae7f5f   ghcr.io/k3d-io/k3d-tools:5.4.1   "/app/k3d-tools noop"    4 minutes ago   Up 4 minutes                                                                                      k3d-orion-tools
c0cbba3e9d2d   ghcr.io/k3d-io/k3d-proxy:5.4.1   "/bin/sh -c nginx-pr…"   22 hours ago    Up 18 minutes   0.0.0.0:10001->80/tcp, 0.0.0.0:58268->6443/tcp                                    k3d-ray-serverlb
4057c2632816   rancher/k3s:v1.23.6-k3s1         "/bin/k3d-entrypoint…"   22 hours ago    Up 18 minutes                                                                                     k3d-ray-agent-1
a4c47c99a441   rancher/k3s:v1.23.6-k3s1         "/bin/k3d-entrypoint…"   22 hours ago    Up 18 minutes                                                                                     k3d-ray-agent-0
09e050f006c4   rancher/k3s:v1.23.6-k3s1         "/bin/k3d-entrypoint…"   22 hours ago    Up 18 minutes                                                                                     k3d-ray-server-0
1bf5eab03f0f   44d68381e3bd                     "/bin/sh -c nginx-pr…"   13 days ago     Up 4 minutes    0.0.0.0:9000-9001->9000-9001/tcp, 0.0.0.0:4200->80/tcp, 0.0.0.0:60359->6443/tcp   k3d-orion-serverlb
3082234cca0c   rancher/k3s:v1.22.7-k3s1         "/bin/k3d-entrypoint…"   2 weeks ago     Up 4 minutes                                                                                      k3d-orion-server-0
43a2cb74066f   registry:2                       "/entrypoint.sh /etc…"   2 weeks ago     Up 4 minutes    0.0.0.0:5550->5000/tcp                                                            orion-registry
d6925bd0e10e   registry:2                       "/entrypoint.sh /etc…"   3 weeks ago     Up 18 minutes   0.0.0.0:5555->5000/tcp                                                            registry