k3d-io / k3d

Little helper to run CNCF's k3s in Docker
https://k3d.io/
MIT License

[BUG] error adding hosts file entry for <nil>:[host.k3d.internal]: "<nil>" is an invalid IP address #1018

Closed: ololobus closed this issue 2 years ago

ololobus commented 2 years ago

What did you do

I have a Docker network zenith_net which is created by docker-compose using the following config:

networks:
  zenith_net:
    name: zenith_net
    ipam:
      config:
        - subnet: "172.20.42.0/24"

Then I try to create a cluster:

k3d cluster create --network zenith_net zenith-local

And get an error:

INFO[0018] Injecting records for hostAliases (incl. host.k3d.internal) and for 7 network members into CoreDNS configmap... 
ERRO[0018] Failed Cluster Start: error during post-start cluster preparation: error while rewriting /etc/hosts in k3d-zenith-local-server-0: error adding hosts file entry for <nil>:[host.k3d.internal]: "<nil>" is an invalid IP address 
INFO[0018] Deleting cluster 'zenith-local'              
ERRO[0018] Failed to create cluster >>> Rolling Back 

The problem first reproduced a couple of times on my local machine but disappeared after a reboot. It is now stably reproducible in GitHub Actions CI.

What did you expect to happen

The cluster is created without errors.

Screenshots or terminal output

DEBU[0006] Created container k3d-zenith-local-serverlb (ID: 9122cdd132bfb7a73ff921b890c11cb7a7de710a83e3cacd6ac53f7973b14a51) 
DEBU[0006] Created loadbalancer 'k3d-zenith-local-serverlb' 
DEBU[0006] DOCKER_SOCK=/var/run/docker.sock             
INFO[0006] Using the k3d-tools node to gather environment information 
DEBU[0006] no netlabel present on container /k3d-zenith-local-tools 
DEBU[0006] failed to get IP for container /k3d-zenith-local-tools as we couldn't find the cluster network 
DEBU[0006] DOCKER_SOCK=/var/run/docker.sock             
INFO[0006] HostIP: using network gateway <nil> address  
INFO[0006] Starting cluster 'zenith-local'              
INFO[0006] Starting servers...                          
DEBU[0006] Deleting node k3d-zenith-local-tools ...     
DEBU[0006] DOCKER_SOCK=/var/run/docker.sock             
DEBU[0006] No fix enabled.                              
DEBU[0006] Node k3d-zenith-local-server-0 Start Time: 2022-03-15 17:38:31.904344702 +0000 UTC m=+6.190021861 
INFO[0006] Starting Node 'k3d-zenith-local-server-0'    
DEBU[0006] Truncated 2022-03-15 17:38:32.302305822 +0000 UTC to 2022-03-15 17:38:32 +0000 UTC 
DEBU[0006] Waiting for node k3d-zenith-local-server-0 to get ready (Log: 'k3s is up and running') 
DEBU[0011] Finished waiting for log message 'k3s is up and running' from node 'k3d-zenith-local-server-0' 
INFO[0011] All agents already running.                  
INFO[0011] Starting helpers...                          
DEBU[0011] Node k3d-zenith-local-serverlb Start Time: 2022-03-15 17:38:37.63219131 +0000 UTC m=+11.917868569 
INFO[0012] Starting Node 'k3d-zenith-local-serverlb'    
DEBU[0012] Truncated 2022-03-15 17:38:38.308854894 +0000 UTC to 2022-03-15 17:38:38 +0000 UTC 
DEBU[0012] Waiting for node k3d-zenith-local-serverlb to get ready (Log: 'start worker processes') 
DEBU[0018] Finished waiting for log message 'start worker processes' from node 'k3d-zenith-local-serverlb' 
DEBU[0018] Found network {Name:zenith_net ID:427415b8c930849a2febd2b8fe1c69f63aa901fdc72d4460025a537e1c857882 Created:2022-03-15 17:37:50.805895355 +0000 UTC Scope:local Driver:bridge EnableIPv6:false IPAM:{Driver:default Options:map[] Config:[{Subnet:172.54.32.0/24 IPRange: Gateway: AuxAddress:map[]}]} Internal:false Attachable:true Ingress:false ConfigFrom:{Network:} ConfigOnly:false Containers:map[0581f851037256c2e0e6c666639177db879387a6d09aece7a3c5d972850dde56:{Name:console_pageserver_1 EndpointID:0b080976501a3d4516cc0ac6d5e8b62f4c2409851ef6651d78d8a66ce88597c7 MacAddress:02:42:ac:36:20:0b IPv4Address:172.54.32.11/24 IPv6Address:} 2718ea8f2291a329eae67dcea2cbf43a36e7b23c7e30bd469128e90b242c734e:{Name:k3d-zenith-local-server-0 EndpointID:30f11b96d66a4943c11a166cea00dd41fef3f7f0545c2dfa43c1ab3057a8bfdc MacAddress:02:42:ac:36:20:03 IPv4Address:172.54.32.3/24 IPv6Address:} 731a71ab8f676a90944c936e0d3ff83973ce614d2d3a1f16be32b3316d6c468f:{Name:console_wal_acceptor3_1 EndpointID:1667488ecd001c36c18678e81d627b47d3839a229e8d7d2d494f94da873291c5 MacAddress:02:42:ac:36:20:17 IPv4Address:172.54.32.23/24 IPv6Address:} 7bdd3a10d8183faa507ef440e6d0faadce57c29be6b135e3ae6802b94fedb5be:{Name:console_wal_acceptor1_1 EndpointID:1a8909119bb13a00cb0b3daa48a1133bbf3c81cd38efaf28b9183af1426892dd MacAddress:02:42:ac:36:20:15 IPv4Address:172.54.32.21/24 IPv6Address:} 9122cdd132bfb7a73ff921b890c11cb7a7de710a83e3cacd6ac53f7973b14a51:{Name:k3d-zenith-local-serverlb EndpointID:cfdd1f8343227cb86b6c01c458b8509cf7b5f6505f072375d5f9cee15ecae6a8 MacAddress:02:42:ac:36:20:02 IPv4Address:172.54.32.2/24 IPv6Address:} b04c477a42edd730cd021b51258ab234a39d1e51fcf00439b593349dacb2003a:{Name:console_proxy_1 EndpointID:37b6848f5f37c2b8115ec590190fa7eda1eef90f9538c4e0fa92f090fcecea23 MacAddress:02:42:ac:36:20:1f IPv4Address:172.54.32.31/24 IPv6Address:} e1943501fc539f98b9bccdf20a0a12787ab98e8d7373f8b52b1130a54e53aeb7:{Name:console_wal_acceptor2_1 EndpointID:70a6f1e4e6716039367ebebf9b0d134ba99bc7c372eba1a189bb0508f2c8225a MacAddress:02:42:ac:36:20:16 IPv4Address:172.54.32.22/24 IPv6Address:}] Options:map[] Labels:map[com.docker.compose.network:zenith_net com.docker.compose.project:console com.docker.compose.version:1.27.4] Peers:[] Services:map[]} 
INFO[0018] Injecting records for hostAliases (incl. host.k3d.internal) and for 7 network members into CoreDNS configmap... 
ERRO[0018] Failed Cluster Start: error during post-start cluster preparation: error while rewriting /etc/hosts in k3d-zenith-local-server-0: error adding hosts file entry for <nil>:[host.k3d.internal]: "<nil>" is an invalid IP address 
INFO[0018] Deleting cluster 'zenith-local'              
ERRO[0018] Failed to create cluster >>> Rolling Back    
DEBU[0018] Cluster Details: &{Name:zenith-local Network:{Name:zenith_net ID:427415b8c930849a2febd2b8fe1c69f63aa901fdc72d4460025a537e1c857882 External:true IPAM:{IPPrefix:172.54.32.0/24 IPsUsed:[172.54.32.1 172.54.32.23 172.54.32.21 172.54.32.31 172.54.32.22 172.54.32.11] Managed:false} Members:[0xc00014d980 0xc00014d9b0 0xc00014d9e0 0xc00014da10 0xc00014da40]} Token:sirUMxbhmFUUxQXskWQA Nodes:[0xc0000f64e0 0xc0000f6680] InitNode:<nil> ExternalDatastore:<nil> KubeAPI:0xc0004fb080 ServerLoadBalancer:0xc0001363b0 ImageVolume:k3d-zenith-local-images Volumes:[k3d-zenith-local-images k3d-zenith-local-images]} 
DEBU[0018] Deleting node k3d-zenith-local-serverlb ...  
DEBU[0018] Deleting node k3d-zenith-local-server-0 ...  
DEBU[0019] Skip deletion of cluster network 'zenith_net' because it's managed externally 
INFO[0019] Deleting 2 attached volumes...               
DEBU[0019] Deleting volume k3d-zenith-local-images...   
DEBU[0019] Deleting volume k3d-zenith-local-images...   
WARN[0019] Failed to delete volume 'k3d-zenith-local-images' of cluster 'failed to find volume 'k3d-zenith-local-images': Error: No such volume: k3d-zenith-local-images': zenith-local -> Try to delete it manually 
FATA[0019] Cluster creation FAILED, all changes have been rolled back! 

Which OS & Architecture

Linux x86_64

Which version of k3d

Locally and in CI:

k3d version v5.3.0
k3s version v1.22.6-k3s1 (default)

Which version of docker

Locally:

Docker version 20.10.7, build 20.10.7-0ubuntu5.1

Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 8
  Running: 7
  Paused: 0
  Stopped: 1
 Images: 5
 Server Version: 20.10.7
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 
 runc version: 
 init version: 
 Security Options:
  apparmor
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.15.15-76051515-generic
 Operating System: Pop!_OS 21.10
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 31.14GiB
 Name: pop-os
 ID: L47N:KC7R:PYKX:GLNR:HII5:JFMR:ERVB:UBCP:E35G:MCPD:J5BY:VFD3
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: zenithdb
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

In CI:

Docker version 20.10.11+azure-3, build dea9396e184290f638ea873c76db7c80efd5a1d2

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc., 0.7.1+azure-2)
  compose: Docker Compose (Docker Inc., 2.2.3+azure-1)

Server:
 Containers: 6
  Running: 6
  Paused: 0
  Stopped: 0
 Images: 22
 Server Version: 20.10.11+azure-3
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: false
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc version: f46b6ba2c9314cfc8caae24a32ec5fe9ef1059fe
 init version: 
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.11.0-1028-azure
 Operating System: Ubuntu 20.04.4 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 6.785GiB
 Name: fv-az183-217
 ID: DFUV:3MIO:QJB4:SFAI:CWPP:LEBI:ETH4:AT3U:T5PZ:KRMI:AS3I:HDYW
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: githubactions
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
ololobus commented 2 years ago

It looks like k3d cannot figure out the host IP address in the specified subnet, so it cannot define the host.k3d.internal alias.

iwilltry42 commented 2 years ago

Hi @ololobus, thanks for opening this issue! This is an edge case I haven't seen yet, but I'll look into it! I guess k3d should at least fail early if it cannot determine the correct address to use there :thinking: So somehow the network created via docker-compose doesn't seem to have the gateway set correctly? :thinking: Can you give more details on your setup, or post the output of docker network inspect zenith_net? I can try to reproduce this later for debugging purposes.

mjnagel commented 2 years ago

I am seeing the same issue with 5.3.0 (not observed in 5.2.2). I set up the network via a Docker CLI command:

docker network create <name> --driver=bridge -o "com.docker.network.driver.mtu"="1450" --subnet=172.20.0.0/16

My k3d config uses values in that IP range for the cluster CIDR, service CIDR, and cluster DNS (in addition to --network <name>), as sketched below.
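
For reference, a rough sketch of the resulting k3d invocation (the cluster name and the CIDR/DNS values below are placeholders, not my exact config):

k3d cluster create mycluster \
  --network <name> \
  --k3s-arg "--cluster-cidr=172.20.64.0/18@server:*" \
  --k3s-arg "--service-cidr=172.20.128.0/18@server:*" \
  --k3s-arg "--cluster-dns=172.20.128.10@server:*"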

ololobus commented 2 years ago

So somehow the network created via docker-compose doesn't seem to have the gateway set correctly?

@iwilltry42, it seems it does; here is the output of docker network inspect zenith_net:

[
    {
        "Name": "zenith_net",
        "Id": "3c4221a7591171eb42dde89802e1ea13945daf1a1443039eddc9a166cfa2c467",
        "Created": "2022-03-16T18:29:42.962296627+03:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.20.42.0/24",
                    "Gateway": "172.20.42.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": true,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "6ee06f6c34f22f504c45d15dbebaa7c85256425bba760a7abd0f9d46a443c82b": {
                "Name": "young-forest-671487",
                "EndpointID": "d709edd51058e1fff125d3affcd5c72d1e8e5fb7a410135e70e2f18be7ec4553",
                "MacAddress": "02:42:ac:14:2a:04",
                "IPv4Address": "172.20.42.4/24",
                "IPv6Address": ""
            },
            "b4d0bcd4f3c7c455880bb0f4b35903e3a3f90898a4d1b1228966077043f273bf": {
                "Name": "royal-bread-675489",
                "EndpointID": "9380eb4de34b963aba754fec6df3ca63cbc221345ac9d852339d4428d63bcbef",
                "MacAddress": "02:42:ac:14:2a:02",
                "IPv4Address": "172.20.42.2/24",
                "IPv6Address": ""
            },
            "e75f9e40f5242447ba30ae2edc832480337cf126ce4439bb6fd29496d59d16fb": {
                "Name": "misty-block-624433",
                "EndpointID": "7f3623d90beb838ef6ee15db457701639966057cb0b31208fb036ef7d9538a99",
                "MacAddress": "02:42:ac:14:2a:03",
                "IPv4Address": "172.20.42.3/24",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {
            "com.docker.compose.network": "zenith_net",
            "com.docker.compose.project": "console",
            "com.docker.compose.version": "1.27.4"
        }
    }
]
iwilltry42 commented 2 years ago

This is weird... I just tried your commands, and in fact, in my setup the gateway is missing :thinking:

$ docker network create foonet --driver=bridge -o "com.docker.network.driver.mtu"="1450" --subnet=172.20.0.0/16
ab0aac890c2fd5ec4c41ea64c73305dfdf5e5a7ea752e8e15bd671ce503b0587

$ docker network inspect foonet                                                                                
[
    {
        "Name": "foonet",
        "Id": "ab0aac890c2fd5ec4c41ea64c73305dfdf5e5a7ea752e8e15bd671ce503b0587",
        "Created": "2022-03-23T16:39:21.843406633+01:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.20.0.0/16"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {},
        "Options": {
            "com.docker.network.driver.mtu": "1450"
        },
        "Labels": {}
    }
]

UPDATE: Supplying the --gateway flag when creating the network fixes it, e.g. --gateway 172.20.0.1 (see the example below). I guess there's a similar option for the compose spec. Also, according to some Google results, restarting Docker will add a default gateway to the network (no clue why it's not there from the start) :thinking:
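
For reference, the command from above with the gateway supplied explicitly:

docker network create foonet \
  --driver=bridge \
  -o "com.docker.network.driver.mtu"="1450" \
  --subnet=172.20.0.0/16 \
  --gateway=172.20.0.1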

ololobus commented 2 years ago

Ugh, my network description above was misleading, as it was taken from my local machine, where attaching the existing network in k3d worked fine. Also, I had definitely restarted the Docker daemon after creating the network.

Yeah, for docker-compose there is a gateway option as well, so the config should look like this:

networks:
  zenith_net:
    name: zenith_net
    ipam:
      config:
        - subnet: "172.20.42.0/24"
          gateway: 172.20.42.1

I'll try checking the gateway value, and maybe run CI with and without it.
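
A minimal sketch of such a check (assuming the network has a single IPAM config entry); it prints an empty line when no gateway is set:

docker network inspect zenith_net --format '{{ (index .IPAM.Config 0).Gateway }}'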

iwilltry42 commented 2 years ago

k3d should now error out if there's no gateway defined, as the gateway IP is a hard requirement for setting host.k3d.internal on systems where host.docker.internal is not present.
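
To verify the alias once a cluster comes up, something like this should work (busybox:1.28 here is just a suggestion, chosen because its nslookup behaves reliably):

kubectl run dnstest --rm -it --restart=Never --image=busybox:1.28 -- nslookup host.k3d.internal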

I guess your main issue is fixed by setting the gateway IP, but feel free to reopen this if you need further work from k3d's side :+1:

mjnagel commented 2 years ago

Thanks for the follow-up @iwilltry42, that's a good find! I'll have to set that on my end when creating the Docker network.