mdshack opened this issue 1 year ago (Open)
I am having the same problem. Was working fine up until about a week ago.
I have also been hit with this issue. Cluster is up on AWS. Builds fine with all pods running. When EC2 instance is rebooted most of the pods enter a crash loop.
```
ubuntu@ip-10-1-1-102:~$ k3d runtime-info
arch: x86_64
cgroupdriver: cgroupfs
cgroupversion: "1"
endpoint: /var/run/docker.sock
filesystem: extfs
name: docker
os: Ubuntu 20.04.5 LTS
ostype: linux
version: 20.10.12
```
```
ubuntu@ip-10-1-1-102:~$ k3d version
k3d version v5.4.7
k3s version v1.25.6-k3s1 (default)
```
```
ubuntu@ip-10-1-1-102:~$ docker info
Client:
 Context: default
 Debug Mode: false

Server:
 Containers: 5
  Running: 5
  Paused: 0
  Stopped: 0
 Images: 3
 Server Version: 20.10.12
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version:
 runc version:
 init version:
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.15.0-1028-aws
 Operating System: Ubuntu 20.04.5 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 61.81GiB
 Name: ip-10-1-1-102
 ID: KB2D:4ZRM:F5IX:G6ZP:JVCV:ORCW:D3FO:GG4J:N5RH:ZJVR:J2QL:TAPO
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
```
Trying to use Kind instead.
Has anyone found a workaround for this yet? It's a pain stopping and starting the cluster to get this back. Is there any way we can query the k3d instance to find out what it ought to be?
Did you happen to find a workaround for this?
We're experiencing the same problem, our developers locally have to stop and start the cluster every time they reboot their machines to fix DNS resolution.
I wrote a PowerShell function for our developers to run to fix the issue:
```powershell
function Repair-ClusterCoreDns()
{
    $server0 = docker inspect --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' k3d-energy-server-0
    $serverlb = docker inspect --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' k3d-energy-serverlb
    $registry = docker inspect --format='{{with index .NetworkSettings.Networks "k3d-energy"}}{{.IPAddress}}{{end}}' k3d-myregistry.localhost
    $hostK3dInternal = Get-HostK3dInternal

    # NodeHosts is a YAML block scalar, so the entries must stay indented under it
    $ips = "|
    $hostK3dInternal host.k3d.internal
    $server0 k3d-energy-server-0
    $serverlb k3d-energy-serverlb
    $registry k3d-myregistry.localhost
"
    $patch = 'data:
  NodeHosts: ' + $ips
    Write-Output "Adding the following members to the coredns config map:"
    Write-Output $patch
    kubectl patch configmap/coredns -n kube-system --type merge --patch $patch
}
Set-Alias -Name fixdns -Value Repair-ClusterCoreDns -Force
Export-ModuleMember -Function Repair-ClusterCoreDns -Alias fixdns
```
```powershell
function Get-HostK3dInternal()
{
    $hostIp = ""
    $dnsEntries = docker exec k3d-energy-tools /bin/sh -c "getent ahostsv4 host.k3d.internal"
    foreach ($dnsEntry in $dnsEntries) {
        $chunks = $dnsEntry.Split(" ") | Where-Object { $_ }
        if ($chunks[2] -eq "host.k3d.internal")
        {
            $hostIp = $chunks[0]
        }
    }
    if ($hostIp -eq "")
    {
        Write-Host 'FAILURE: Could not resolve host.k3d.internal. Please ensure the k3d-energy-tools container is running'
    }
    return $hostIp
}
```
It's not perfect, but it works. The important line, which figures out what the host.k3d.internal IP should be, is:

```
docker exec k3d-energy-tools /bin/sh -c "getent ahostsv4 host.k3d.internal"
```

I only figured that out by reading the source, and since it's undocumented it is liable to change. But it works for now.
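For anyone not on Windows, here is a rough POSIX-shell sketch of the same idea. The container name `k3d-energy-tools` is carried over from the PowerShell snippet above; adjust it (and the cluster names) for your own setup:

```shell
#!/bin/sh
# Parse `getent ahostsv4` output on stdin and print the IPv4 that maps to
# host.k3d.internal. getent lines look like:
#   <ip>      STREAM host.k3d.internal
# with the hostname only present in the third column of matching lines.
extract_host_ip() {
  awk '$3 == "host.k3d.internal" { print $1; exit }'
}

# Against a live cluster (the tools container must be running):
#   host_ip=$(docker exec k3d-energy-tools getent ahostsv4 host.k3d.internal | extract_host_ip)
#   kubectl patch configmap/coredns -n kube-system --type merge --patch "data:
#     NodeHosts: |
#       $host_ip host.k3d.internal"
```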
This is also a problem with local registries, and it can easily be replicated: local registries break when used from the cluster after a cluster restart.
- `coredns` ConfigMap has proper entries
- `coredns` ConfigMap is missing the entries
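For context, these host entries live in the `NodeHosts` key of the `coredns` ConfigMap in `kube-system` (the same key the patch above targets). A minimal sketch of a healthy map; the IPs and node names here are illustrative, not taken from this report:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  NodeHosts: |
    172.18.0.2 k3d-relay-server-0
    172.18.0.3 k3d-relay-agent-0
    172.18.0.1 host.k3d.internal
```

After a reboot, Docker can hand the containers different IPs, so entries like these go stale or disappear and names such as host.k3d.internal stop resolving.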
What did you do

How was the cluster created?

```
k3d cluster create [clustername] --volume ... --volume ... --registry-config [local path to registry config] --agents 1 --servers 1 --port 8081:8081@loadbalancer --port ... --port ... --port ... --k3s-arg --disable=traefik@server:0 --k3s-arg --disable=metrics-server@server:0
```

What did you do afterwards?

```
docker exec -it k3d-relay-agent-0 /bin/sh
wget host.k3d.internal:5000
```

[refer to Screenshots or terminal output -> Successful wget below]

```
docker exec -it k3d-relay-agent-0 /bin/sh
wget host.k3d.internal:5000
```

[refer to Screenshots or terminal output -> Unsuccessful wget below]

```
k3d cluster stop [clustername]
k3d cluster start [clustername]
```
What did you expect to happen

Expected `host.k3d.internal:5000` to be reachable after a machine restart.

Screenshots or terminal output

If applicable, add screenshots or terminal output (code block) to help explain your problem.

Successful wget: (screenshot)
Unsuccessful wget: (screenshot)
Which OS & Architecture

`k3d runtime-info`

Which version of k3d

`k3d version`

Which version of docker

`docker version` and `docker info`
```
Server: Docker Engine - Community
 Engine:
  Version:          20.10.23
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18.10
  Git commit:       6051f14
  Built:            Thu Jan 19 17:34:14 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.15
  GitCommit:        5b842e528e99d4d4c1686467debf2bd4b88ecd86
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
```
```
Client:
 Context: default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.10.0-docker)
  compose: Docker Compose (Docker Inc., v2.15.1)
  scan: Docker Scan (Docker Inc., v0.23.0)

Server:
 Containers: 11
  Running: 6
  Paused: 0
  Stopped: 5
 Images: 56
 Server Version: 20.10.23
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 5b842e528e99d4d4c1686467debf2bd4b88ecd86
 runc version: v1.1.4-0-g5fd4c4d
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.15.0-60-generic
 Operating System: Ubuntu 20.04.5 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 62.42GiB
 Name: ....
 ID: ...
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: ...
 Registry: ...
 Labels:
 Experimental: false
 Insecure Registries:
  ...:5001
  ....:5002
  localhost:32000
  ....:5003
  ...:5000
  127.0.0.0/8
 Registry Mirrors:
  ...:5001/
 Live Restore Enabled: false
```