Closed mindkeep closed 3 years ago
Hi @mindkeep, thanks for opening this issue! You're right; we've already seen this issue in the context of #262 (https://github.com/rancher/k3d/issues/262#issuecomment-810105498). I'll investigate a bit on what we can do there :+1: (FWIW, old related moby/docker issue: https://github.com/moby/moby/issues/2801)
Related issues:
kind: https://github.com/kubernetes-sigs/kind/issues/1689 & https://github.com/kubernetes-sigs/kind/issues/2045
k3s: https://github.com/k3s-io/k3s/issues/2965 & https://github.com/k3s-io/k3s/issues/2850
If there's no flag for k3s/etcd that we could use to work around this, we'll have to check how feasible it is to manage static IPs for k3d nodes (i.e. similar to using the --ip flag in docker).
Worth noting, if I shut down both containers and restart k3d-test-k3d-server-0 first, we get the original IP ordering and everything comes back up.
So it's purely a docker IP ordering problem.
Given that we already spin up a new docker network, could we explore assigning more static IPs within that network?
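The start-order behavior described above can be sketched as a toy model (an illustration only, not docker's actual IPAM code): addresses are handed out lowest-free-first in whatever order containers come up, so a different restart order swaps the IPs.

```python
# Toy model of order-dependent bridge-network address assignment.
# Subnet and names are examples, not taken from a real daemon.
def assign_ips(start_order, subnet_prefix="172.18.0", first_host=2):
    """Hand out sequential host addresses in container start order."""
    return {name: f"{subnet_prefix}.{first_host + i}"
            for i, name in enumerate(start_order)}

first_boot = assign_ips(["server-0", "serverlb"])
restart    = assign_ips(["serverlb", "server-0"])  # lb happened to start first

print(first_boot)  # {'server-0': '172.18.0.2', 'serverlb': '172.18.0.3'}
print(restart)     # {'serverlb': '172.18.0.2', 'server-0': '172.18.0.3'}
```

Restarting the server container first, as noted above, restores the original mapping because it reclaims the lowest free address.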
> So it's purely a docker IP ordering problem.

Yep..

> could we explore assigning more static IPs

Implementing some IPAM in k3d would be the only possible solution, yep :thinking:
Let's turn this into a feature request :+1:
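As a rough idea of what such IPAM could look like: reserve one fixed address per node inside the cluster's subnet so the IPs survive any restart order. The function name and reservation scheme below are hypothetical, a minimal sketch rather than k3d's implementation:

```python
import ipaddress

def reserve_node_ips(subnet: str, node_names):
    """Reserve one fixed host address per node, in a stable order.

    Skips the first host address, which is conventionally the gateway
    on a docker bridge network.
    """
    hosts = ipaddress.ip_network(subnet).hosts()
    next(hosts)  # skip e.g. 172.45.0.1 (gateway)
    return {name: str(ip) for name, ip in zip(node_names, hosts)}

ips = reserve_node_ips("172.45.0.0/16", ["server-0", "server-1", "server-2"])
print(ips)  # server-0 -> 172.45.0.2, server-1 -> 172.45.0.3, server-2 -> 172.45.0.4
```

With a fixed name-to-IP table like this, the containers could be attached with docker's `--ip` option regardless of start order.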
Hi @mindkeep, while I'm still trying to figure out the best way to cope with this, I already did some work on this.
Please give https://github.com/rancher/k3d/releases/tag/v4.5.0-dev.0 a try and start your cluster with --subnet auto or --subnet 172.45.0.0/16 (or whatever subnet it should use).
With --subnet auto, k3d will create a fake docker network to get a subnet auto-assigned by docker that it can use.
Also please make use of the --trace flag for verbose logs.
Awaiting your feedback :)
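The "fake network" trick boils down to reading the auto-assigned subnet back from the network's inspect output. A sketch of that parsing step, run against a hand-written JSON document that mimics the shape of `docker network inspect` output (sample values, not a real daemon response):

```python
import json

# Mimics the structure of `docker network inspect <name>` output;
# the network name and subnet here are made up for the example.
inspect_output = """[{"Name": "k3d-fake-net",
                     "IPAM": {"Config": [{"Subnet": "172.28.0.0/16",
                                          "Gateway": "172.28.0.1"}]}}]"""

def subnet_of(inspect_json: str) -> str:
    """Extract the first IPAM subnet from an inspect document."""
    return json.loads(inspect_json)[0]["IPAM"]["Config"][0]["Subnet"]

print(subnet_of(inspect_output))  # 172.28.0.0/16
```

After reading the subnet, the throwaway network can be removed and the subnet reused for the real cluster network.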
@iwilltry42
First try
$ k3d2 cluster create test25 --servers=3 --subnet=auto
INFO[0000] Prep: Network
INFO[0000] Created network 'k3d-test25' (3ad95731bfce7029df8a7f7540daaa02c2fb305a10e24d531e0ed6e6756d08b2)
INFO[0000] Created volume 'k3d-test25-images'
INFO[0000] Creating initializing server node
INFO[0000] Creating node 'k3d-test25-server-0'
INFO[0002] Pulling image 'docker.io/rancher/k3s:v1.20.5-k3s1'
INFO[0011] Creating node 'k3d-test25-server-1'
INFO[0012] Creating node 'k3d-test25-server-2'
INFO[0012] Creating LoadBalancer 'k3d-test25-serverlb'
INFO[0014] Pulling image 'docker.io/rancher/k3d-proxy:v4.5.0-dev.0'
INFO[0019] Starting cluster 'test25'
INFO[0019] Starting the initializing server...
INFO[0019] Starting Node 'k3d-test25-server-0'
INFO[0020] Starting servers...
INFO[0020] Starting Node 'k3d-test25-server-1'
ERRO[0020] Failed to start node 'k3d-test25-server-1'
ERRO[0020] Failed Cluster Start: Failed to start server k3d-test25-server-1: Error response from daemon: Address already in use
ERRO[0020] Failed to create cluster >>> Rolling Back
INFO[0020] Deleting cluster 'test25'
INFO[0020] Deleted k3d-test25-server-0
INFO[0020] Deleted k3d-test25-server-1
INFO[0020] Deleted k3d-test25-server-2
INFO[0020] Deleted k3d-test25-serverlb
INFO[0020] Deleting cluster network 'k3d-test25'
INFO[0021] Deleting image volume 'k3d-test25-images'
FATA[0021] Cluster creation FAILED, all changes have been rolled back!
$ k3d2 cluster create test25 --servers=3 --subnet=172.45.0.0/16
INFO[0000] Prep: Network
INFO[0000] Created network 'k3d-test25' (f7900949f4b380af681d7a9f1b39992b971cbbffc94bd57ee254696dc9812fbb)
INFO[0000] Created volume 'k3d-test25-images'
INFO[0000] Creating initializing server node
INFO[0000] Creating node 'k3d-test25-server-0'
INFO[0001] Creating node 'k3d-test25-server-1'
INFO[0002] Creating node 'k3d-test25-server-2'
INFO[0002] Creating LoadBalancer 'k3d-test25-serverlb'
INFO[0002] Starting cluster 'test25'
INFO[0002] Starting the initializing server...
INFO[0002] Starting Node 'k3d-test25-server-0'
INFO[0003] Starting servers...
INFO[0003] Starting Node 'k3d-test25-server-1'
ERRO[0003] Failed to start node 'k3d-test25-server-1'
ERRO[0003] Failed Cluster Start: Failed to start server k3d-test25-server-1: Error response from daemon: Address already in use
ERRO[0003] Failed to create cluster >>> Rolling Back
INFO[0003] Deleting cluster 'test25'
INFO[0003] Deleted k3d-test25-server-0
INFO[0003] Deleted k3d-test25-server-1
INFO[0003] Deleted k3d-test25-server-2
INFO[0003] Deleted k3d-test25-serverlb
INFO[0003] Deleting cluster network 'k3d-test25'
INFO[0003] Deleting image volume 'k3d-test25-images'
FATA[0003] Cluster creation FAILED, all changes have been rolled back!
@bukowa ... well.. crap :grin: Forgot to add a case before building. :roll_eyes:
@iwilltry42
After restarting docker and deleting that cluster, my old clusters with 1 server won't boot up :(
E0414 11:40:13.438977 7 controller.go:116] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: error trying to reach service: dial tcp 10.43.239.9:443: connect: no route to host
, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I0414 11:40:13.439010 7 controller.go:129] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
E0414 11:40:13.440809 7 available_controller.go:508] v1beta1.metrics.k8s.io failed with: Operation cannot be fulfilled on apiservices.apiregistration.k8s.io "v1beta1.metrics.k8s.io": the object has been modified; please apply your changes to the latest version and try again
I0414 11:40:14.147032 7 range_allocator.go:82] Sending events to api server.
I0414 11:40:14.147149 7 range_allocator.go:110] No Service CIDR provided. Skipping filtering out service addresses.
I0414 11:40:14.147175 7 range_allocator.go:116] No Secondary Service CIDR provided. Skipping filtering out secondary service addresses.
I0414 11:40:14.147195 7 controllermanager.go:554] Started "nodeipam"
I0414 11:40:14.147300 7 node_ipam_controller.go:159] Starting ipam controller
I0414 11:40:14.147315 7 shared_informer.go:240] Waiting for caches to sync for node
I0414 11:40:14.149398 7 controllermanager.go:554] Started "persistentvolume-expander"
I0414 11:40:14.149475 7 expand_controller.go:310] Starting expand controller
I0414 11:40:14.149482 7 shared_informer.go:240] Waiting for caches to sync for expand
I0414 11:40:14.151273 7 controllermanager.go:554] Started "clusterrole-aggregation"
I0414 11:40:14.151297 7 clusterroleaggregation_controller.go:149] Starting ClusterRoleAggregator
I0414 11:40:14.151304 7 shared_informer.go:240] Waiting for caches to sync for ClusterRoleAggregator
I0414 11:40:14.152995 7 controllermanager.go:554] Started "job"
I0414 11:40:14.153102 7 job_controller.go:148] Starting job controller
I0414 11:40:14.153113 7 shared_informer.go:240] Waiting for caches to sync for job
I0414 11:40:14.154074 7 shared_informer.go:240] Waiting for caches to sync for resource quota
W0414 11:40:14.156670 7 actual_state_of_world.go:534] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="k3d-k3s-default-server-0" does not exist
I0414 11:40:14.158994 7 shared_informer.go:247] Caches are synced for persistent volume
E0414 11:40:14.160100 7 memcache.go:196] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
I0414 11:40:14.161068 7 shared_informer.go:247] Caches are synced for attach detach
I0414 11:40:14.165053 7 shared_informer.go:247] Caches are synced for TTL
I0414 11:40:14.166524 7 shared_informer.go:247] Caches are synced for PVC protection
I0414 11:40:14.169107 7 shared_informer.go:247] Caches are synced for endpoint
I0414 11:40:14.181276 7 shared_informer.go:247] Caches are synced for HPA
I0414 11:40:14.187013 7 shared_informer.go:247] Caches are synced for PV protection
I0414 11:40:14.187210 7 event.go:291] "Event occurred" object="lens-metrics/kube-state-metrics" kind="Endpoints" apiVersion="v1" type="Warning" reason="FailedToUpdateEndpoint" message="Failed to update endpoint lens-metrics/kube-state-metrics: Operation cannot be fulfilled on endpoints \"kube-state-metrics\": the object has been modified; please apply your changes to the latest version and try again"
I0414 11:40:14.195398 7 shared_informer.go:247] Caches are synced for ReplicationController
I0414 11:40:14.198810 7 shared_informer.go:247] Caches are synced for GC
I0414 11:40:14.218147 7 shared_informer.go:247] Caches are synced for namespace
I0414 11:40:14.218332 7 shared_informer.go:247] Caches are synced for endpoint_slice_mirroring
I0414 11:40:14.221166 7 shared_informer.go:247] Caches are synced for disruption
I0414 11:40:14.221254 7 disruption.go:339] Sending events to api server.
I0414 11:40:14.224425 7 shared_informer.go:247] Caches are synced for certificate-csrsigning-kubelet-serving
I0414 11:40:14.224557 7 shared_informer.go:247] Caches are synced for certificate-csrsigning-kubelet-client
I0414 11:40:14.224609 7 shared_informer.go:247] Caches are synced for certificate-csrsigning-legacy-unknown
I0414 11:40:14.224726 7 shared_informer.go:247] Caches are synced for certificate-csrsigning-kube-apiserver-client
I0414 11:40:14.226641 7 shared_informer.go:247] Caches are synced for endpoint_slice
I0414 11:40:14.228199 7 shared_informer.go:247] Caches are synced for service account
I0414 11:40:14.230801 7 shared_informer.go:247] Caches are synced for ReplicaSet
I0414 11:40:14.234906 7 shared_informer.go:247] Caches are synced for certificate-csrapproving
I0414 11:40:14.234977 7 shared_informer.go:247] Caches are synced for taint
I0414 11:40:14.235281 7 node_lifecycle_controller.go:1429] Initializing eviction metric for zone:
W0414 11:40:14.235413 7 node_lifecycle_controller.go:1044] Missing timestamp for Node k3d-k3s-default-server-0. Assuming now as a timestamp.
I0414 11:40:14.238699 7 taint_manager.go:187] Starting NoExecuteTaintManager
I0414 11:40:14.238856 7 event.go:291] "Event occurred" object="k3d-k3s-default-server-0" kind="Node" apiVersion="v1" type="Normal" reason="RegisteredNode" message="Node k3d-k3s-default-server-0 event: Registered Node k3d-k3s-default-server-0 in Controller"
I0414 11:40:14.239237 7 node_lifecycle_controller.go:1245] Controller detected that zone is now in state Normal.
I0414 11:40:14.247401 7 shared_informer.go:247] Caches are synced for node
I0414 11:40:14.247602 7 range_allocator.go:172] Starting range CIDR allocator
I0414 11:40:14.247663 7 shared_informer.go:240] Waiting for caches to sync for cidrallocator
I0414 11:40:14.247703 7 shared_informer.go:247] Caches are synced for cidrallocator
I0414 11:40:14.249530 7 shared_informer.go:247] Caches are synced for deployment
I0414 11:40:14.250710 7 shared_informer.go:247] Caches are synced for expand
I0414 11:40:14.251616 7 shared_informer.go:247] Caches are synced for ClusterRoleAggregator
I0414 11:40:14.272632 7 shared_informer.go:247] Caches are synced for daemon sets
I0414 11:40:14.284690 7 shared_informer.go:247] Caches are synced for stateful set
I0414 11:40:14.392178 7 shared_informer.go:247] Caches are synced for crt configmap
I0414 11:40:14.453240 7 shared_informer.go:247] Caches are synced for job
I0414 11:40:14.454227 7 shared_informer.go:247] Caches are synced for resource quota
I0414 11:40:14.504974 7 shared_informer.go:247] Caches are synced for resource quota
E0414 11:40:14.804091 7 memcache.go:101] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
I0414 11:40:14.806192 7 shared_informer.go:240] Waiting for caches to sync for garbage collector
I0414 11:40:14.906439 7 shared_informer.go:247] Caches are synced for garbage collector
I0414 11:40:14.949133 7 shared_informer.go:247] Caches are synced for garbage collector
I0414 11:40:14.949164 7 garbagecollector.go:151] Garbage collector: all resource monitors have synced. Proceeding to collect garbage
E0414 11:40:16.558960 7 available_controller.go:508] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.43.239.9:443/apis/metrics.k8s.io/v1beta1: Get "https://10.43.239.9:443/apis/metrics.k8s.io/v1beta1": dial tcp 10.43.239.9:443: connect: no route to host
W0414 11:40:17.065119 7 lease.go:233] Resetting endpoints for master service "kubernetes" to [172.18.0.3]
time="2021-04-14T11:40:17.067400700Z" level=info msg="Stopped tunnel to 172.18.0.2:6443"
time="2021-04-14T11:40:17.067465300Z" level=info msg="Proxy done" err="context canceled" url="wss://172.18.0.2:6443/v1-k3s/connect"
time="2021-04-14T11:40:17.067664600Z" level=info msg="error in remotedialer server [400]: websocket: close 1006 (abnormal closure): unexpected EOF"
I0414 11:44:52.803254 7 network_policy_controller.go:157] Shutting down network policies full sync goroutine
time="2021-04-14T11:44:52.805001100Z" level=info msg="Shutting down batch/v1, Kind=Job workers"
time="2021-04-14T11:44:52.805117200Z" level=info msg="Shutting down helm.cattle.io/v1, Kind=HelmChartConfig workers"
time="2021-04-14T11:44:52.805641500Z" level=info msg="Shutting down /v1, Kind=Endpoints workers"
time="2021-04-14T11:44:52.807819300Z" level=info msg="Shutting down /v1, Kind=Node workers"
@bukowa can you inspect those node containers?
And can you check if there's an overlapping docker network (docker network ls & docker network inspect)?
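Checking for overlaps comes down to comparing the subnets from the inspect outputs pairwise. A small sketch; the subnets below are the ones the inspect dumps later in this thread report for k3d-k3s-default, k3d-test55, and k3d-wpenv:

```python
import ipaddress

# Subnets taken from the `docker network inspect` output in this thread.
subnets = ["172.18.0.0/16", "172.20.0.0/16", "172.19.0.0/16"]

nets = [ipaddress.ip_network(s) for s in subnets]
overlaps = [(a, b) for i, a in enumerate(nets)
            for b in nets[i + 1:] if a.overlaps(b)]

print(overlaps)  # [] -> no overlapping docker networks
```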
@iwilltry42 no overlapping networks + all nodes started right now after another docker restart, strange :) Before, they were in an exit loop.
@bukowa should be fixed here: https://github.com/rancher/k3d/releases/tag/v4.5.0-dev.1
@iwilltry42 I made a small GitHub Action that may be useful to test this: https://github.com/bukowa/k3dtest/runs/2352095712?check_suite_focus=true
Thanks for that @bukowa , I forked it and added kubectl to check that output. It seems to work, right? E.g. https://github.com/iwilltry42/k3dtest/runs/2352325264
@iwilltry42 seems to work but not on my machine ^^
ff192959bbdd rancher/k3d-proxy:v4.5.0-dev.1 "/bin/sh -c nginx-pr…" 25 minutes ago Restarting (1) 39 seconds ago k3d-test55-serverlb
db451387ad46 rancher/k3s:v1.20.5-k3s1 "/bin/k3s server --t…" 25 minutes ago Exited (255) 23 minutes ago k3d-test55-server-2
10ec4e821999 rancher/k3s:v1.20.5-k3s1 "/bin/k3s server --t…" 25 minutes ago Up 21 minutes k3d-test55-server-1
ff7e15fb60bb rancher/k3s:v1.20.5-k3s1 "/bin/k3s server --c…" 25 minutes ago Exited (255) 23 minutes ago k3d-test55-server-0
@iwilltry42 I think I've got something, because I still have old clusters running:
ff192959bbdd rancher/k3d-proxy:v4.5.0-dev.1 "/bin/sh -c nginx-pr…" 33 minutes ago Up 20 seconds 80/tcp, 0.0.0.0:52232->6443/tcp k3d-test55-serverlb
db451387ad46 rancher/k3s:v1.20.5-k3s1 "/bin/k3s server --t…" 33 minutes ago Up 21 seconds k3d-test55-server-2
10ec4e821999 rancher/k3s:v1.20.5-k3s1 "/bin/k3s server --t…" 33 minutes ago Up 20 seconds k3d-test55-server-1
ff7e15fb60bb rancher/k3s:v1.20.5-k3s1 "/bin/k3s server --c…" 33 minutes ago Up 20 seconds k3d-test55-server-0
4e8e8de805e9 rancher/k3s:v1.20.4-k3s1 "/bin/k3s server --t…" 2 weeks ago Exited (255) 23 seconds ago 0.0.0.0:32080->32080/tcp, 0.0.0.0:32443->32443/tcp, 0.0.0.0:65336->6443/tcp k3d-wpenv-server-0
afee5e598565 rancher/k3d-proxy:v4.3.0 "/bin/sh -c nginx-pr…" 4 weeks ago Restarting (1) 2 seconds ago k3d-k3s-default-serverlb
0bdf54e2bf41 rancher/k3s:v1.20.4-k3s1 "/bin/k3s server --t…" 4 weeks ago Exited (255) 23 seconds ago k3d-k3s-default-server-0
Now each time I restart docker, it looks like the alive nodes are shifting, take a look:
restart 1:
ff192959bbdd rancher/k3d-proxy:v4.5.0-dev.1 "/bin/sh -c nginx-pr…" 36 minutes ago Up 3 seconds 80/tcp, 0.0.0.0:52232->6443/tcp k3d-test55-serverlb
db451387ad46 rancher/k3s:v1.20.5-k3s1 "/bin/k3s server --t…" 36 minutes ago Exited (255) 47 seconds ago k3d-test55-server-2
10ec4e821999 rancher/k3s:v1.20.5-k3s1 "/bin/k3s server --t…" 36 minutes ago Exited (255) 47 seconds ago k3d-test55-server-1
ff7e15fb60bb rancher/k3s:v1.20.5-k3s1 "/bin/k3s server --c…" 36 minutes ago Exited (255) 47 seconds ago k3d-test55-server-0
4e8e8de805e9 rancher/k3s:v1.20.4-k3s1 "/bin/k3s server --t…" 2 weeks ago Up 45 seconds 0.0.0.0:32080->32080/tcp, 0.0.0.0:32443->32443/tcp, 0.0.0.0:65336->6443/tcp k3d-wpenv-server-0
afee5e598565 rancher/k3d-proxy:v4.3.0 "/bin/sh -c nginx-pr…" 4 weeks ago Up 45 seconds 80/tcp, 0.0.0.0:50670->6443/tcp k3d-k3s-default-serverlb
0bdf54e2bf41 rancher/k3s:v1.20.4-k3s1 "/bin/k3s server --t…" 4 weeks ago Up 45 seconds k3d-k3s-default-server-0
restart 2:
ff192959bbdd rancher/k3d-proxy:v4.5.0-dev.1 "/bin/sh -c nginx-pr…" 37 minutes ago Up 1 second 80/tcp, 0.0.0.0:52232->6443/tcp k3d-test55-serverlb
db451387ad46 rancher/k3s:v1.20.5-k3s1 "/bin/k3s server --t…" 37 minutes ago Up 23 seconds k3d-test55-server-2
10ec4e821999 rancher/k3s:v1.20.5-k3s1 "/bin/k3s server --t…" 37 minutes ago Up 23 seconds k3d-test55-server-1
ff7e15fb60bb rancher/k3s:v1.20.5-k3s1 "/bin/k3s server --c…" 37 minutes ago Exited (255) About a minute ago k3d-test55-server-0
4e8e8de805e9 rancher/k3s:v1.20.4-k3s1 "/bin/k3s server --t…" 2 weeks ago Exited (255) 25 seconds ago 0.0.0.0:32080->32080/tcp, 0.0.0.0:32443->32443/tcp, 0.0.0.0:65336->6443/tcp k3d-wpenv-server-0
afee5e598565 rancher/k3d-proxy:v4.3.0 "/bin/sh -c nginx-pr…" 4 weeks ago Up 2 seconds 80/tcp, 0.0.0.0:50670->6443/tcp k3d-k3s-default-serverlb
0bdf54e2bf41 rancher/k3s:v1.20.4-k3s1 "/bin/k3s server --t…" 4 weeks ago Exited (255) 25 seconds ago k3d-k3s-default-server-0
restart 3:
ff192959bbdd rancher/k3d-proxy:v4.5.0-dev.1 "/bin/sh -c nginx-pr…" 38 minutes ago Up 2 seconds 80/tcp, 0.0.0.0:52232->6443/tcp k3d-test55-serverlb
db451387ad46 rancher/k3s:v1.20.5-k3s1 "/bin/k3s server --t…" 38 minutes ago Exited (255) 26 seconds ago k3d-test55-server-2
10ec4e821999 rancher/k3s:v1.20.5-k3s1 "/bin/k3s server --t…" 38 minutes ago Exited (255) 26 seconds ago k3d-test55-server-1
ff7e15fb60bb rancher/k3s:v1.20.5-k3s1 "/bin/k3s server --c…" 38 minutes ago Exited (255) 2 minutes ago k3d-test55-server-0
4e8e8de805e9 rancher/k3s:v1.20.4-k3s1 "/bin/k3s server --t…" 2 weeks ago Up 25 seconds 0.0.0.0:32080->32080/tcp, 0.0.0.0:32443->32443/tcp, 0.0.0.0:65336->6443/tcp k3d-wpenv-server-0
afee5e598565 rancher/k3d-proxy:v4.3.0 "/bin/sh -c nginx-pr…" 4 weeks ago Up 24 seconds 80/tcp, 0.0.0.0:50670->6443/tcp k3d-k3s-default-serverlb
0bdf54e2bf41 rancher/k3s:v1.20.4-k3s1 "/bin/k3s server --t…" 4 weeks ago Up 23 seconds k3d-k3s-default-server-0
restart 4:
ff192959bbdd rancher/k3d-proxy:v4.5.0-dev.1 "/bin/sh -c nginx-pr…" 40 minutes ago Restarting (1) 7 seconds ago k3d-test55-serverlb
db451387ad46 rancher/k3s:v1.20.5-k3s1 "/bin/k3s server --t…" 40 minutes ago Exited (255) 3 minutes ago k3d-test55-server-2
10ec4e821999 rancher/k3s:v1.20.5-k3s1 "/bin/k3s server --t…" 40 minutes ago Up About a minute k3d-test55-server-1
ff7e15fb60bb rancher/k3s:v1.20.5-k3s1 "/bin/k3s server --c…" 40 minutes ago Up About a minute k3d-test55-server-0
4e8e8de805e9 rancher/k3s:v1.20.4-k3s1 "/bin/k3s server --t…" 2 weeks ago Exited (255) About a minute ago 0.0.0.0:32080->32080/tcp, 0.0.0.0:32443->32443/tcp, 0.0.0.0:65336->6443/tcp k3d-wpenv-server-0
afee5e598565 rancher/k3d-proxy:v4.3.0 "/bin/sh -c nginx-pr…" 4 weeks ago Restarting (1) 6 seconds ago k3d-k3s-default-serverlb
0bdf54e2bf41 rancher/k3s:v1.20.4-k3s1 "/bin/k3s server --t…" 4 weeks ago Exited (255) About a minute ago k3d-k3s-default-server-0
$ docker network ls
NETWORK ID NAME DRIVER SCOPE
b436e5a0149c bridge bridge local
e05bb6c1420e host host local
56bb9cb725d6 k3d-k3s-default bridge local
ed22e183cae8 k3d-test55 bridge local
5312bd5f60bd k3d-wpenv bridge local
2208d8a45078 none null local
$ docker network inspect k3d-k3s-default
[
{
"Name": "k3d-k3s-default",
"Id": "56bb9cb725d6eeb944f529cd964814845d5e5142029f83e06bfc85d584beb4aa",
"Created": "2021-03-12T00:27:05.3874863Z",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "172.18.0.0/16",
"Gateway": "172.18.0.1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"afee5e59856529cb6e304a423b93a79ff59b7439e4c4c8a5fc7b5be0a3f4c5b6": {
"Name": "k3d-k3s-default-serverlb",
"EndpointID": "7a9659b83a4c9d76b291a4461abeea0e8411af01cef9f2c85efae474aa221a38",
"MacAddress": "02:42:ac:12:00:02",
"IPv4Address": "172.18.0.2/16",
"IPv6Address": ""
}
},
"Options": {},
"Labels": {
"app": "k3d"
}
}
]
$ docker network inspect k3d-test55
[
{
"Name": "k3d-test55",
"Id": "ed22e183cae8b7f29442f03cf95b81dd3ba12523e355aa8b9acfb9af84b281f6",
"Created": "2021-04-15T11:45:01.0438683Z",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "172.20.0.0/16",
"Gateway": "172.20.0.1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"10ec4e821999abf5a2c7a501cecafd8f13daea4842a9873f96985d36a8d7ef3f": {
"Name": "k3d-test55-server-1",
"EndpointID": "a130a1d83af61109579915e2b32fe898e1417e8ad65cd1eb563a578eda2d82ad",
"MacAddress": "02:42:ac:14:00:03",
"IPv4Address": "172.20.0.3/16",
"IPv6Address": ""
},
"ff192959bbdd6fa5bd35e7fb3ba850b2156760c3cd6765159d311141cda3b885": {
"Name": "k3d-test55-serverlb",
"EndpointID": "a05b53563a77e22ba2a79f0560c01c260833cd4cdf9f1f59b270434a3814f42a",
"MacAddress": "02:42:ac:14:00:04",
"IPv4Address": "172.20.0.4/16",
"IPv6Address": ""
},
"ff7e15fb60bb195de04aac993dc7456abcc3ea2adacd78a7bb4ee62ce278e098": {
"Name": "k3d-test55-server-0",
"EndpointID": "f84d9ba79007959197f4b5af7f1873f7a1e85da5f5aa23840963b9e45aea2f3b",
"MacAddress": "02:42:ac:14:00:02",
"IPv4Address": "172.20.0.2/16",
"IPv6Address": ""
}
},
"Options": {},
"Labels": {
"app": "k3d"
}
}
]
$ docker network inspect k3d-wpenv
[
{
"Name": "k3d-wpenv",
"Id": "5312bd5f60bd523d046f919bf42c2b25c75463cc851aeb64e3bfff9b52583a11",
"Created": "2021-03-28T22:57:14.5128859Z",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "172.19.0.0/16",
"Gateway": "172.19.0.1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {},
"Options": {},
"Labels": {
"app": "k3d"
}
}
]
@bukowa, I am absolutely lost there :thinking: The network settings look fine to me and I cannot think of any way the clusters could interfere :thinking:
@iwilltry42 ok, maybe this finding can be useful when someone encounters a similar issue.
I tried to do k3d cluster stop dev ; k3d cluster start dev after I received the error from issue #262:
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes)
Now it hangs here:
INFO[0000] Stopping cluster 'dev'
INFO[0000] Starting cluster 'dev'
INFO[0000] Starting the initializing server...
INFO[0000] Starting Node 'k3d-dev-server-0'
INFO[0001] Starting servers...
INFO[0001] Starting Node 'k3d-dev-server-1'
The cluster was initially created with:
k3d registry create registry.localhost --port 5000
k3d cluster create dev \
--k3s-server-arg "--no-deploy=traefik" \
--registry-use k3d-registry.localhost:5000 \
--port 80:80@loadbalancer \
--port 443:443@loadbalancer \
--api-port 6443 --servers 3 --agents 3
Hi @renepardon, can you please share the logs of the k3d-dev-server-0 and k3d-dev-server-1 containers?
Also, which version of k3d and k3s are you using and what is your environment?
Hello Thorsten,
I work on Ubuntu 20.04 and have the latest version of k3d. I had to delete and recreate the cluster to be able to continue with my workload. Next time this issue appears I will post the logs.
@iwilltry42
k3d version v4.4.3 k3s version v1.20.6-k3s1 (default)
But I can't find the log files, neither in my home directory nor in /var/log.
@renepardon, if you were to search for actual log files, they'd be in /var/lib/docker/containers/<ID>/<ID>.log. But you can just do e.g. docker logs k3d-dev-server-0 to get them :+1:
Also, please try k3d v4.4.4 and paste your kernel version (uname -r).
@iwilltry42
The k3d-dev-server-0 log repeats infinitely like this:
time="2021-06-07T06:42:26.583579237Z" level=info msg="Failed to test data store connection: this server is a not a member of the etcd cluster. Found [k3d-dev-server-1-b8022d18=https://172.21.0.4:2380 k3d-dev-server-2-bb40fdf0=https://172.21.0.5:2380 k3d-dev-server-0-f3a8f84f=https://172.21.0.3:2380], expect: k3d-dev-server-0-f3a8f84f=172.21.0.7"
time="2021-06-07T06:42:26.881852734Z" level=info msg="certificate CN=k3s,O=k3s signed by CN=k3s-server-ca@1622098330: notBefore=2021-05-27 06:52:10 +0000 UTC notAfter=2022-06-07 06:42:26 +0000 UTC"
time="2021-06-07T06:42:26.882548039Z" level=info msg="Cluster-Http-Server 2021/06/07 06:42:26 http: TLS handshake error from 172.21.0.6:48878: remote error: tls: bad certificate"
time="2021-06-07T06:42:26.883032016Z" level=info msg="certificate CN=k3s,O=k3s signed by CN=k3s-server-ca@1622098330: notBefore=2021-05-27 06:52:10 +0000 UTC notAfter=2022-06-07 06:42:26 +0000 UTC"
time="2021-06-07T06:42:27.350483832Z" level=info msg="certificate CN=k3s,O=k3s signed by CN=k3s-server-ca@1622098330: notBefore=2021-05-27 06:52:10 +0000 UTC notAfter=2022-06-07 06:42:27 +0000 UTC"
time="2021-06-07T06:42:27.351461555Z" level=info msg="Cluster-Http-Server 2021/06/07 06:42:27 http: TLS handshake error from 172.21.0.2:35352: remote error: tls: bad certificate"
and k3d-dev-server-1:
{"level":"info","ts":"2021-06-07T06:43:10.559Z","caller":"mvcc/index.go:189","msg":"compact tree index","revision":427102}
{"level":"info","ts":"2021-06-07T06:43:10.572Z","caller":"mvcc/kvstore_compaction.go:56","msg":"finished scheduled compaction","compact-revision":427102,"took":"12.938268ms"}
W0607 06:43:13.421471 8 handler_proxy.go:102] no RequestInfo found in the context
E0607 06:43:13.421512 8 controller.go:116] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I0607 06:43:13.421518 8 controller.go:129] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
{"level":"warn","ts":"2021-06-07T06:43:13.504Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:13.504Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:18.504Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:18.504Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:23.504Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:23.504Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:28.504Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:28.504Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:33.504Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:33.504Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:38.505Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:38.505Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
W0607 06:43:43.357254 8 info.go:53] Couldn't collect info from any of the files in "/etc/machine-id,/var/lib/dbus/machine-id"
{"level":"warn","ts":"2021-06-07T06:43:43.505Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:43.505Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:48.505Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:48.505Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:53.505Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:53.505Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:58.505Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:43:58.506Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:44:03.505Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:44:03.506Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:44:08.505Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
{"level":"warn","ts":"2021-06-07T06:44:08.506Z","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"be99668240cb5dc7","rtt":"0s","error":"dial tcp 172.21.0.3:2380: connect: connection refused"}
@renepardon , I'll follow up on the similar issue in https://github.com/rancher/k3d/issues/619 :+1:
@iwilltry42 I can still reproduce that IPs are occasionally shuffled on restart on k3d 5.4.9 - k3s v1.24.12+k3s1 (especially after a host reboot).
Does it help if I open a new issue?
I'm experiencing the same issue. After rebooting the host, the cluster becomes irrecoverably lost. k3d v5.6.3 - k3s v1.28.8-k3s1.
Feature Request
Original Bug Report
What did you do
I was toying with system resiliency around restarting servers and agents, and found that the Init server didn't come back after the following sequence:
where we wind up with this message repeated:
As it turns out, the IP for k3d-test-k3d-server-0 moves from 172.20.0.2 to 172.20.0.6 (presumably swapping with k3d-test-k3d-agent-1). Is there any way to lock this down a bit so docker doesn't accidentally shuffle things around?
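For reference, docker itself can already pin container IPs when a user-defined network declares a subnet and each container requests a fixed `ipv4_address` (the compose-file equivalent of `docker run --ip`). A minimal sketch, with a hypothetical subnet and service name, of the kind of static assignment k3d-managed IPAM would need to replicate per node:

```yaml
# Hypothetical example: pin a container to a fixed IP on a user-defined network.
# k3d does not generate this today; it illustrates docker's --ip / ipv4_address mechanism.
services:
  server-0:
    image: rancher/k3s:latest
    networks:
      k3dnet:
        ipv4_address: 172.28.0.2   # fixed address, survives container restarts

networks:
  k3dnet:
    ipam:
      config:
        - subnet: 172.28.0.0/16    # subnet must be declared for static addressing
```

With this in place, restart order no longer matters, because the address is reserved for the container rather than handed out first-come-first-served.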
What did you expect to happen
I expected the IPs to be maintained...
Which OS & Architecture
Which version of
k3d
k3d version
Which version of docker
docker version
and docker info
Server:
 Engine:
  Version:          19.03.15
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.15.7
  Git commit:       420b1d3625
  Built:            Thu Feb 11 18:14:30 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.9
  GitCommit:        ea765ab
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683b971d9c3ef73f284f176672c44b448662
% docker info
Client:
 Debug Mode: false
Server:
 Containers: 8
  Running: 7
  Paused: 0
  Stopped: 1
 Images: 3
 Server Version: 19.03.15
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ea765ab
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683b971d9c3ef73f284f176672c44b448662
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.4.97-gentoo
 Operating System: Gentoo/Linux
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.62GiB
 Name: crossbow
 ID: BXTV:WYUT:UXM3:IW4V:4AUA:JU6Q:ZYSE:APB4:JWOZ:UTUS:ZKRK:SEDG
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false