redspread / localkube

DEPRECATED - See https://github.com/kubernetes/minikube
Apache License 2.0

kube-dns issue during 'spread cluster start' #33

Closed huangsam closed 8 years ago

huangsam commented 8 years ago

I first run the following commands in my Vagrant VM:

$ curl -sL https://github.com/redspread/spread/releases/download/0.1.3/spread_linux_amd64 > ./spread
$ curl -s https://storage.googleapis.com/kubernetes-release/release/v1.1.8/bin/linux/amd64/kubectl > ./kubectl
$ chmod +x ./kubectl ./spread
$ sudo mv ./kubectl /usr/bin
$ sudo mv ./spread /usr/bin
$ spread cluster start

I then run docker logs localkube and see the kube-dns failure repeating endlessly:

E0322 22:56:14.523720   11988 server.go:75] unable to register configz: register config "componentconfig" twice
I0322 22:56:14.508477   11988 replication_controller.go:208] Starting RC Manager
E0322 22:56:14.553651   11988 server.go:285] unable to register configz: register config "componentconfig" twice
I0322 22:56:14.600077   11988 server.go:216] Using userspace Proxier.
I0322 22:56:14.618974   11988 server.go:236] Tearing down pure-iptables proxy rules.
Starting dns...
2016-03-22 22:56:14.631464 I | etcdserver: name = dns
2016-03-22 22:56:14.631553 I | etcdserver: data dir = /var/dns/data
2016-03-22 22:56:14.631616 I | etcdserver: member dir = /var/dns/data/member
2016-03-22 22:56:14.631660 I | etcdserver: heartbeat = 100ms
2016-03-22 22:56:14.631708 I | etcdserver: election = 1000ms
2016-03-22 22:56:14.631742 I | etcdserver: snapshot count = 10000
2016-03-22 22:56:14.631781 I | etcdserver: advertise client URLs = http://localhost:9090
2016-03-22 22:56:14.631820 I | etcdserver: initial advertise peer URLs = http://localhost:9256
2016-03-22 22:56:14.631872 I | etcdserver: initial cluster = dns=http://localhost:9256
2016-03-22 22:56:14.636107 I | etcdserver: starting member 6429d4ab4d8a33da in cluster d3c354cfe77b0b70
2016-03-22 22:56:14.636260 I | raft: 6429d4ab4d8a33da became follower at term 0
2016-03-22 22:56:14.636342 I | raft: newRaft 6429d4ab4d8a33da [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2016-03-22 22:56:14.636385 I | raft: 6429d4ab4d8a33da became follower at term 1
I0322 22:56:14.638248   11988 conntrack.go:36] Setting nf_conntrack_max to 262144
I0322 22:56:14.638384   11988 conntrack.go:41] Setting conntrack hashsize to 65536
I0322 22:56:14.638705   11988 conntrack.go:46] Setting nf_conntrack_tcp_timeout_established to 86400
[restful] 2016/03/22 22:56:14 log.go:30: [restful/swagger] listing is available at https://10.0.2.15:443/swaggerapi/
[restful] 2016/03/22 22:56:14 log.go:30: [restful/swagger] https://10.0.2.15:443/swaggerui/ is mapped to folder /swagger-ui/
E0322 22:56:14.659215   11988 nodecontroller.go:229] Error monitoring node status: Get http://localhost:8080/api/v1/nodes: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E0322 22:56:14.660562   11988 controllermanager.go:259] Failed to get api versions from server: Get http://localhost:8080/api: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E0322 22:56:14.661838   11988 event.go:202] Unable to write event: 'Post http://0.0.0.0:8080/api/v1/namespaces/default/events: dial tcp 0.0.0.0:8080: getsockopt: connection refused' (may retry after sleeping)
2016-03-22 22:56:14.687741 I | etcdserver: starting server... [version: 2.2.5, cluster version: to_be_decided]
2016-03-22 22:56:14.724130 I | skydns: ready for queries on cluster.local. for tcp://172.17.0.1:1970 [rcache 0]
2016-03-22 22:56:14.724270 I | skydns: ready for queries on cluster.local. for udp://172.17.0.1:1970 [rcache 0]
2016-03-22 22:56:14.740503 N | etcdserver: added local member 6429d4ab4d8a33da [http://localhost:9256] to cluster d3c354cfe77b0b70
Failed to create service+endpoint for DNS: Post http://0.0.0.0:8080/api/v1/namespaces/kube-system/services: dial tcp 0.0.0.0:8080: getsockopt: connection refused
I0322 22:56:14.823737   11988 kube2sky.go:477] Etcd server found: http://localhost:9090
I0322 22:56:15.009874   11988 genericapiserver.go:692] Serving securely on 0.0.0.0:6443
2016-03-22 22:56:15.393400 I | raft: 6429d4ab4d8a33da is starting a new election at term 1
2016-03-22 22:56:15.393400 I | raft: 6429d4ab4d8a33da became candidate at term 2
2016-03-22 22:56:15.393400 I | raft: 6429d4ab4d8a33da received vote from 6429d4ab4d8a33da at term 2
2016-03-22 22:56:15.393400 I | raft: 6429d4ab4d8a33da became leader at term 2
2016-03-22 22:56:15.393445 I | raft: raft.node: 6429d4ab4d8a33da elected leader 6429d4ab4d8a33da at term 2
2016-03-22 22:56:15.395424 I | etcdserver: published {Name:dns ClientURLs:[http://localhost:9090]} to cluster d3c354cfe77b0b70
2016-03-22 22:56:15.395519 I | etcdserver: setting up the initial cluster version to 2.2
2016-03-22 22:56:15.397670 N | etcdserver: set the initial cluster version to 2.2
I0322 22:56:15.875031   11988 docker.go:282] Connecting to docker on unix:///var/run/weave/weave.sock
W0322 22:56:15.875812   11988 server.go:445] Could not load kubeconfig file /var/lib/kubelet/kubeconfig: stat /var/lib/kubelet/kubeconfig: no such file or directory. Trying auth path instead.
W0322 22:56:15.875929   11988 server.go:406] Could not load kubernetes auth path /var/lib/kubelet/kubernetes_auth: stat /var/lib/kubelet/kubernetes_auth: no such file or directory. Continuing with defaults.
I0322 22:56:15.876314   11988 plugins.go:71] No cloud provider specified.
...
...
...
I0322 22:56:16.676078   11988 factory.go:233] Registering Docker factory
I0322 22:56:16.680932   11988 factory.go:97] Registering Raw factory
Failed to create service+endpoint for DNS: Service "kube-dns" is invalid: spec.clusterIP: Invalid value: "10.1.30.3": provided IP is already allocated
I0322 22:56:16.924542   11988 manager.go:1003] Started watching for new ooms in manager
W0322 22:56:16.924667   11988 manager.go:243] Could not configure a source for OOM detection, disabling OOM events: exec: "journalctl": executable file not found in $PATH
I0322 22:56:16.925824   11988 manager.go:256] Starting recovery of all containers
I0322 22:56:16.966044   11988 manager.go:261] Recovery completed
I0322 22:56:17.046067   11988 kube2sky.go:658] Successfully added DNS record for Kubernetes service.
Failed to create service+endpoint for DNS: Service "kube-dns" is invalid: spec.clusterIP: Invalid value: "10.1.30.3": provided IP is already allocated
Failed to create service+endpoint for DNS: Service "kube-dns" is invalid: spec.clusterIP: Invalid value: "10.1.30.3": provided IP is already allocated
Failed to create service+endpoint for DNS: Service "kube-dns" is invalid: spec.clusterIP: Invalid value: "10.1.30.3": provided IP is already allocated
Failed to create service+endpoint for DNS: Service "kube-dns" is invalid: spec.clusterIP: Invalid value: "10.1.30.3": provided IP is already allocated
I0322 22:56:26.615931   11988 kubelet.go:1150] Node testing-dev was previously registered
W0322 22:56:26.679895   11988 nodecontroller.go:670] Missing timestamp for Node testing-dev. Assuming now as a timestamp.
I0322 22:56:26.684011   11988 event.go:211] Event(api.ObjectReference{Kind:"Node", Namespace:"", Name:"testing-dev", UID:"testing-dev", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node testing-dev event: Registered Node testing-dev in NodeController
I0322 22:56:26.694454   11988 controllermanager.go:286] Starting extensions/v1beta1 apis
I0322 22:56:26.694548   11988 controllermanager.go:288] Starting horizontal pod controller.
I0322 22:56:26.694700   11988 controllermanager.go:302] Starting daemon set controller
I0322 22:56:26.694832   11988 controllermanager.go:308] Starting job controller
I0322 22:56:26.695511   11988 controllermanager.go:314] Starting deployment controller
I0322 22:56:26.695673   11988 controllermanager.go:320] Starting ReplicaSet controller
I0322 22:56:26.695898   11988 horizontal.go:120] Starting HPA Controller
I0322 22:56:26.696127   11988 controller.go:209] Starting Daemon Sets controller manager
W0322 22:56:26.697265   11988 request.go:344] Field selector: v1 - serviceaccounts - metadata.name - default: need to check if this is versioned correctly.
W0322 22:56:26.703870   11988 request.go:344] Field selector: v1 - serviceaccounts - metadata.name - default: need to check if this is versioned correctly.
Failed to create service+endpoint for DNS: Service "kube-dns" is invalid: spec.clusterIP: Invalid value: "10.1.30.3": provided IP is already allocated
Failed to create service+endpoint for DNS: Service "kube-dns" is invalid: spec.clusterIP: Invalid value: "10.1.30.3": provided IP is already allocated
E0322 22:56:36.697685   11988 reflector.go:271] /usr/lib/go/src/runtime/asm_amd64.s:1998: Failed to watch *api.Node: too old resource version: 51 (57)
Failed to create service+endpoint for DNS: Service "kube-dns" is invalid: spec.clusterIP: Invalid value: "10.1.30.3": provided IP is already allocated
Failed to create service+endpoint for DNS: Service "kube-dns" is invalid: spec.clusterIP: Invalid value: "10.1.30.3": provided IP is already allocated
...
...
...

Creating pods works, and kubectl cluster-info returns fine. Curling port 8080 returns the API routes as well. However, my concern is that interactions between Services and pods will not work properly with kube-dns failing. Is there anything I can configure to resolve these kube-dns errors?

I also found it interesting that the downloaded spread binary reports version 0.0.3, not 0.1.3:

$ spread -v
spread version 0.0.3
johnmark commented 8 years ago

I got the same thing, which I found out because mattermost was failing and I was trying to debug it.

mfburnett commented 8 years ago

@huangsam @johnmark thanks for pointing this out - we'll let you know as soon as we start working on this bug. Should be in a few days.

ethernetdan commented 8 years ago

My theory for the "provided IP is already allocated" error is that at some point the DNS Service was successfully created, leading to the IP conflict. The logic in dns.go attempts to create the Service in a loop regardless of whether it already exists.

If this is the case, then DNS and Services should work correctly despite the log noise. I'm looking into reproducing this and attempting a fix.

In the meantime, a way to check whether this is caused by a Service that already exists is to delete or rename the localkube data directory (by default it's ~/.localkube). This will cause a fresh cluster to be started on the next start.

ethernetdan commented 8 years ago

I was able to reproduce the problem and create a patch that worked for me.

@huangsam & @johnmark: can you docker pull redspreadapps/localkube:latest, then stop and start the cluster, to confirm that this fixes it for you?

johnmark commented 8 years ago

It works now - I still couldn't get mattermost working with localkube, but once I got past this error, I found the real issue: I hadn't set up SELinux properly :)

Thanks.