kubernetes-sigs / kubespray

Deploy a Production Ready Kubernetes Cluster
Apache License 2.0

Vagrant fails when etcd_deployment_type=host #6255

Closed: pasqualet closed this issue 3 years ago

pasqualet commented 4 years ago

Bug description:

Vagrant fails installing etcd when etcd_deployment_type=host.

Environment:

Kubespray version (commit) (git rev-parse --short HEAD): a7b8708d

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):

Default inventory with the following changes:

etcd_deployment_type: host
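
For reference, with the Vagrant workflow this override typically goes into the group_vars of the default inventory; the exact file below is an assumption and may differ between Kubespray versions:

# Vagrant uses inventory/sample by default; locate the variable and set it to "host"
$ grep -rn etcd_deployment_type inventory/sample/group_vars/
$ $EDITOR inventory/sample/group_vars/etcd.yml    # etcd_deployment_type: host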

Output of ansible run:

TASK [etcd : Configure | Ensure etcd is running] *******************************
fatal: [k8s-3]: FAILED! => {"changed": false, "msg": "Unable to start service etcd: Job for etcd.service failed because a timeout was exceeded.\nSee \"systemctl status etcd.service\" and \"journalctl -xe\" for details.\n"}
fatal: [k8s-2]: FAILED! => {"changed": false, "msg": "Unable to start service etcd: Job for etcd.service failed because a timeout was exceeded.\nSee \"systemctl status etcd.service\" and \"journalctl -xe\" for details.\n"}
fatal: [k8s-1]: FAILED! => {"changed": false, "msg": "Unable to start service etcd: Job for etcd.service failed because a timeout was exceeded.\nSee \"systemctl status etcd.service\" and \"journalctl -xe\" for details.\n"}

Anything else we need to know:

Logs from the first instance (k8s-1):

$ sudo systemctl status etcd.service
● etcd.service - etcd
   Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: enabled)
   Active: activating (start) since Wed 2020-06-10 15:50:12 UTC; 23s ago
 Main PID: 17427 (etcd)
    Tasks: 5 (limit: 2317)
   CGroup: /system.slice/etcd.service
           └─17427 /usr/local/bin/etcd

Jun 10 15:50:35 k8s-1 etcd[17427]: rejected connection from "172.18.8.1:46118" (error "tls: \"172.18.8.1\" does not match any of DNSNames [\"localhost\" \"k8s-1\" \"k8s-2\" \"k8s-3\" \"lb-apiserver.kubernetes.local\" \"etcd.kube-system.svc.cluster.local\" \"etcd.kube-system.svc\" \"etcd.kube-system\" \"etcd\"] (lookup etcd on 127.0.0.53:53: server misbehaving)", ServerName "", IPAddresses ["172.18.8.101" "172.18.8.102" "172.18.8.103" "127.0.0.1"], DNSNames ["localhost" "k8s-1" "k8s-2" "k8s-3" "lb-apiserver.kubernetes.local" "etcd.kube-system.svc.cluster.local" "etcd.kube-system.svc" "etcd.kube-system" "etcd"])
Jun 10 15:50:35 k8s-1 etcd[17427]: rejected connection from "172.18.8.1:44544" (error "tls: \"172.18.8.1\" does not match any of DNSNames [\"localhost\" \"k8s-1\" \"k8s-2\" \"k8s-3\" \"lb-apiserver.kubernetes.local\" \"etcd.kube-system.svc.cluster.local\" \"etcd.kube-system.svc\" \"etcd.kube-system\" \"etcd\"] (lookup etcd on 127.0.0.53:53: server misbehaving)", ServerName "", IPAddresses ["172.18.8.101" "172.18.8.102" "172.18.8.103" "127.0.0.1"], DNSNames ["localhost" "k8s-1" "k8s-2" "k8s-3" "lb-apiserver.kubernetes.local" "etcd.kube-system.svc.cluster.local" "etcd.kube-system.svc" "etcd.kube-system" "etcd"])
Jun 10 15:50:35 k8s-1 etcd[17427]: rejected connection from "172.18.8.1:46124" (error "tls: \"172.18.8.1\" does not match any of DNSNames [\"localhost\" \"k8s-1\" \"k8s-2\" \"k8s-3\" \"lb-apiserver.kubernetes.local\" \"etcd.kube-system.svc.cluster.local\" \"etcd.kube-system.svc\" \"etcd.kube-system\" \"etcd\"] (lookup etcd on 127.0.0.53:53: server misbehaving)", ServerName "", IPAddresses ["172.18.8.101" "172.18.8.102" "172.18.8.103" "127.0.0.1"], DNSNames ["localhost" "k8s-1" "k8s-2" "k8s-3" "lb-apiserver.kubernetes.local" "etcd.kube-system.svc.cluster.local" "etcd.kube-system.svc" "etcd.kube-system" "etcd"])
Jun 10 15:50:35 k8s-1 etcd[17427]: rejected connection from "172.18.8.1:44546" (error "tls: \"172.18.8.1\" does not match any of DNSNames [\"localhost\" \"k8s-1\" \"k8s-2\" \"k8s-3\" \"lb-apiserver.kubernetes.local\" \"etcd.kube-system.svc.cluster.local\" \"etcd.kube-system.svc\" \"etcd.kube-system\" \"etcd\"] (lookup etcd on 127.0.0.53:53: server misbehaving)", ServerName "", IPAddresses ["172.18.8.101" "172.18.8.102" "172.18.8.103" "127.0.0.1"], DNSNames ["localhost" "k8s-1" "k8s-2" "k8s-3" "lb-apiserver.kubernetes.local" "etcd.kube-system.svc.cluster.local" "etcd.kube-system.svc" "etcd.kube-system" "etcd"])
Jun 10 15:50:35 k8s-1 etcd[17427]: rejected connection from "172.18.8.1:44552" (error "tls: \"172.18.8.1\" does not match any of DNSNames [\"localhost\" \"k8s-1\" \"k8s-2\" \"k8s-3\" \"lb-apiserver.kubernetes.local\" \"etcd.kube-system.svc.cluster.local\" \"etcd.kube-system.svc\" \"etcd.kube-system\" \"etcd\"] (lookup etcd on 127.0.0.53:53: server misbehaving)", ServerName "", IPAddresses ["172.18.8.101" "172.18.8.102" "172.18.8.103" "127.0.0.1"], DNSNames ["localhost" "k8s-1" "k8s-2" "k8s-3" "lb-apiserver.kubernetes.local" "etcd.kube-system.svc.cluster.local" "etcd.kube-system.svc" "etcd.kube-system" "etcd"])
Jun 10 15:50:35 k8s-1 etcd[17427]: rejected connection from "172.18.8.1:46128" (error "tls: \"172.18.8.1\" does not match any of DNSNames [\"localhost\" \"k8s-1\" \"k8s-2\" \"k8s-3\" \"lb-apiserver.kubernetes.local\" \"etcd.kube-system.svc.cluster.local\" \"etcd.kube-system.svc\" \"etcd.kube-system\" \"etcd\"] (lookup etcd on 127.0.0.53:53: server misbehaving)", ServerName "", IPAddresses ["172.18.8.101" "172.18.8.102" "172.18.8.103" "127.0.0.1"], DNSNames ["localhost" "k8s-1" "k8s-2" "k8s-3" "lb-apiserver.kubernetes.local" "etcd.kube-system.svc.cluster.local" "etcd.kube-system.svc" "etcd.kube-system" "etcd"])
Jun 10 15:50:35 k8s-1 etcd[17427]: rejected connection from "172.18.8.1:46132" (error "tls: \"172.18.8.1\" does not match any of DNSNames [\"localhost\" \"k8s-1\" \"k8s-2\" \"k8s-3\" \"lb-apiserver.kubernetes.local\" \"etcd.kube-system.svc.cluster.local\" \"etcd.kube-system.svc\" \"etcd.kube-system\" \"etcd\"] (lookup etcd on 127.0.0.53:53: server misbehaving)", ServerName "", IPAddresses ["172.18.8.101" "172.18.8.102" "172.18.8.103" "127.0.0.1"], DNSNames ["localhost" "k8s-1" "k8s-2" "k8s-3" "lb-apiserver.kubernetes.local" "etcd.kube-system.svc.cluster.local" "etcd.kube-system.svc" "etcd.kube-system" "etcd"])
Jun 10 15:50:35 k8s-1 etcd[17427]: rejected connection from "172.18.8.1:44554" (error "tls: \"172.18.8.1\" does not match any of DNSNames [\"localhost\" \"k8s-1\" \"k8s-2\" \"k8s-3\" \"lb-apiserver.kubernetes.local\" \"etcd.kube-system.svc.cluster.local\" \"etcd.kube-system.svc\" \"etcd.kube-system\" \"etcd\"] (lookup etcd on 127.0.0.53:53: server misbehaving)", ServerName "", IPAddresses ["172.18.8.101" "172.18.8.102" "172.18.8.103" "127.0.0.1"], DNSNames ["localhost" "k8s-1" "k8s-2" "k8s-3" "lb-apiserver.kubernetes.local" "etcd.kube-system.svc.cluster.local" "etcd.kube-system.svc" "etcd.kube-system" "etcd"])
Jun 10 15:50:35 k8s-1 etcd[17427]: rejected connection from "172.18.8.1:44560" (error "tls: \"172.18.8.1\" does not match any of DNSNames [\"localhost\" \"k8s-1\" \"k8s-2\" \"k8s-3\" \"lb-apiserver.kubernetes.local\" \"etcd.kube-system.svc.cluster.local\" \"etcd.kube-system.svc\" \"etcd.kube-system\" \"etcd\"] (lookup etcd on 127.0.0.53:53: server misbehaving)", ServerName "", IPAddresses ["172.18.8.101" "172.18.8.102" "172.18.8.103" "127.0.0.1"], DNSNames ["localhost" "k8s-1" "k8s-2" "k8s-3" "lb-apiserver.kubernetes.local" "etcd.kube-system.svc.cluster.local" "etcd.kube-system.svc" "etcd.kube-system" "etcd"])
Jun 10 15:50:35 k8s-1 etcd[17427]: rejected connection from "172.18.8.1:46138" (error "tls: \"172.18.8.1\" does not match any of DNSNames [\"localhost\" \"k8s-1\" \"k8s-2\" \"k8s-3\" \"lb-apiserver.kubernetes.local\" \"etcd.kube-system.svc.cluster.local\" \"etcd.kube-system.svc\" \"etcd.kube-system\" \"etcd\"] (lookup etcd on 127.0.0.53:53: server misbehaving)", ServerName "", IPAddresses ["172.18.8.101" "172.18.8.102" "172.18.8.103" "127.0.0.1"], DNSNames ["localhost" "k8s-1" "k8s-2" "k8s-3" "lb-apiserver.kubernetes.local" "etcd.kube-system.svc.cluster.local" "etcd.kube-system.svc" "etcd.kube-system" "etcd"])
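
The rejections above show etcd refusing connections from 172.18.8.1 because that address is not among the certificate's SANs (only the node IPs and 127.0.0.1 are listed). One way to double-check which addresses the generated etcd certificate actually covers is to inspect it with openssl; the certificate path below assumes the usual Kubespray layout under /etc/ssl/etcd/ssl/ and may differ:

$ sudo openssl x509 -noout -text -in /etc/ssl/etcd/ssl/member-k8s-1.pem | grep -A1 'Subject Alternative Name'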

Workaround:

I can make it work by changing the Vagrant subnet:

$subnet = "10.0.20"
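
For context, $subnet is the variable the Kubespray Vagrantfile uses to build the node IPs (10.0.20.101, 10.0.20.102, ...). It can be overridden without touching the Vagrantfile by placing it in vagrant/config.rb, which the Vagrantfile loads if present; a minimal sketch:

$ mkdir -p vagrant
$ cat > vagrant/config.rb <<'EOF'
# Move the host-only network off 172.18.8.0/24 so it cannot collide with an
# existing bridge on the host; node IPs become 10.0.20.101, 10.0.20.102, ...
$subnet = "10.0.20"
EOF
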
fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

fejta-bot commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle rotten

fejta-bot commented 3 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/close

k8s-ci-robot commented 3 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/kubespray/issues/6255#issuecomment-723474611):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or [fejta](https://github.com/fejta).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

thanosz commented 3 years ago

Just to give a hint for anyone experiencing this, since it also happened to me: it seems to occur when Docker is installed and the 172.18.8.0/24 network is already occupied by a Docker network bridge.
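
A quick way to confirm this on the Vagrant host is to look for a Docker bridge whose subnet overlaps 172.18.8.0/24 (Docker commonly assigns 172.18.0.0/16 and up to user-defined networks); a simple check:

$ ip route | grep '^172\.18\.'
$ docker network inspect -f '{{.Name}}: {{range .IPAM.Config}}{{.Subnet}} {{end}}' $(docker network ls -q)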