vmware-archive / wardroom

A tool for creating Kubernetes-ready base operating system images.
Apache License 2.0

Avoid race condition for API masters when dns health check used #144

Closed erictcgs closed 5 years ago

erictcgs commented 5 years ago

Running into the following issue:

There's an option to "manage /etc/hosts" which adds a common API IP and FQDN to the /etc/hosts file. However, it is not uncommon (and it is our situation) to have only a common FQDN, with the IPs resolved by DNS. The behavior should allow masters to map 127.0.0.1 to the common FQDN, while nodes keep the current behavior and add a hosts entry only if both the common API IP and the common API FQDN are present.

@craigtracey

erictcgs commented 5 years ago

The following works, but it requires disabling the default values for the common API IP and FQDN.

For now I have made the following change in the kubernetes-common role:

diff --git a/ansible/roles/kubernetes-common/defaults/main.yml b/ansible/roles/kubernetes-common/defaults/main.yml
index 152fc73..d2acf33 100644
--- a/ansible/roles/kubernetes-common/defaults/main.yml
+++ b/ansible/roles/kubernetes-common/defaults/main.yml
@@ -1,8 +1,8 @@
 ---
 kubernetes_common_disable_swap: True
 kubernetes_common_manage_etc_hosts: True
-kubernetes_common_api_fqdn: k8s.example.com
-kubernetes_common_api_ip: 10.10.10.3
+#kubernetes_common_api_fqdn: k8s.example.com
+#kubernetes_common_api_ip: 10.10.10.3
 kubernetes_common_primary_interface: eth0

 # kubelet_extra_args is a dict of arg:value (ie. 'node-ip: 1.1.1.1' for '--node-ip=1.1.1.1')
diff --git a/ansible/roles/kubernetes-common/tasks/main.yml b/ansible/roles/kubernetes-common/tasks/main.yml
index 6873ad2..4728bad 100644
--- a/ansible/roles/kubernetes-common/tasks/main.yml
+++ b/ansible/roles/kubernetes-common/tasks/main.yml
@@ -11,12 +11,19 @@
   command: swapoff -a
   when: kubernetes_common_disable_swap|bool == True

-- name: update /etc/hosts to include cluster fqdn
+- name: update /etc/hosts to include cluster fqdn for nodes
   lineinfile:
     dest: /etc/hosts
     line: "{{ kubernetes_common_api_ip }} {{ kubernetes_common_api_fqdn }}"
     state: present
-  when: kubernetes_common_manage_etc_hosts and kubernetes_common_api_ip is defined and kubernetes_common_api_fqdn is defined
+  when: kubernetes_common_manage_etc_hosts and kubernetes_common_api_ip is defined and kubernetes_common_api_fqdn is defined and 'nodes' in group_names
+
+- name: update /etc/hosts to include cluster fqdn for masters
+  lineinfile:
+    dest: /etc/hosts
+    line: "127.0.1.1 {{ kubernetes_common_api_fqdn }}"
+    state: present
+  when: kubernetes_common_manage_etc_hosts and kubernetes_common_api_fqdn is defined and 'masters' in group_names

 - set_fact:
     kubernetes_node_ip: "{{ hostvars[inventory_hostname]['ansible_'~item]['ipv4']['address'] }}"
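
The `'nodes' in group_names` and `'masters' in group_names` conditions in the patch above depend on which inventory groups each host belongs to. A minimal inventory sketch, assuming groups named `masters` and `nodes` as in the patch; the hostnames are hypothetical, and only the FQDN is set so that the masters task applies while the nodes task is skipped:

```yaml
# inventory.yml (hypothetical file name and hostnames, for illustration only)
all:
  vars:
    # Only the FQDN is defined; kubernetes_common_api_ip is left undefined,
    # so the "for nodes" task is skipped by its `is defined` check.
    kubernetes_common_api_fqdn: k8s.example.com
  children:
    masters:
      hosts:
        master-1.example.com:
    nodes:
      hosts:
        node-1.example.com:
```

With this layout, `group_names` contains `masters` on master-1 and `nodes` on node-1, which is what selects between the two tasks.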

craigtracey commented 5 years ago

Round robin fronting a cluster is a bit of a corner case, and carrying this code might introduce inadvertent failures, so we recommend setting kubernetes_common_manage_etc_hosts to false and managing /etc/hosts outside of wardroom. Closing for now, but please reopen if there are any issues with this approach.
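
The recommendation above could look roughly like the following. This is a sketch under stated assumptions, not part of wardroom: the file names and the separate play are hypothetical, while the variable name, the `masters` group, the loopback address, and the example FQDN are taken from the patch earlier in this thread.

```yaml
# group_vars/all.yml (hypothetical path)
# Overrides the role default so wardroom no longer touches /etc/hosts.
kubernetes_common_manage_etc_hosts: false
---
# manage-hosts.yml (hypothetical playbook, run separately from wardroom)
- hosts: masters
  become: true
  tasks:
    - name: map the common API FQDN to loopback on masters
      lineinfile:
        dest: /etc/hosts
        line: "127.0.1.1 k8s.example.com"
        state: present
```

Role defaults have the lowest variable precedence in Ansible, so setting the variable in group vars is enough to override the `True` default shipped with the role.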