Closed: itilitymaarten closed this issue 6 years ago
Okay, so I've figured out that `10.96.0.1` is the ClusterIP used by the `kubernetes` service, so I assume the idea is that other pods try to find the Kubernetes API at this IP address, and therefore the certificate has to be valid for this IP?
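(For reference, that address isn't arbitrary: the API server allocates the `kubernetes` Service the first usable address of the service CIDR, which defaults to `10.96.0.0/12` in kubeadm-style setups. A small sketch with Python's `ipaddress`, assuming that default range:)

```python
import ipaddress

# kubeadm's default --service-cluster-ip-range; the "kubernetes" Service
# is allocated the first usable address in it.
service_range = ipaddress.ip_network("10.96.0.0/12")
api_service_ip = service_range[1]
print(api_service_ip)  # 10.96.0.1
```

So any cluster using the default service range will see `10.96.0.1` in the API server's certificate SANs.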
I've managed to get my cluster working, but only by using the public IP address for `BOOTSTRAP_CONTROLLER_IP`. I end up with the following command for kubetool:
```shell
docker run --rm -v "$(pwd)":/mnt \
  -e OS=debian \
  -e VERSION=1.9.2 \
  -e CONTAINER_RUNTIME=docker \
  -e CNI_PROVIDER=weave \
  -e FQDN=icc-kubernetes-masters.westeurope.cloudapp.azure.com \
  -e IP=<PUBLIC IP> \
  -e BOOTSTRAP_CONTROLLER_IP=<PUBLIC IP> \
  -e ETCD_INITIAL_CLUSTER="etcd-kubernetes-01=http://%{::ipaddress}:2380" \
  -e ETCD_IP="%{::ipaddress}" \
  -e KUBE_API_ADVERTISE_ADDRESS="%{::ipaddress}" \
  -e INSTALL_DASHBOARD=false \
  puppet/kubetool
```
(Note that I've also substituted the hardcoded IP address in the `ETCD_INITIAL_CLUSTER` parameter; it makes no sense to me to have to supply a manual IP address there if that IP address needs to be the same as `ETCD_IP` anyway.)
Is this the expected setup?
I find the documentation quite vague on which parameters require which values and what they mean, especially as a newcomer to Kubernetes. It also doesn't make sense to me that nodes inside the cluster should need to use the public IP address of the cluster. That would require me to always expose the Kubernetes API to the outside world, even if I only want to control it from inside the cluster.
@itilitymaarten Most of the issues that you have listed here are Kubernetes related and not really issues with the module. For example, `ETCD_INITIAL_CLUSTER != ETCD_IP` if you have multiple controllers being spun up for the initial cluster. As for the networking inside Kubernetes: all the pods (and `kubectl` run from inside the cluster) hit the kube API on the internal address `10.96.0.1`, and outside services hit the node IP. Again, this is the way Kubernetes works, not the module. I am just trying to understand what the issue with the module is on this thread?
@ianscrymgeour If you are new to Kubernetes, I would start here: https://github.com/puppetlabs/kream. This is a project that lets you play with the module so you can understand all the moving parts.
@scotty-c Thank you for your responses. I had seen kream, but since I don't have a physical machine with Linux (or macOS) available to me, I have to work with VMs hosted in Azure instead.
I think my biggest issue is understanding the documentation and the assumptions made by this module, especially surrounding the arguments for kubetool.
- `IP` isn't mentioned at all in the README, but I think I read in an issue on this repository that it should be the IP address at which the cluster is available to the internet.
- `FQDN`'s description I find a little vague; I assume this is the FQDN at which the cluster is available externally (e.g. an FQDN pointing at a load balancer), but I'd prefer to see this made explicit so I don't have to guess.
- `BOOTSTRAP_CONTROLLER_IP`, to me, automatically becomes the IP address of the machine in the cluster that is going to be the bootstrap controller (seeing as `IP` is external). These two are not the same, because of the load balancer. However, when I set this to my master's internal IP, the kubelet on the master is unable to connect, since the internal IP address doesn't match the ones in the certificate (which seem to be `IP` and `10.96.0.1`).
- `ETCD_INITIAL_CLUSTER` could use some clarification on the "etcd name" that is automatically being generated, i.e. `etcd-<hostname>`. I had to look at the template for etcd.yaml to figure out that this was the name being generated.
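To illustrate that naming convention, here is a sketch (the hostnames and IPs are hypothetical, invented for the example) of how a three-controller `ETCD_INITIAL_CLUSTER` value is shaped, with each member named `etcd-<hostname>`:

```python
# Hypothetical controllers; each etcd member is named "etcd-<hostname>".
controllers = {
    "kubernetes-master-01": "10.2.3.7",
    "kubernetes-master-02": "10.2.3.8",
    "kubernetes-master-03": "10.2.3.9",
}
etcd_initial_cluster = ",".join(
    f"etcd-{host}=http://{ip}:2380" for host, ip in controllers.items()
)
print(etcd_initial_cluster)
```

The resulting comma-separated string is what would be passed to `-e ETCD_INITIAL_CLUSTER=...` in the kubetool invocation above.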
Perhaps a small paragraph describing an example HA setup (e.g. 3 master nodes, being one bootstrap controller and 2 additional controllers, plus a load balancer machine facing the outside web) would also help to explain the situation this module expects/creates.
All in all, they're not big points, but especially for someone who isn't that experienced with Kubernetes yet (like me), they would help a great deal in understanding what's happening. And again, my apologies if this is just due to a lack of Kubernetes knowledge on my part.
@itilitymaarten I will take this on board and create a ticket to work on the docs. In regards to the IP: in kream we use an internal IP, `172.17.10.101`, and it works because we add it to the SANs list. There could be 2 issues here:
1) The internal IP of the compute resource overlaps with the service IP range in Kubernetes.
2) The internal IP is not in the SANs on the API server's cert.
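(Issue 1 is easy to check mechanically. A sketch using Python's `ipaddress`, assuming the default `10.96.0.0/12` service range; the node IP here is hypothetical:)

```python
import ipaddress

service_cidr = ipaddress.ip_network("10.96.0.0/12")  # default service IP range
node_ip = ipaddress.ip_address("10.100.0.5")         # hypothetical internal node IP

# 10.100.0.5 falls inside 10.96.0.0/12, so this node IP would collide
# with the service range.
overlap = node_ip in service_cidr
print(overlap)  # True
```

For issue 2, the SANs on the API server's certificate can be inspected with `openssl x509 -in <path to apiserver cert> -noout -text` and checking the "Subject Alternative Name" section.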
More than happy to help out with the teething issues getting the module up and running.
@scotty-c Thanks for your help then :) I'd be happy to review the changes to docs from a novice standpoint, if you like.
Assuming you're talking about the `BOOTSTRAP_CONTROLLER_IP`: I used an internal IP too, but it didn't get signed into the cert. The only IPs in the cert are the public IP of my masters' load balancer and `10.96.0.1`. However, I now believe that's correct: the IP should be the load balancer's, since otherwise my requests wouldn't be distributed across the masters.
Is there any preferred/required order to the Puppet runs (if so, this would also be good to document)? Or should they all be run simultaneously (for the masters, at least)? The run on my bootstrap controller hangs indefinitely if the other controllers in `ETCD_INITIAL_CLUSTER` don't come up, because etcd crash-loops (which takes the API server down with it). Once the other machines come up, they work fine, though. The last problem is that kube-dns doesn't actually start (the pod is stuck in ContainerCreating with the message "failed to create pod sandbox"), but I'll look into that further first.
So the etcd cluster needs to be established for the Kubernetes cluster to come up. If you are creating a cluster of 3 controllers, it is best to run Puppet on all of them at about the same time. Once the etcd cluster is ready, the kube API server will be available.
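(As background, and not specific to this module: etcd only serves requests once a majority of the members listed in the initial cluster are up, which is why a lone bootstrap controller crash-loops until the others join. The majority threshold is simply:)

```python
def etcd_quorum(members: int) -> int:
    """Minimum number of live etcd members needed before etcd serves requests."""
    return members // 2 + 1

# A 3-member cluster needs 2 members up; a single bootstrap node is not enough.
print(etcd_quorum(1), etcd_quorum(3), etcd_quorum(5))
```

This matches the behaviour described above: with `ETCD_INITIAL_CLUSTER` listing 3 members, the bootstrap controller alone can never reach quorum.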
Which CNI network provider are you using? That will help me work out why kube-dns is not starting.
I'm using weave, with just the default arguments for kubetool (i.e. `CNI_PROVIDER=weave`), which turns into

```yaml
kubernetes::cni_network_provider: https://git.io/weave-kube-1.6
kubernetes::cni_cluster_cidr:
kubernetes::cni_node_cidr:
```

in my hiera data.
I've found the following in the logs:
```
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]/onlyif) NAME STATUS ROLES AGE VERSION
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]/onlyif) kubernetes-master-01 Ready master 14m v1.9.2
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]/onlyif) kubernetes-master-02 NotReady master 14m v1.9.2
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]/onlyif) kubernetes-master-03 NotReady <none> 14m v1.9.2
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]/returns) Exec try 1/5
(Exec[Install cni network provider](provider=posix)) Executing 'kubectl apply -f https://git.io/weave-kube-1.6'
Executing: 'kubectl apply -f https://git.io/weave-kube-1.6'
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]/returns) serviceaccount "weave-net" unchanged
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]/returns) clusterrole "weave-net" configured
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]/returns) clusterrolebinding "weave-net" configured
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]/returns) role "weave-net-kube-peer" unchanged
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]/returns) rolebinding "weave-net-kube-peer" unchanged
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]/returns) daemonset "weave-net" unchanged
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]/returns) executed successfully
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]) The container Class[Kubernetes::Kube_addons] will propagate my refresh event
(Exec[Assign master role to controller](provider=posix)) Executing check 'kubectl describe nodes kubernetes-master-01 | tr -s ' ' | grep 'Roles: master''
Executing: 'kubectl describe nodes kubernetes-master-01 | tr -s ' ' | grep 'Roles: master''
(/Stage[main]/Kubernetes::Kube_addons/Exec[Assign master role to controller]/unless) Roles: master
(Exec[Checking for dns to be deployed](provider=posix)) Executing check 'kubectl get deploy -n kube-system kube-dns -o yaml | tr -s " " | grep "Deployment does not have minimum availability"'
Executing: 'kubectl get deploy -n kube-system kube-dns -o yaml | tr -s " " | grep "Deployment does not have minimum availability"'
(/Stage[main]/Kubernetes::Kube_addons/Exec[Checking for dns to be deployed]/onlyif) message: Deployment does not have minimum availability.
(/Stage[main]/Kubernetes::Kube_addons/Exec[Checking for dns to be deployed]/returns) Exec try 1/50
(Exec[Checking for dns to be deployed](provider=posix)) Executing 'kubectl get deploy -n kube-system kube-dns -o yaml | tr -s " " | grep "Deployment has minimum availability"'
Executing: 'kubectl get deploy -n kube-system kube-dns -o yaml | tr -s " " | grep "Deployment has minimum availability"'
(/Stage[main]/Kubernetes::Kube_addons/Exec[Checking for dns to be deployed]/returns) Sleeping for 10 seconds between tries
(/Stage[main]/Kubernetes::Kube_addons/Exec[Checking for dns to be deployed]/returns) Exec try 2/50
```
The "Checking for dns to be deployed" exec repeats until try 50 and then determines it failed, obviously.
The odd thing to me: should the bootstrap controller be showing Ready before DNS is installed and ready? When using `kubeadm`, it never showed Ready until everything was really up and running.
EDIT: events from the kube-dns pod:
```
Events:
  Type     Reason                  Age               From                           Message
  ----     ------                  ----              ----                           -------
  Normal   Scheduled               2m                default-scheduler              Successfully assigned kube-dns-ccf7b96b9-gsbsk to kubernetes-master-01
  Normal   SuccessfulMountVolume   2m                kubelet, kubernetes-master-01  MountVolume.SetUp succeeded for volume "kube-dns-token-d2ntl"
  Normal   SuccessfulMountVolume   2m                kubelet, kubernetes-master-01  MountVolume.SetUp succeeded for volume "kube-dns-config"
  Normal   SandboxChanged          2m (x11 over 2m)  kubelet, kubernetes-master-01  Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  2m (x12 over 2m)  kubelet, kubernetes-master-01  Failed create pod sandbox.
```
2nd EDIT: I've got it to work, I think. Going to try with entirely clean VMs and verify. What I did:
At the time, did you have any nodes? What does the output of `kubectl get pods --all-namespaces` show you?
I had no nodes in the cluster, just the 3 masters. I don't have the actual output of `kubectl get pods --all-namespaces`, but it showed all pods running except for the kube-dns pod, which showed 0/3 with `ContainerCreating`.
So I think you have no nodes to schedule on. We can test this by adding `kubernetes::taint_master: false` to hiera, or by adding a worker node. Both will give you the same outcome.
This is what I already did; when I add that, everything works. But shouldn't the masters show that they're ready before we add any nodes? That's what I expected... perhaps this is still some Kubernetes knowledge missing on my side :)
So from Kubernetes' point of view, the cluster is ready; it will take your commands and queue them up until a worker node is available. By default, Kubernetes won't schedule workloads on controllers. This is documented in the kubeadm docs: https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#master-isolation
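(A toy model of that master-isolation behaviour, not the real scheduler: a pod can only land on a node if it tolerates every NoSchedule taint the node carries, which is why `taint_master: false` or an untainted worker node both unblock kube-dns.)

```python
# Simplified sketch: masters carry a NoSchedule taint; ordinary pods
# without a matching toleration cannot be scheduled onto them.
def schedulable(node_taints, pod_tolerations):
    return all(t in pod_tolerations for t in node_taints)

master_taints = ["node-role.kubernetes.io/master:NoSchedule"]
print(schedulable(master_taints, []))             # False: default pods stay off masters
print(schedulable(master_taints, master_taints))  # True: tolerating pods may run there
print(schedulable([], []))                        # True: an untainted worker accepts pods
```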
Yeah, I know that, but I thought kube-dns had to run on the masters too. Additionally, I figured that since two of the masters were showing NotReady, the cluster wouldn't be in a ready state. I will try again with the taint, but including a worker node as well this time. Assuming that works, I have my cluster running :)
The last thing I'm wondering about: how much of my struggles could be reflected in the docs of this repo, to help other people? For example, a diagram of a minimal cluster setup (say 3 masters and 2 nodes, with the appropriate load balancers), perhaps even with an indication of where the important parameters for kubetool come from (such as `IP` and `BOOTSTRAP_CONTROLLER_IP`)?
Again, thanks for your help; if you feel no further additions to the docs are necessary (or that this is not the right issue to put them under), feel free to close this issue :)
@itilitymaarten We have a ticket in the current sprint to update the documentation; it will make the next release of the module.
All documentation has been updated with the release of v2.0.0.
I am using kubetool to generate my hieradata for me, including, of course, all certificates:
Using kubectl, I can reach my Kubernetes API server without issue. However, Kubelet is not able to reach it. Occasionally, it will complain that the certificate was not signed for the IP 10.2.3.7, only for or 10.96.0.1.
Where is this 10.96.0.1 coming from? Shouldn't this be equal to my master's internal IP address, e.g. 10.2.3.7?
Also, are all kubelets now directed to a single master node? What happens if that node fails?