Closed: itilitymaarten closed this issue 6 years ago
Okay, so I've figured out that `10.96.0.1` is the ClusterIP used by the `kubernetes` service, so I assume the idea is that other pods try to find the Kubernetes API at this IP address, and therefore the certificate has to be valid for this IP?
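(For reference, that address isn't arbitrary: the API server allocates the `kubernetes` Service the first usable address of the service CIDR, which defaults to `10.96.0.0/12` in kubeadm-style setups. A small sketch with Python's `ipaddress`, assuming that default range:)

```python
import ipaddress

# kubeadm's default --service-cluster-ip-range; the "kubernetes" Service
# is allocated the first usable address in it.
service_range = ipaddress.ip_network("10.96.0.0/12")
api_service_ip = service_range[1]
print(api_service_ip)  # 10.96.0.1
```

So any cluster using the default service range will see `10.96.0.1` in the API server's certificate SANs.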
I've managed to get my cluster working, but only by using the public IP address for `BOOTSTRAP_CONTROLLER_IP`. I end up with the following command for kubetool:
```shell
docker run --rm -v "$(pwd)":/mnt \
  -e OS=debian \
  -e VERSION=1.9.2 \
  -e CONTAINER_RUNTIME=docker \
  -e CNI_PROVIDER=weave \
  -e FQDN=icc-kubernetes-masters.westeurope.cloudapp.azure.com \
  -e IP=<PUBLIC IP> \
  -e BOOTSTRAP_CONTROLLER_IP=<PUBLIC IP> \
  -e ETCD_INITIAL_CLUSTER="etcd-kubernetes-01=http://%{::ipaddress}:2380" \
  -e ETCD_IP="%{::ipaddress}" \
  -e KUBE_API_ADVERTISE_ADDRESS="%{::ipaddress}" \
  -e INSTALL_DASHBOARD=false \
  puppet/kubetool
```
(Note that I've also substituted the hardcoded IP address in the `ETCD_INITIAL_CLUSTER` parameter; it makes no sense to me to have to supply a manual IP address there if that IP address needs to be the same as `ETCD_IP` anyway.)
Is this the expected setup?
I find the documentation quite vague on which parameters require which values and what they mean, especially as a newcomer to Kubernetes. It also doesn't make sense to me that nodes inside the cluster should need to use the public IP address of the cluster. That would require me to always expose the Kubernetes API to the outside world, even if I only want to control it from inside the cluster.
@itilitymaarten Most of the issues that you have listed here are Kubernetes related and not really issues with the module. For example, `ETCD_INITIAL_CLUSTER != ETCD_IP` if you have multiple controllers being spun up for the initial cluster. As for the networking inside Kubernetes: all the pods (and `kubectl` run from inside the cluster) hit the kube API on the internal address `10.96.0.1`, and outside services hit the node IP. Again, this is the way Kubernetes works, not the module. I am just trying to understand what the issue with the module is on this thread?
@ianscrymgeour If you are new to Kubernetes, I would start here: https://github.com/puppetlabs/kream. This is a project that lets you play with the module so you can understand all the moving parts.
@scotty-c Thank you for your responses. I had seen kream, but since I don't have a physical machine with Linux (or macOS) available to me, I have to work with VMs hosted in Azure instead.
I think my biggest issue is understanding the documentation and the assumptions made by this module, especially surrounding the arguments for kubetool.
- `IP` isn't mentioned at all in the README, but I think I read in an issue on this repository that it should be the IP address at which the cluster is available to the internet.
- `FQDN`'s description I find a little vague; I assume this is the FQDN at which the cluster is available externally (e.g. an FQDN pointing at a load balancer), but I'd prefer to see this made explicit so I don't have to guess.
- `BOOTSTRAP_CONTROLLER_IP`, to me, automatically becomes the IP address of the machine in the cluster that is going to be the bootstrap controller (seeing as `IP` is external). These two are not the same, because of the load balancer. However, when I set this to my master's internal IP, the kubelet on the master is unable to connect, since the internal IP address doesn't match the ones in the certificate (which seem to be `IP` and `10.96.0.1`).
- `ETCD_INITIAL_CLUSTER` could use some clarification on the "etcd name" that is automatically being generated, i.e. `etcd-<hostname>`. I had to look at the template for etcd.yaml to figure out that this was the name being generated.
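To illustrate that naming convention, here is a sketch (the hostnames and IPs are hypothetical, invented for the example) of how a three-controller `ETCD_INITIAL_CLUSTER` value is shaped, with each member named `etcd-<hostname>`:

```python
# Hypothetical controllers; each etcd member is named "etcd-<hostname>".
controllers = {
    "kubernetes-master-01": "10.2.3.7",
    "kubernetes-master-02": "10.2.3.8",
    "kubernetes-master-03": "10.2.3.9",
}
etcd_initial_cluster = ",".join(
    f"etcd-{host}=http://{ip}:2380" for host, ip in controllers.items()
)
print(etcd_initial_cluster)
```

The resulting comma-separated string is what would be passed to `-e ETCD_INITIAL_CLUSTER=...` in the kubetool invocation above.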
Perhaps a small paragraph describing an example HA setup (e.g. 3 master nodes, being one bootstrap controller and 2 additional controllers, plus a load balancer machine facing the outside web) would also help to explain the situation this module expects/creates.
All in all, they're not big points, but especially for someone who isn't that experienced with Kubernetes yet (like me), they would help a great deal in understanding what's happening. And again, my apologies if this is just due to a lack of Kubernetes knowledge on my part.
@itilitymaarten I will take this on board and create a ticket to work on the docs. In regards to the IP: in kream we use an internal IP, `172.17.10.101`, and it works because we add it to the SANs list. There could be 2 issues here:
1) The internal IP of the compute resource overlaps with the service IP range in Kubernetes.
2) The internal IP is not in the SANs on the API server's cert.
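(Issue 1 is easy to check mechanically. A sketch using Python's `ipaddress`, assuming the default `10.96.0.0/12` service range; the node IP here is hypothetical:)

```python
import ipaddress

service_cidr = ipaddress.ip_network("10.96.0.0/12")  # default service IP range
node_ip = ipaddress.ip_address("10.100.0.5")         # hypothetical internal node IP

# 10.100.0.5 falls inside 10.96.0.0/12, so this node IP would collide
# with the service range.
overlap = node_ip in service_cidr
print(overlap)  # True
```

For issue 2, the SANs on the API server's certificate can be inspected with `openssl x509 -in <path to apiserver cert> -noout -text` and checking the "Subject Alternative Name" section.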
More than happy to help out with the teething issues getting the module up and running.
@scotty-c Thanks for your help then :) I'd be happy to review the changes to docs from a novice standpoint, if you like.
Assuming you're talking about the `BOOTSTRAP_CONTROLLER_IP`: I used an internal IP too, but it didn't get signed into the cert. The only IPs in the cert are the public IP of my masters' load balancer and `10.96.0.1`. However, I now believe that's correct: the IP should be the load balancer's, since otherwise my requests wouldn't be distributed across the masters.
Is there any preferred/required order to the Puppet runs (if so, this would also be good to document)? Or should they all be run simultaneously (for the masters, at least)? The run on my bootstrap controller hangs indefinitely if the other controllers in `ETCD_INITIAL_CLUSTER` don't come up, because etcd crash-loops (which takes the API server down with it). Once the other machines come up, they work fine, though. The last problem is that kube-dns doesn't actually start (the pod is stuck in ContainerCreating with the message "failed to create pod sandbox"), but I'll look into that further first.
So the etcd cluster needs to be established for the Kubernetes cluster to come up. If you are creating a cluster of 3 controllers, it is best to run Puppet on all of them at about the same time. Once the etcd cluster is ready, the kube API server will be available.
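(As background, and not specific to this module: etcd only serves requests once a majority of the members listed in the initial cluster are up, which is why a lone bootstrap controller crash-loops until the others join. The majority threshold is simply:)

```python
def etcd_quorum(members: int) -> int:
    """Minimum number of live etcd members needed before etcd serves requests."""
    return members // 2 + 1

# A 3-member cluster needs 2 members up; a single bootstrap node is not enough.
print(etcd_quorum(1), etcd_quorum(3), etcd_quorum(5))
```

This matches the behaviour described above: with `ETCD_INITIAL_CLUSTER` listing 3 members, the bootstrap controller alone can never reach quorum.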
Which CNI network provider are you using? That will help me work out why kube-dns is not starting.
I'm using weave, with just the default arguments for kubetool (i.e. `CNI_PROVIDER=weave`), which turns into

```yaml
kubernetes::cni_network_provider: https://git.io/weave-kube-1.6
kubernetes::cni_cluster_cidr:
kubernetes::cni_node_cidr:
```

in my hiera data.
I've found the following in the logs:
```
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]/onlyif) NAME STATUS ROLES AGE VERSION
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]/onlyif) kubernetes-master-01 Ready master 14m v1.9.2
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]/onlyif) kubernetes-master-02 NotReady master 14m v1.9.2
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]/onlyif) kubernetes-master-03 NotReady <none> 14m v1.9.2
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]/returns) Exec try 1/5
(Exec[Install cni network provider](provider=posix)) Executing 'kubectl apply -f https://git.io/weave-kube-1.6'
Executing: 'kubectl apply -f https://git.io/weave-kube-1.6'
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]/returns) serviceaccount "weave-net" unchanged
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]/returns) clusterrole "weave-net" configured
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]/returns) clusterrolebinding "weave-net" configured
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]/returns) role "weave-net-kube-peer" unchanged
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]/returns) rolebinding "weave-net-kube-peer" unchanged
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]/returns) daemonset "weave-net" unchanged
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]/returns) executed successfully
(/Stage[main]/Kubernetes::Kube_addons/Exec[Install cni network provider]) The container Class[Kubernetes::Kube_addons] will propagate my refresh event
(Exec[Assign master role to controller](provider=posix)) Executing check 'kubectl describe nodes kubernetes-master-01 | tr -s ' ' | grep 'Roles: master''
Executing: 'kubectl describe nodes kubernetes-master-01 | tr -s ' ' | grep 'Roles: master''
(/Stage[main]/Kubernetes::Kube_addons/Exec[Assign master role to controller]/unless) Roles: master
(Exec[Checking for dns to be deployed](provider=posix)) Executing check 'kubectl get deploy -n kube-system kube-dns -o yaml | tr -s " " | grep "Deployment does not have minimum availability"'
Executing: 'kubectl get deploy -n kube-system kube-dns -o yaml | tr -s " " | grep "Deployment does not have minimum availability"'
(/Stage[main]/Kubernetes::Kube_addons/Exec[Checking for dns to be deployed]/onlyif) message: Deployment does not have minimum availability.
(/Stage[main]/Kubernetes::Kube_addons/Exec[Checking for dns to be deployed]/returns) Exec try 1/50
(Exec[Checking for dns to be deployed](provider=posix)) Executing 'kubectl get deploy -n kube-system kube-dns -o yaml | tr -s " " | grep "Deployment has minimum availability"'
Executing: 'kubectl get deploy -n kube-system kube-dns -o yaml | tr -s " " | grep "Deployment has minimum availability"'
(/Stage[main]/Kubernetes::Kube_addons/Exec[Checking for dns to be deployed]/returns) Sleeping for 10 seconds between tries
(/Stage[main]/Kubernetes::Kube_addons/Exec[Checking for dns to be deployed]/returns) Exec try 2/50
```
The "Checking for dns to be deployed" exec repeats until try 50 and then determines it failed, obviously.
The odd thing to me: should the bootstrap controller be showing Ready before DNS is installed and ready? When using `kubeadm`, it never showed Ready until everything was really up and running.
EDIT: events from the kube-dns pod:
```
Events:
  Type     Reason                  Age               From                           Message
  ----     ------                  ----              ----                           -------
  Normal   Scheduled               2m                default-scheduler              Successfully assigned kube-dns-ccf7b96b9-gsbsk to kubernetes-master-01
  Normal   SuccessfulMountVolume   2m                kubelet, kubernetes-master-01  MountVolume.SetUp succeeded for volume "kube-dns-token-d2ntl"
  Normal   SuccessfulMountVolume   2m                kubelet, kubernetes-master-01  MountVolume.SetUp succeeded for volume "kube-dns-config"
  Normal   SandboxChanged          2m (x11 over 2m)  kubelet, kubernetes-master-01  Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  2m (x12 over 2m)  kubelet, kubernetes-master-01  Failed create pod sandbox.
```
2nd EDIT: I've got it to work, I think. Going to try with entirely clean VMs and verify. What I did:
At the time, did you have any nodes? What does the output of `kubectl get pods --all-namespaces` show you?
I had no nodes in the cluster, just the 3 masters. I don't have the actual output of `kubectl get pods --all-namespaces`, but it showed all pods running except for the kube-dns pod, which showed 0/3 with `ContainerCreating`.
So I think you have no nodes to schedule on. We can test this by adding `kubernetes::taint_master: false` to hiera, or by adding a worker node. Both will give you the same outcome.
This is what I already did; when I add that, everything works. But shouldn't the masters show that they're ready before we add any nodes? That's what I expected... perhaps this is still some Kubernetes knowledge missing on my side :)
So from Kubernetes' point of view, the cluster is ready; it will take your commands and queue them up until a worker node is available. By default, Kubernetes won't schedule workloads on controllers. This is documented in the kubeadm docs: https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#master-isolation
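(A toy model of that master-isolation behaviour, not the real scheduler: a pod can only land on a node if it tolerates every NoSchedule taint the node carries, which is why `taint_master: false` or an untainted worker node both unblock kube-dns.)

```python
# Simplified sketch: masters carry a NoSchedule taint; ordinary pods
# without a matching toleration cannot be scheduled onto them.
def schedulable(node_taints, pod_tolerations):
    return all(t in pod_tolerations for t in node_taints)

master_taints = ["node-role.kubernetes.io/master:NoSchedule"]
print(schedulable(master_taints, []))             # False: default pods stay off masters
print(schedulable(master_taints, master_taints))  # True: tolerating pods may run there
print(schedulable([], []))                        # True: an untainted worker accepts pods
```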
Yeah, I know that, but I thought kube-dns had to run on the masters too. Additionally, I figured that since two of the masters were showing NotReady, the cluster wouldn't be in a ready state. I will try again with the taint, but including a worker node as well this time. Assuming that works, I have my cluster running :)
The last thing I'm wondering about: how much of my struggles could be reflected in the docs of this repo, to help other people? For example, a diagram of a minimal cluster setup (say 3 masters and 2 nodes, with the appropriate load balancers), perhaps even with an indication of where the important parameters for kubetool come from (such as `IP` and `BOOTSTRAP_CONTROLLER_IP`)?
Again, thanks for your help; if you feel no further additions to the docs are necessary (or that this is not the right issue to put them under), feel free to close this issue :)
@itilitymaarten We have a ticket in the current sprint to update the documentation; it will make the next release of the module.
All documentation has been updated with the release of v2.0.0.
I am using kubetool to generate my hieradata for me, including, of course, all certificates:
Using kubectl, I can reach my Kubernetes API server without issue. However, Kubelet is not able to reach it. Occasionally, it will complain that the certificate was not signed for the IP 10.2.3.7, only for or 10.96.0.1.
Where is this 10.96.0.1 coming from? Shouldn't this be equal to my master's internal IP address, e.g. 10.2.3.7?
Also, are all kubelets now directed to a single master node? What happens if that node fails?