kontena / pharos-cluster

Pharos - The Kubernetes Distribution
https://k8spharos.dev/

External api endpoint install failure #380

Open giovannicandido opened 6 years ago

giovannicandido commented 6 years ago

I'm creating a new cluster with an API endpoint, as described in the docs.

Setup gives the error:

SocketError : Failed to open TCP connection to xxx.xxxx.xxx:6443

The documentation gives no clue whether I have to point this endpoint at the masters before running the installation.

Cluster is up with no nodes.

I was expecting this to be just an external name pointing to the masters (load balancer or DNS), not something used internally, especially by the setup itself.

SpComb commented 6 years ago

The api.endpoint is also used for all kube API requests by the pharos-cluster CLI itself, once all the master hosts have been configured - in addition to being a subjectAltName on the master kube API certificates.

It should point to a LB/DNS that is provisioned at the same time as the machines themselves - this should be clarified in the docs.
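
Roughly, the relevant part of cluster.yml would look like this (the hostname and addresses below are just placeholders, assuming a DNS name or LB that already resolves to the masters):

```yaml
hosts:
  # master/worker hosts, provisioned together with the LB/DNS record
  - address: "10.0.0.10"
    user: root
    role: master
  - address: "10.0.0.20"
    user: root
    role: worker
api:
  # name the pharos-cluster CLI uses for kube API requests after setup,
  # and which is added as a subjectAltName on the kube API certificates
  endpoint: k8s.example.com
```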

giovannicandido commented 6 years ago

Can it be a list? The docs at https://kubernetes.io/docs/concepts/cluster-administration/certificates say it is a list of alternative names. My use case is that all traffic is internal to a VPN network, and only port 6443, HTTP, and HTTPS are accessible to external traffic. I plan to use one local DNS name and one external one, pointing to different IPs.

giovannicandido commented 6 years ago

Interesting: first I set up with the API endpoint and it gave the error, then I ran again without it and the cluster was set up. Then I added an /etc/hosts entry on the machine running the pharos CLI to bypass the check and ran again with the API endpoint. The cluster is now unstable and I'm not able to create pods. Investigation leads to:

```
kubectl -n kube-system describe pod pharos-proxy-k8s-worker0

Warning  FailedMount  3h (x2 over 3h)  kubelet, k8s-master0  MountVolume.SetUp failed for volume "kube-proxy" : Get https://localhost:6443/api/v1/namespaces/kube-system/configmaps/kube-proxy: net/http: TLS handshake timeout
```

Which means certs are wrong.

Also:

```
kubectl -n kube-system get pods

NAME                                       READY   STATUS              RESTARTS   AGE
etcd-k8s-master-ams0                       0/1     CrashLoopBackOff    33         13m
etcd-k8s-master0                           1/1     Running             12         5h
etcd-k8s-master1                           1/1     Running             9          5h
heapster-558c7ddb57-s5658                  0/1     ContainerCreating   0          4h
kube-apiserver-k8s-master-ams0             1/1     Running             45         13m
kube-apiserver-k8s-master0                 1/1     Running             6          3h
kube-apiserver-k8s-master1                 1/1     Running             1          3h
kube-controller-manager-k8s-master-ams0    1/1     Running             0          13m
kube-controller-manager-k8s-master0        1/1     Running             9          3h
kube-controller-manager-k8s-master1        1/1     Running             12         3h
kube-dns-6475cffd7f-4gv4v                  3/3     Running             0          2h
kube-dns-6475cffd7f-8zxbm                  0/3     CrashLoopBackOff    239        4h
kube-proxy-54pqq                           1/1     Running             0          5h
kube-proxy-7mgbz                           1/1     Running             0          5h
kube-proxy-cct6n                           1/1     Running             0          5h
kube-proxy-dp47c                           0/1     CrashLoopBackOff    58         4h
kube-proxy-tb68k                           0/1     CrashLoopBackOff    59         4h
kube-scheduler-k8s-master-ams0             1/1     Running             0          13m
kube-scheduler-k8s-master0                 1/1     Running             15         5h
kube-scheduler-k8s-master1                 1/1     Running             18         5h
pharos-proxy-k8s-worker0                   1/1     Running             0          4h
pharos-proxy-k8s-worker1                   1/1     Running             0          4h
weave-net-44hvp                            0/2     CrashLoopBackOff    112        4h
weave-net-qn4qv                            2/2     Running             0          4h
weave-net-rng6s                            2/2     Running             0          4h
weave-net-spl4x                            2/2     Running             0          4h
weave-net-zlfxd                            0/2     CrashLoopBackOff    110        4h
```

Crucial system pods are crashing.

I will completely rebuild the cluster (easier for me).

giovannicandido commented 6 years ago

I'm having some problems with the pharos installation and this config. When I try to create a cluster with an external API endpoint, it breaks after installation: the cluster is slow and buggy, and kube-proxy and etcd crash-loop, even with the DNS records in place. If I install without it, everything works fine. The problem is that I then can't use kubectl from outside the private network, because the x509 certificate does not allow access through the API, which makes the cluster kind of useless. At first I thought it was a firewall issue, but I tried with the firewall disabled. The last thing I will try is to use the external IP addresses for the systems, but that means every node needs a dedicated IP, and HA access to the kube API will be tricky (I'd need to point to an IP address, not a DNS name or load balancer). What I'm trying to achieve is a cluster isolated in an internal network, while the kube API is external. I don't have an IP load balancer, but even with one, I don't think it would make a difference to this problem.

SpComb commented 6 years ago

The api.endpoint should only be used by the pharos-cluster tool itself, and not for the worker host kubelet -> master host kube-apiserver connections, so the scenario you describe should work, assuming that the external kubeapi DNS/LB is already set up before you run pharos-cluster up.

The worker host kubelet -> master host kube-apiserver connections go via https://localhost:6443 => the local pharos-proxy pod => the KUBE_MASTERS="..." using the master host private/public IPs.
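
If you want to check what the proxy on a worker is actually forwarding to, something like this should show it (assuming KUBE_MASTERS is set as a container environment variable on the pharos-proxy pod; the pod name is taken from your output above):

```sh
# show the pharos-proxy container environment, including KUBE_MASTERS
kubectl -n kube-system get pod pharos-proxy-k8s-worker0 \
  -o jsonpath='{.spec.containers[0].env}'

# on the worker itself, confirm that the local proxy answers on port 6443
curl -k https://localhost:6443/healthz
```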

```
kubectl -n kube-system describe pod pharos-proxy-k8s-worker0

Warning  FailedMount  3h (x2 over 3h)  kubelet, k8s-master0  MountVolume.SetUp failed for volume "kube-proxy" : Get https://localhost:6443/api/v1/namespaces/kube-system/configmaps/kube-proxy: net/http: TLS handshake timeout
```

Confusion - you're inspecting the pharos-proxy pod, but the warning is for the kube-proxy pod?

Based on the kubectl get pods output, it looks like you have a cluster with three master nodes and two worker nodes, and the kube-dns and weave-net pods on the two worker nodes are crashing with errors communicating to https://localhost:6443?

That would point at a problem with the pharos-proxy on those worker nodes... could you perhaps show the `kubectl -n kube-system logs pharos-proxy-k8s-worker0` and `kubectl -n kube-system logs kube-proxy-*` output for the pods on the worker nodes?

jakolehm commented 6 years ago

@giovannicandido could you paste an example cluster.yml that describes your setup?

giovannicandido commented 6 years ago

```yaml
hosts:
  - address: "172.28.250.121"
    private_interface: zt5u4y6ejv
    user: root
    role: master
  - address: "172.28.240.178"
    private_interface: zt5u4y6ejv
    user: root
    role: master
  - address: "172.28.161.33"
    private_interface: zt5u4y6ejv
    user: root
    role: master
  - address: "172.28.135.89"
    role: worker
    private_interface: zt5u4y6ejv
    user: root
  - address: "172.28.94.143"
    role: worker
    private_interface: zt5u4y6ejv
    user: root
network:
  provider: weave
  service_cidr: 172.31.0.0/16
  pod_network_cidr: 172.32.0.0/16
  weave:
    trusted_subnets:
      - "172.28.0.0/16"
api:
  endpoint: k8s.xxx.xxx
```

k8s.xxx.xxx points to the masters' public IP addresses. Each system has a private IP address on eth0 and a VPN interface zt5u4y6ejv. The public IP address is a transparent (all-ports redirection) NAT.
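
For reference, this is roughly what that looks like from a master (the name resolves to the public NAT address, while the interfaces only carry the private and ZeroTier addresses):

```sh
# what the API endpoint name resolves to (the public NAT address)
dig +short k8s.xxx.xxx

# what is actually bound on the master's interfaces
ip -4 addr show eth0
ip -4 addr show zt5u4y6ejv
```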

jakolehm commented 6 years ago

What kind of VPN network is behind the zt5u4y6ejv interfaces?

giovannicandido commented 6 years ago

https://www.zerotier.com/

It is a peer-to-peer mesh network.

jakolehm commented 6 years ago

Not sure why you would need an additional p2p mesh network (Weave is already a p2p mesh network)? This probably causes slowness and random network failures, because you have two mesh network layers and Weave probably also tries to create IPsec tunnels between peers.

Is ZeroTier mandatory here, or would a plain Weave network be enough?

giovannicandido commented 6 years ago

If trusted_subnets is passed to Weave, it should not re-encrypt the traffic. The cloud provider doesn't have an SDN network and shares the private IP space with other customers. ZeroTier is not mandatory, but it is highly compelling: it acts like a VPN where clients (admins) can connect to the same network as if it were local (OpenVPN-like), without a single point of failure and with good performance (no central routing). It can also provide datacenter-agnostic migration, since the IP addresses can simply be moved to another datacenter without downtime. I will experiment with other approaches, but it works well. I will comment with the results.

jakolehm commented 6 years ago

In this (kinda exotic) case you probably need to define the ZeroTier network as a "trusted subnet", otherwise Weave will encrypt traffic within ZeroTier using IPsec. The other option is to use Calico with ZeroTier... but that limits your hardware options to amd64.
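
For reference, a sketch of the relevant cluster.yml section; the 172.28.0.0/16 range is taken from your config above, assuming it covers all the ZeroTier peer addresses:

```yaml
network:
  provider: weave
  weave:
    trusted_subnets:
      # ZeroTier address range; Weave skips IPsec encryption for peers
      # whose underlay addresses fall within a trusted subnet
      - "172.28.0.0/16"
```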

giovannicandido commented 6 years ago

I think I found the real source of the problem: NAT. As I said, each system is behind a transparent NAT and doesn't have a real IP attached. The DNS record needs to point to the public IP address, but it looks like (I'm guessing here) the certificates are bound to the name + IP. Since the IP on the network card is not the same as the one the DNS resolves to, that is the source of the problem.

I came to that conclusion after successfully creating a cluster with the exact same configuration, but on a different provider. The new systems have public IPs attached.
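
One way to verify that guess, assuming port 6443 on a master is reachable from wherever you run the check, is to look at which subjectAltNames the serving certificate actually contains:

```sh
# dump the kube-apiserver serving certificate and list its SANs;
# the DNS name and the IPs clients connect through should show up here
openssl s_client -connect k8s.xxx.xxx:6443 -servername k8s.xxx.xxx </dev/null 2>/dev/null \
  | openssl x509 -noout -text \
  | grep -A1 'Subject Alternative Name'
```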

That case is not so exotic if you think of enterprises building clusters in house. IPv4 addresses are gone for good, and unfortunately NAT is one of the tools to overcome the short supply. When the API server needs to be exposed for remote access, the only alternative is to use a VPN and reference the private IP. The other limitation is that full HA is tricky, because the client will only connect to a specific IP (a load balancer will not work here; the kubectl client will refuse the connection).

The good news is that this limitation applies only to the master nodes. Since 3-5 masters are enough for the vast majority of needs, the number of IP addresses in this case is manageable. We also have the alternative of locking down access exclusively through the VPN and exposing only ingress or other traffic, which is not bad.

What I will do: change provider from Scaleway to Hetzner Cloud. That decision is not only because of this problem; Hetzner is in the same price range and has some features I appreciate (Ceph storage for HA, snapshots).

Let me know if you guys need more information in order to document that behavior, or if you decide to improve something about the cert generation (or another cause I'm not seeing).

Thank you all for the support here ;-)