kelseyhightower / kubernetes-the-hard-way

Bootstrap Kubernetes the hard way. No scripts.
Apache License 2.0

Admission webhooks and API-server (controller) networking #588

Closed by mikalai-t 4 years ago

mikalai-t commented 4 years ago

Thanks to everyone who helped create this tutorial, and to those who posted issues here, for the great work. It made k8s a bit friendlier for me :)

I wrote my own scenario based on this tutorial: Terraform deploys all the GCP infrastructure (VPC, subnet, routers, NAT, firewall rules, instances, etc.), the infrastructure parameters are then passed to startup scripts, and Ansible playbooks deploy the necessary components depending on node type.

I've tested with different OSes (debian-9, debian-10) and several Kubernetes releases (from 1.15.x to 1.18.x).

Basic operations and deployments work like a charm, but now I'm having an issue after deploying the Nginx Ingress Controller with: kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/provider/baremetal/deploy.yaml In particular, the ValidatingWebhook part produces an error when I try to deploy an Ingress rule for my application:

Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://ingress-nginx-controller-admission.ingress-nginx.svc:443/extensions/v1beta1/ingresses?timeout=10s: dial tcp 10.32.0.6:443: i/o timeout

Honestly, I'm a newbie in K8S, so correct me if I'm wrong somewhere... As I understand validating admission, the API server must send Ingress manifests over the network to the Ingress Controller, which runs as a Pod on a worker node, so the controller can check them before they are applied. And the error above shows that the API server couldn't reach the Service!?

I read this article: https://itnext.io/kubernetes-networking-behind-the-scenes-39a1ab1792bb , but it still isn't clear to me how the API server (deployed as a systemd service) is supposed to send a network packet to the 10.32.0.0/24 CIDR when it needs to validate something, for example through an admission webhook.

Maybe we need to add more routes on the controller node so that it can reach 10.32.0.0/24 via some next hop? And how should the API server then resolve the Service name ingress-nginx-controller-admission.ingress-nginx.svc, since it is only known by "asking" CoreDNS, which runs inside the cluster?

Firewall rules in the GCP network allow any tcp, udp and icmp traffic from the 10.240.0.0/24, 10.200.0.0/16 and 10.32.0.0/24 CIDRs. I was also able to send the same POST request with curl from another Pod to the Service endpoint ingress-nginx-controller-admission.ingress-nginx.svc and immediately received a response.
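To make the ranges concrete, here is a minimal Python sketch that checks which of the three CIDRs the address from the error (10.32.0.6) falls into. The labels attached to each range are an assumption based on the tutorial's defaults; the `classify` helper is hypothetical and only illustrates the lookup.

```python
import ipaddress

# The three ranges quoted in this issue. The labels are assumptions taken
# from the tutorial's defaults, not something the error message states.
RANGES = {
    "nodes (VPC subnet)": ipaddress.ip_network("10.240.0.0/24"),
    "pods (cluster CIDR)": ipaddress.ip_network("10.200.0.0/16"),
    "services (ClusterIP range)": ipaddress.ip_network("10.32.0.0/24"),
}


def classify(address: str) -> str:
    """Return the label of the first range containing address, or 'unknown'."""
    ip = ipaddress.ip_address(address)
    for label, network in RANGES.items():
        if ip in network:
            return label
    return "unknown"


if __name__ == "__main__":
    # 10.32.0.6 is the address from the "dial tcp" part of the error.
    print(classify("10.32.0.6"))
```

If the address lands in the Service range, the API server is dialing a ClusterIP. Assuming kube-proxy runs only on the workers (as in the tutorial), a controller node has no local rules that translate that address into a Pod IP, which would be consistent with the timeout.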

Could anyone assist or suggest anything, please, or just point me to a similar article that may help me understand K8S networking better?

mikalai-t commented 4 years ago

Well, I checked iptables on the workers and found that they accept traffic to a ClusterIP Service from any source. So I added routes in GCP for the Service CIDR 10.32.0.0/24 with the workers' IPs as next hops, and the webhook started working. Not sure if this is the correct way, though...
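For anyone trying the same workaround, a rough Python sketch of what those routes could look like is below. It only builds `gcloud compute routes create` command lines, one per worker, and does not run them. The network name, the route names and the worker IPs are assumptions borrowed from the tutorial's defaults, so substitute your own values.

```python
from typing import List


def service_route_commands(
    network: str, service_cidr: str, worker_ips: List[str]
) -> List[List[str]]:
    """Build one 'gcloud compute routes create' command per worker.

    Each route sends traffic for the Service CIDR to a worker node, where
    kube-proxy's rules can translate the ClusterIP into a Pod IP.
    """
    commands = []
    for index, ip in enumerate(worker_ips):
        commands.append([
            "gcloud", "compute", "routes", "create",
            f"kubernetes-route-services-{index}",  # hypothetical route name
            "--network", network,
            "--destination-range", service_cidr,
            "--next-hop-address", ip,
        ])
    return commands


if __name__ == "__main__":
    # Assumed values: network name and worker IPs follow the tutorial's
    # defaults and may differ in a Terraform-based setup.
    workers = ["10.240.0.20", "10.240.0.21", "10.240.0.22"]
    for cmd in service_route_commands(
        "kubernetes-the-hard-way", "10.32.0.0/24", workers
    ):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to review them before applying anything to the VPC.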

vCillusion commented 3 years ago

Hi @mikalai-t, I faced the same issue in my local cluster setup on VirtualBox. Can you please guide me through the steps? cc: @kelseyhightower

mikalai-t commented 3 years ago

@vCillusion Emm, no idea about your local setup :) To get a *Webhook working, make sure your "master(s)" can reach a "worker" node. Possible causes: iptables firewall/config and/or routing tables.
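One quick way to check that reachability is a plain TCP probe run on the master/controller node. The sketch below uses only the Python standard library; `can_connect` is a hypothetical helper, and the address and port are the ones from the error earlier in this thread, so replace them with your own webhook Service's ClusterIP.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False


if __name__ == "__main__":
    # Assumed target: the ClusterIP and port from the error in this issue.
    print(can_connect("10.32.0.6", 443))
```

If this returns False on the master but the same check succeeds from a worker or a Pod, routing or firewall rules between master and workers are the likely place to look.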