BloodyIron opened this issue 1 month ago
Also this is what it looks like when I curl localhost:6443 on one of the nodes, at the Ubuntu OS level:
```
$ curl --insecure https://localhost:6443
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}
```
So I don't really believe any firewall aspects are blocking 6443 on the nodes.
but calico-kube-controllers (the name of a singular pod in this case)

Just as a side-note, this is because the single pod contains multiple distinct controllers running within it :smile:

I think the main issue here is that "localhost" resolves to a different address for different pods. `calico/node` and `calico/typha` run in the host network namespace, so `localhost` for them will resolve to the actual node's localhost. However, kube-controllers runs with its own network namespace (i.e., `hostNetwork: false`), and so `localhost` within that pod will resolve to the pod's local IP, and traffic won't go to the node itself.

Do you have some sort of proxy running on localhost? Or is this just a single node cluster with the API server running on the node? In the latter case, you probably can just use the real IP of the node hosting the API server instead of using localhost.
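For reference, the approach in the eBPF docs is a ConfigMap along these lines. The namespace depends on the install method (`tigera-operator` for operator installs, `kube-system` for manifest installs), and the IP below is a placeholder for a stable address that reaches an API server:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: kubernetes-services-endpoint
  namespace: tigera-operator   # or kube-system for manifest-based installs
data:
  KUBERNETES_SERVICE_HOST: "192.168.1.10"   # placeholder: a stable, reachable apiserver address
  KUBERNETES_SERVICE_PORT: "6443"
```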
I'm going to do my best to answer these and future questions, but there may be areas where I think I know what's going on, but might not be 100% accurate. I am still pretty green when it comes to k8s, and I am using Rancher (v2.7.6) to provision RKE2 nodes for this test cluster, just to be clear. So feel free to correct me wherever you see fit :) So much more to learn! But...
I used localhost because it seemed to get me the "farthest" vs the actual VIP for Kubernetes API
Gotcha, yep this makes sense but will break down even more once you add nodes that are worker-only (and in fact, breaks down even before that as evidenced by this issue!).
So far as I KNOW I do not have any proxy on localhost
Yep, I think my question was answered - if you haven't set up anything explicitly to redirect localhost:6443->apiserver:6443 on another host, which it sounds like you haven't.
I believe I saw api-server pods on all 3x of the nodes in this cluster, but I don't know yet if that will be the case for any worker-only nodes I add in the future
They won't be - the apiserver is a control-plane only component and won't run on worker nodes. However, services on the worker nodes (including kube-proxy when enabled, kubelet, and Calico) all need to communicate with the apiserver.
You might have missed me mentioning above that I've tried with the VIP for Kubernetes API
Yeah, I am not too surprised this doesn't work. Typically I would expect to point `KUBERNETES_SERVICE_HOST` to the address of a load balancer fronting the API server (I suspect you have one, e.g., the address your local kubectl uses to reach the apiservers?)
You can use one of the api server endpoints in `kubectl get endpoints kubernetes`

You can use one of the api server endpoints in `kubectl get endpoints kubernetes`
Yep, this is a decent stopgap but it won't provide redundancy in the event of that particular API server pod failing / being upgraded, nor will it handle the IP address of that particular node changing.
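For anyone following along, those endpoints look something like this via `kubectl get endpoints kubernetes -o yaml` (the IPs here are illustrative):

```yaml
apiVersion: v1
kind: Endpoints
metadata:
  name: kubernetes
  namespace: default
subsets:
  - addresses:
      - ip: 192.168.1.10   # illustrative: one entry per running apiserver
      - ip: 192.168.1.11
      - ip: 192.168.1.12
    ports:
      - name: https
        port: 6443
        protocol: TCP
```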
Right now all 3x of the nodes in the cluster are etcd/control-plane/worker though. Just to clarify ;)
You can use one of the api server endpoints in `kubectl get endpoints kubernetes`
How do I:
(Directed at anyone:) To me, the point of the 10.43.0.1 VIP... wasn't it supposed to be the logical "loadbalanced" IP serving exactly this function, tolerant of endpoint shenanigans? Which is why it seemed the logical IP to use, yet... doesn't work? (and I know this is a default-generated IP for my cluster, I'm not married to this specific IP but automation is nice)

To me, the point of the 10.43.0.1 VIP... wasn't it supposed to be the logical "loadbalanced" IP serving exactly this function, tolerant of endpoint shenanigans?
It is. However, when using eBPF Calico, Calico becomes responsible for programming that VIP so that it works. So, Calico can't rely on the VIP that it itself is programming.
Where are you running kubectl from? Are you accessing this cluster by SSHing into the nodes? Typically a cluster will have a public IP address associated with its API that is used for external access (i.e., a cloud LoadBalancer) - this should handle nodes being added/removed, as well as rolling updates, etc.
I'm generally avoiding any sort of manual kubectl at all. I always try to seek a method that ArgoCD can apply via YAML manifests so that any changes I want to make are defined IaC-style. So disabling kube-proxy, for example, uses a daemonset that drops a yaml file into a location RKE2 watches, declaring an environment variable that disables kube-proxy, and also removes another kube-proxy manifest file in another folder.
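For comparison, the more direct (non-GitOps) route would presumably be RKE2's own server config on each node, assuming the `disable-kube-proxy` flag is available in the RKE2 version in use:

```yaml
# /etc/rancher/rke2/config.yaml on each server node - a sketch; verify the
# flag name against your RKE2 version's docs before relying on it
disable-kube-proxy: true
```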
As for the "name: kubernetes-services-endpoint" that's a "kind: ConfigMap" manifest that ArgoCD applies to the cluster.
ArgoCD compares running state to a GitLab repo, by the way.
I try to avoid getting manually onto the nodes, but for when I have to, it's via SSH. Or for kubectl stuff I use Rancher's web GUI to get a kubectl shell for the cluster (and I don't typically need to care where kubectl runs, so long as it can reach the relevant cluster, which it normally can).
This is all self-hosted by the way, and I do believe I have things configured to go through Rancher for when ArgoCD interacts with the cluster. I intentionally did not configure an endpoint (unsure if this is Kubernetes API or not, to be clear) as I wanted Rancher to manage that Access Control/RBAC stuff.
As for inbound traffic, I'm using MetalLB in Layer 2 ARP mode handling a single LAN IP for inbound traffic, but I'm quite confident that doesn't overlap with where I'm stuck.
So again, all on-prem, my infra, no hosted cloud, nothing like that. ;) This is by design and I have no interest in doing any of this in any hosted infra cloud or otherwise.
I'm generally avoiding any sort of manual kubectl at all.
Right - I was less interested in `kubectl` the tool in particular, and more interested in learning how entities outside of your cluster access the API (if at all).
Ultimately what you need is a stable IP address that routes to your API server pod(s). Given you are running your own on-prem cluster, it's sort of up to you how you configure that!
Or if you have a way to resolve DNS from the hosts, you can use a domain name in `KUBERNETES_SERVICE_HOST` that maps to multiple IPs for HA. I think Azure does it that way.
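A sketch of that approach, using a hypothetical name (`k8s-api.example.internal`) that the hosts' own DNS - not cluster DNS - resolves to every control-plane IP:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: kubernetes-services-endpoint
  namespace: tigera-operator
data:
  # Hypothetical round-robin A record maintained outside the cluster and
  # resolvable by the nodes themselves (cluster DNS isn't up yet when
  # Calico bootstraps).
  KUBERNETES_SERVICE_HOST: "k8s-api.example.internal"
  KUBERNETES_SERVICE_PORT: "6443"
```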
Of course it's up to me, but I have no idea how I should configure it, as the official documentation for eBPF in Calico (specifically for RKE2, agnostic of where it runs, by the way) isn't producing the results described in said documentation.
I do see that 10.43.0.1 exists in the cluster, but using that instead of localhost or 127.0.0.1 works "worse" as all Calico aspects fail to re-init when using 10.43.0.1.
So I really do need more help on the matter.
Or if you have a way to resolve DNS from the hosts, you can use a domain name in `KUBERNETES_SERVICE_HOST` that maps to multiple IPs for HA. I think Azure does it that way.
I'm not opposed to something like that, however I would want some sort of method that auto-updates the DNS entry as nodes (in this case control-plane nodes) are added or removed. Shouldn't some of the internal FQDNs work for this function?
It's also challenging for me to determine whether the relevant pods use internal (within the k8s cluster) DNS resolution for this, or require external resolution (outside the cluster, but maybe on the VM itself) for such a method to work.
I'm trying to keep the cluster as self-sustaining, and automated, as possible.
And to be clear to both @caseydavenport and @tomastigera I do REALLY appreciate the engagement and help here, even if we haven't quite found a solution yet. So thank you for that ❤️
Also, isn't Calico capable of doing this kube-apiserver loadbalancing internally, which is something we "need" here? I'm trying to keep my details straight, but it is challenging...
Also, isn't Calico capable of doing this kube-apiserver loadbalancing internally
The problem here is bootstrapping - Calico needs to be able to talk to the apiserver in order to learn the necessary information in order to do this.
So, Calico can't just magically detect where the API server is when it doesn't have access to the API. Something needs to tell Calico where the API server is so that it can set up that load balancing for other pods.
And is this from an internal-to-cluster perspective, or external-to-cluster perspective? (as in where I "want" it connecting to).
I ask because a few things I've discovered may be relevant.
I understand that it's far more typical to do k8s stuff on hosted/cloud infra, but self-hosted is really a requirement for the areas I work in, and hosted/cloud really isn't an acceptable option. So in my development of my whole cluster adventure, this last bit with Calico is like the last 1% remaining on the work.
I really hope we can figure something out, because I do quite like what I see in Calico, and I really am trying to do my best to read lots, and listen lots. And again thanks for responding and responding so rapidly. :) I apologise in advance if I come across as unappreciative or gruff in any way, past, present, or future. I'm both frustrated, and excited to get this sorted. ❤️
Also... considering my circumstance... should I be enabling eBPF before (or in-parallel with) declaring the KUBERNETES_SERVICE_HOST and _PORT? (And what about disabling kube-proxy? the test above kube-proxy was running at the time)
Hey so some new information...
Reddit thread: https://www.reddit.com/r/kubernetes/comments/1epn6jo/best_approach_to_expose_an_onpremise_k8s_cluster/
BloodyIron's optimism grows.
I tried setting KUBERNETES_SERVICE_HOST to "kubernetes.default.svc.cluster.local"
DNS resolution using cluster DNS won't work until Calico is running - another part of the "catch 22" situation. I think the CNI plugin is using your node's DNS configuration, which is how it ended up hitting your gateway.
How is it that when I use 127.0.0.1/localhost for KUBERNETES_SERVICE_HOST all the pods work (except calico-kube-controllers)? Is it because those working pods init from an external connection context?
`calico-node`, `calico-typha`, and the CNI plugin all run on the host - `localhost` resolves to the IP address of the API server running on that node. You should see this if you `kubectl get pods -o wide` to view those pods' IP addresses - it should match the apiserver address.

`calico-kube-controllers` does not run in the host's network namespace. `localhost` resolves to its own IP address, which is not the same as the API server address.
And is this from an internal-to-cluster perspective, or external-to-cluster perspective? (as in where I "want" it connecting to).
I'm not entirely sure I understand this question, but Calico wants to connect to an IP address or domain name that can successfully reach the API server without relying on cluster networking being up (i.e., cluster DNS, cluster service implementation, etc), because Calico is responsible for bootstrapping all of those things.
I apologise in advance if I come across as unappreciative or gruff in any way, past, present, or future. I'm both frustrated, and excited to get this sorted.
Not at all :)
Also... considering my circumstance... should I be enabling eBPF before (or in-parallel with) declaring the KUBERNETES_SERVICE_HOST and _PORT? (And what about disabling kube-proxy? the test above kube-proxy was running at the time)
I would configure the env vars, and disable kube-proxy prior to launching Calico altogether.
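For operator-managed installs (which I believe Rancher's Calico chart is, though worth double-checking), the eBPF dataplane itself is then enabled via the Installation resource:

```yaml
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    # Switch from the iptables dataplane to eBPF. The
    # kubernetes-services-endpoint ConfigMap should already be in place,
    # since Calico itself takes over service handling from kube-proxy.
    linuxDataplane: BPF
```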
To the best of my knowledge, you want something like one of the below options:
I don't think this is the "permanent" way I want this to run, but yeah, pretty sure this falls in line with what's been presented above. Yikes!
Glad you got something working!
I think the main takeaway here is using `KUBERNETES_SERVICE_HOST=<real IP of control-plane node>` works because your nodes can reach that address even before Calico is installed, and it works for calico-kube-controllers because it's a "real" IP in the network and not a resolution that varies based on where it is being checked.
And frankly, this should be fine - the only downside really is that you're not getting the full benefit of running multiple API servers (i.e., if you need to do a rolling update, there won't be failover and Calico will just wait for the node you specified to come back online).
You probably don't need to go through the intermediate steps of using `localhost` and restarting things, though.
Would love to have better documentation on this! Especially if kube-vip is a viable option, documenting the steps you took to integrate with that would be awesome.
Would love to have better documentation on this! Especially if kube-vip is a viable option, documenting the steps you took to integrate with that would be awesome.
I already am aspiring to do my best on documenting such things! Once I figure out this last bit I am probably going to do a yuge publication (or multiples?) on my own personal bloggy/articley site to get eCred hehe. But I do suspect that the Calico docs (and Rancher/RKE2 docs?) would probably benefit from such insights too :D Yay!
Oh and still feel free to correct me on any inaccuracies/misunderstandings/other details in general. Still lots to learn about k8s ;D
Expected Behavior
I believe the expected behaviour is that the calico-kube-controllers (single pod but that's the name it has) should be able to contact the Kubernetes API Server in the same way that calico-node pods can.
Current Behavior
calico-node and calico-typha pods can connect to the kubernetes API via localhost:6443 successfully, but calico-kube-controllers (the name of a singular pod in this case) gets connection refused for reasons not yet clear.
Possible Solution
I am not sure what the solution is just yet, as I have been following the documentation.
Steps to Reproduce (for bugs)
Following documentation here: https://docs.tigera.io/calico/3.27/operations/ebpf/enabling-ebpf#configure-calico-to-talk-directly-to-the-api-server And using the Calico provisioned by Rancher when creating the cluster, self-hosted, RKE2, on Ubuntu VMs, no public cloud present at all.
Context
So when I look at the YAML for calico-kube-controllers, it is given the same environment variables KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT that the calico-node & calico-typha pods are given. The calico-kube-controllers pod has an IP of 10.42.790.200 (and changes with re-init of course). And I am using all "default" CIDR configurations that RKE2 delivers, so no CIDR customisations have been made by me. UFW and AppArmor are turned off on the Ubuntu hosts.
When I instead use the VIP of 10.43.0.1 for KUBERNETES_SERVICE_HOST this prevents a rebooted k8s node from fully recovering as nothing can actually talk to the cluster. Namely calico-node pods cannot reach that VIP after a node reboot, so nothing comes up. I was only able to correct this when using "localhost" or "127.0.0.1" (except localhost might be preferable due to contextual agility when used).
Your Environment
==== The error that calico-kube-controllers spits out is:

```
[INFO][1] main.go 131: Ensuring Calico datastore is initialized
2024-08-16 16:09:33.924 [ERROR][1] client.go 295: Error getting cluster information config ClusterInformation="default" error=Get "https://localhost:6443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp [::1]:6443: connect: connection refused
2024-08-16 16:09:33.924 [INFO][1] main.go 138: Failed to initialize datastore error=Get "https://localhost:6443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp [::1]:6443: connect: connection refused
2024-08-16 16:09:38.934 [ERROR][1] client.go 295: Error getting cluster information config ClusterInformation="default" error=Get "https://localhost:6443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp [::1]:6443: connect: connection refused
```
==== I am unsure what I should be doing here as I have followed the documentation, and have not found any relevant resources online on what I can do about this. So I really would appreciate help on this matter.