Closed namelessvoid closed 5 months ago
I'm not sure if cross datacenter traffic can be sent over the private IPs. I suppose that question should be directed at hetzner cloud themselves.
OK, I've tried creating VMs in different DCs and they are able to communicate with each other over the private network.
@namelessvoid can you please build kubeone from the latest master and try it?
I'm getting different results:

```
root@ubuntu:/# traceroute 10.244.7.2
traceroute to 10.244.7.2 (10.244.7.2), 30 hops max, 60 byte packets
 1  static.123.164.55.162.clients.your-server.de (162.55.164.123)  0.156 ms  0.066 ms  0.073 ms
 2  10.244.7.0 (10.244.7.0)  4.442 ms  4.234 ms  4.131 ms
 3  10.244.7.2 (10.244.7.2)  4.282 ms  4.042 ms  3.844 ms
```
where `10.244.7.2` is the overlay IP of the pod running in the other datacenter.
Maybe I'm getting it wrong, but shouldn't the first hop be the virtual network IP of your node? `162.55.164.123` is the public IP, isn't it? Disclaimer: I'm not too deep into k8s networking 🙈

I'll try the latest master as soon as I can (I'm a bit tied up with releases right now).
Just for completeness, I tried a fresh cluster installed with kubeone 1.2.3 and see these results:

```
 1  static.170.210.55.162.clients.your-server.de (162.55.210.170)  0.147 ms  0.033 ms  0.024 ms
 2  172.31.1.1 (172.31.1.1)  13.458 ms  13.301 ms  13.008 ms
 3  11685.your-cloud.host (195.201.67.143)  0.607 ms  0.478 ms  0.515 ms
```
Then I built kubeone from master and retried on another freshly installed cluster:

```
root@ubuntu:/# traceroute nginx.default.svc.cluster.local
traceroute to nginx.default.svc.cluster.local (10.103.180.121), 30 hops max, 60 byte packets
 1  static.97.89.201.138.clients.your-server.de (138.201.89.97)  0.062 ms  0.027 ms  0.022 ms
 2  172.31.1.1 (172.31.1.1)  14.458 ms  14.358 ms  14.322 ms
 3  12740.your-cloud.host (136.243.181.165)  0.512 ms  0.447 ms  0.395 ms
```
`kubeone version` for the self-built binary shows:

```json
{
  "kubeone": {
    "major": "1",
    "minor": "2",
    "gitVersion": "v1.2.0-rc.0-65-gab496ef",
    "gitCommit": "ab496efdaa222e92f14a1d0cbe63149d57f8cc53",
    "gitTreeState": "",
    "buildDate": "2021-06-22T11:48:09+02:00",
    "goVersion": "go1.16.5",
    "compiler": "gc",
    "platform": "darwin/amd64"
  },
  "machine_controller": {
    "major": "1",
    "minor": "30",
    "gitVersion": "v1.30.0",
    "gitCommit": "",
    "gitTreeState": "",
    "buildDate": "",
    "goVersion": "",
    "compiler": "",
    "platform": "linux/amd64"
  }
}
```
Test setup:
```shell
$ kubectl run nginx --image nginx
$ kubectl expose pod nginx --port 80
$ kubectl run ubuntu --image ubuntu -- sleep infinity
$ kubectl exec -it ubuntu -- bash
# apt update && apt install traceroute -y
# traceroute nginx.default.svc.cluster.local
```
I did a third test by installing the cluster from the example Terraform files. The KubeOne manifest looks like this:

```yaml
apiVersion: kubeone.io/v1beta1
kind: KubeOneCluster
versions:
  kubernetes: '1.20.6'
cloudProvider:
  hetzner: {}
  external: true
addons:
  enable: true
  path: "./addons"
```

For the test, `./addons` was empty.
I tried both a cluster with a single worker node and a cluster with two worker nodes. The traceroute results remain the same: traffic is routed via public IPs.

I'm attaching some screenshots of the networking view in the Hetzner Cloud Console. This should be set up correctly, shouldn't it?
I'm happy for any ideas for further debugging! Thank you a lot! :)
Ok, maybe I found something - sorry for not thinking about this earlier!
When I traceroute the pod IP as you did, @kron4eg, I also see the traffic using the overlay IP:
```
$ traceroute 10.244.8.36
traceroute to 10.244.8.36 (10.244.8.36), 30 hops max, 60 byte packets
 1  static.XXX.XXX.XXX.162.clients.your-server.de (162.XXX.XXX.XXX)  0.132 ms  0.033 ms  0.021 ms
 2  10.244.8.0 (10.244.8.0)  3.768 ms  3.641 ms  3.543 ms
 3  10-244-8-36.nginx.default.svc.cluster.local (10.244.8.36)  3.622 ms  3.497 ms  3.490 ms
```
I'm still confused, though, why the public IP shows up in the trace.
But when accessing the service exposing the very same pod, it seems to take the public route again:
```
$ traceroute 10.109.255.202
traceroute to 10.109.255.202 (10.109.255.202), 30 hops max, 60 byte packets
 1  static.XXX.XXX.XXX.162.clients.your-server.de (162.55.166.14)  0.080 ms  0.039 ms  0.022 ms
 2  172.31.1.1 (172.31.1.1)  10.880 ms  9.905 ms  10.592 ms
 3  11202.your-cloud.host (159.69.96.89)  0.447 ms  0.332 ms  0.320 ms
 4  * * *
 5  spine2.cloud2.fsn1.hetzner.com (213.239.225.45)  1.018 ms  spine1.cloud2.fsn1.hetzner.com (213.239.225.41)  0.958 ms  spine2.cloud2.fsn1.hetzner.com (213.239.225.45)  1.263 ms
 6  core23.fsn1.hetzner.com (213.239.239.137)  13.665 ms  2.714 ms  core24.fsn1.hetzner.com (213.239.239.129)  4.106 ms
 7  core11.nbg1.hetzner.com (213.239.203.125)  7.735 ms  core12.nbg1.hetzner.com (213.239.203.121)  10.383 ms  core11.nbg1.hetzner.com (213.239.203.125)  16.566 ms
...
```
So maybe some setting for the service overlay is not correct?
@kron4eg Could you maybe retry this on your end to confirm this? Thank you a lot!
I'll try to reproduce
@namelessvoid I still can't replicate that behaviour (using master build). Could you please attach your manifests (workloads/services/etc)?
@kron4eg Sorry for the late response, some things got in my way in between...
There is nothing special, I believe:
```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: nginx
  name: nginx
  namespace: default
spec:
  containers:
  - image: nginx
    name: nginx
---
apiVersion: v1
kind: Service
metadata:
  labels:
    run: nginx
  name: nginx
  namespace: default
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    run: nginx
  type: ClusterIP
```
I can confirm this issue.
```
$ kubectl get nodes -o wide
NAME                        STATUS   ROLES                  AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
t1-control-plane-1          Ready    control-plane,master   80m   v1.21.3   10.8.0.2      188.34.X.X    Ubuntu 20.04.2 LTS   5.4.0-77-generic   containerd://1.4.8
t1-pool1-54f9cd8694-drz4m   Ready    <none>                 77m   v1.21.3   10.8.0.3      162.55.X.X    Ubuntu 20.04.2 LTS   5.4.0-77-generic   containerd://1.4.8
```
Testing with the manifests @namelessvoid provided in their last post:
```
root@ubuntu:/# traceroute 10.244.1.2
traceroute to 10.244.1.2 (10.244.1.2), 30 hops max, 60 byte packets
 1  static.103.165.55.162.clients.your-server.de (162.55.X.X)  0.100 ms  0.030 ms  0.065 ms
 2  10-244-1-2.nginx.default.svc.cluster.local (10.244.1.2)  0.223 ms  0.063 ms  0.069 ms
```
The first hop (162.55.X.X) is the external IP of the node. That should be 10.8.0.3 instead.
EDIT: OK, I suppose it was a false alarm. Pods keep talking to each other even though I'm now blocking all external traffic to the nodes. I'm still confused that the external IP shows up in the traceroute, though.
> The first hop (162.55.X.X) is the external IP of the node

It's the node's own IP. This IP is the default route for pods.
Can we somehow configure the internal IP to be the node's IP? Yesterday I said

> Pods keep talking to each other even though I'm now blocking all external traffic to the nodes.

but that is only true if I use the SDN firewall provided by Hetzner. When I use iptables on the nodes to block all incoming traffic via the interface `eth0`, the pods can't communicate anymore.

I'd actually like to be able to disable the public interface completely. Is that somehow feasible with kubeone?
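For reference, the iptables test described above could look roughly like the following sketch. This is a firewall configuration fragment, not a tested recipe; in particular, `ens10` as the name of the Hetzner private network interface is an assumption and varies by server type.

```shell
# Sketch only: interface names are assumptions; run as root.
# Allow everything arriving on the Hetzner private network interface.
iptables -A INPUT -i ens10 -j ACCEPT
# Keep replies to outbound connections working on the public interface.
iptables -A INPUT -i eth0 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# Drop all other inbound traffic on the public interface.
iptables -A INPUT -i eth0 -j DROP
```

With rules like these in place, any cluster traffic that is still routed via the public interface (as seen in the traceroutes above) should break, which matches the observed behaviour.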
@Lykos153 I suppose it can be achieved by using custom images.
I can now say for sure that DNS traffic is still routed via the public interface. With all incoming public connections blocked, pods can reach each other via IP but not via service hostnames. Also, every request from pods to the internet has a ~5s delay due to a DNS timeout. The cluster is not usable unless I open port 53 on the public network. I'm gonna try to get rid of the public interface using a custom image as you suggested. The issue remains, however.
Any update here? We have the same issue.
We need to whitelist the public IP ranges as trusted IPs in our ingress to make the PROXY protocol work.
Same issue; it makes firewalling horrible. I have manually patched the kubeconfigs to use the private IP... Maybe the kubeadm args can be overridden somewhere?
@alam0rt did it help?
It helps, but it gets overridden on upgrade as the kubeadm config is regenerated.
For the time being I am just adding the public IPs to the rules using
```terraform
data "hcloud_servers" "nodes" {
  with_selector = "role=node"
}

locals {
  node_public_ipv4 = [for node in data.hcloud_servers.nodes.servers : join("/", [node.ipv4_address, "32"])]
}
```
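Those CIDRs can then feed a firewall rule. A sketch of what that might look like follows; the `hcloud_firewall` resource exists in the provider, but the resource name, the firewall name, and the choice to open all ports here are illustrative, not taken from the thread.

```terraform
# Hypothetical firewall allowing node-to-node traffic from the nodes' own
# public IPs; names and the "any" port choice are illustrative only.
resource "hcloud_firewall" "nodes" {
  name = "allow-node-public"

  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "any"
    source_ips = local.node_public_ipv4
  }

  rule {
    direction  = "in"
    protocol   = "udp"
    port       = "any"
    source_ips = local.node_public_ipv4
  }
}
```

This only works around the symptom, of course: the underlying issue is that cluster traffic uses the public IPs at all.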
The admin kubeconfig is generated using the value from `terraform output kubeone_api`. By default this value is the public IP of the kube-apiserver load balancer. I don't see that `hcloud_load_balancer` can give you an internal IP.
```terraform
output "kubeone_api" {
  description = "kube-apiserver LB endpoint"
  value = {
    endpoint                    = hcloud_load_balancer.load_balancer.ipv4
    apiserver_alternative_names = var.apiserver_alternative_names
  }
}
```
> The admin kubeconfig is generated using the value from `terraform output kubeone_api`. By default this value is the public IP of the kube-apiserver load balancer. I don't see if `hcloud_load_balancer` can give you an internal IP.
There definitely is a private IP that can be used. I'll give it a go soon and see what happens.
So, it looks like you can use

```terraform
value = {
  endpoint = hcloud_load_balancer.load_balancer.network_ip
}
```
`network_ip` is defined here: https://github.com/hetznercloud/terraform-provider-hcloud/blob/d6f4207b2b75b76e007bd08602e6dcbfb1740032/internal/loadbalancer/resource.go#L406 but is apparently undocumented!
OK, having the INTERNAL IP as the kube-apiserver endpoint means that kubeconfigs for the whole system will contain that IP, including the admin config. KubeOne will work around that, no issue (we always tunnel kube-apiserver requests via SSH).
However, your local kubectl might have a problem, but worry not: `kubeone proxy` to the rescue! `kubeone proxy` creates a pass-through SSH tunnel proxy that kubectl can easily leverage with `export HTTPS_PROXY=http://...`.
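Usage could look roughly like this sketch; the manifest/tfjson file names and the listen address are assumptions (`kubeone proxy` prints the address it actually binds), so treat it as illustrative rather than copy-paste ready.

```shell
# Sketch only: file names and the proxy address are assumptions.
kubeone proxy --manifest kubeone.yaml --tfjson tf.json &
# Point kubectl at the tunnel; substitute the address kubeone proxy prints.
export HTTPS_PROXY=http://localhost:8888
kubectl get nodes
```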
Speaking of which, is there a good way to regenerate all of the kubeconfigs? I have updated the terraform output and ran `kubeone apply --manifest kubeone.yaml -t new.json`, but I don't think anything was updated. Maybe I need to force an upgrade?
No, I don't think it's possible, at least not under kubeadm. You'd need to create a new cluster.
Damn! New cluster it is I guess.
I mean, it can be done manually, but it's quite possible to kill your cluster. If you'd like to try, here's how:
I highly recommend not doing this on a cluster that has anything valuable running on it, though.
Issues go stale after 90d of inactivity.
After a further 30 days, they will turn rotten.
Mark the issue as fresh with /remove-lifecycle stale
.
If this issue is safe to close now please do so with /close
.
/lifecycle stale
/remove-lifecycle stale

Docs are still pending.
Question: is the network CNI configured and deployed before the Hetzner CCM or after? I'm not sure yet what actually happens, but per their instructions here https://github.com/hetznercloud/hcloud-cloud-controller-manager/blob/main/docs/deploy_with_networks.md I believe that a supported CNI will be handled by the CCM to ensure communication is done through the private interface.
The Hetzner CCM manifest in the addons is deployed without the networks support, and will not try to make pods use the private network.
@madalinignisca The CNI is deployed before the CCM. We'll give this a try, but in the meanwhile, I recommend checking out Cilium if that works for you. Some folks reported more success with Cilium (e.g. https://github.com/kubermatic/kubeone/issues/2219).
Issues go stale after 90d of inactivity.
After a further 30 days, they will turn rotten.
Mark the issue as fresh with /remove-lifecycle stale
.
If this issue is safe to close now please do so with /close
.
/lifecycle stale
/remove-lifecycle stale

/lifecycle frozen
I managed to do a bare-bones manual setup with kubeadm and got everything working the way I had in mind. I would love to find the time to try to achieve the same with KubeOne. Yes, Cilium was involved, and since I had to dive deep into it, I think I'm never looking at another CNI.
This issue should be fixed as of KubeOne 1.7 at least. I'm going to close it, but if you still have the issue, please let us know. /close
@xmudrii: Closing this issue.
What happened:
I have a kubeone cluster set up at Hetzner via the example Terraform scripts, which include a private network. The only change we have is to add worker pools for a list of datacenters:
The resulting nodes look like this:
When I `traceroute` a Kubernetes service (e.g. `backend.default.svc.cluster.local`), I see that the traffic is routed via the public IP of the nodes instead of the IP within the private network:

where `162.55.XXX.XXX` is the public IP of the node. I'd expect the traffic to be sent to `192.168.0.7` instead. I verified on a GKE cluster, and there the traffic seems to be routed via the private IPs.

As a consequence, if I apply a firewall which prevents access to the nodes' public IPs, the cluster networking becomes non-operational, in the sense that DNS lookups no longer work and services cannot be reached.
What is the expected behavior:
In-cluster traffic should be routed via private IPs and not via public IPs. I should also be able to restrict public node IP access via firewall and the cluster should stay operational.
How to reproduce the issue:
I did not try it with a fresh install, but steps to reproduce should be:
Anything else we need to know?
Information about the environment:
- KubeOne version (`kubeone version`): the cluster was created with kubeone 1.2.1 but was updated to 1.2.2 and then 1.2.3 recently. MachineDeployments have been restarted via https://docs.kubermatic.com/kubeone/master/cheat_sheets/rollout_machinedeployment/
- Operating system: Ubuntu 20.04.2 LTS
- Provider you're deploying the cluster on: Hetzner
- Operating system you're deploying from: macOS

Hope you can help me with that! Thank you a lot!