Closed ieugen closed 5 years ago
Use weave with encryption enabled?
Thanks @woopstar but https://www.wireguard.com/performance/ . Plus I think we need to easily add other machines as part of the VPN.
Have you tried setting up wireguard on all nodes first?
Then use access_ip in the inventory and set that var to the wg0 IP.
That would cause all communication between nodes to go over the wg0 interface.
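A minimal sketch of what that suggestion could look like for an INI-style inventory. The node names, public addresses and wg0 addresses are placeholders, and inventory_line is a made-up helper, not something kubespray ships; only ansible_host and access_ip are real inventory variables.

```shell
#!/bin/sh
# Hypothetical helper: print one inventory line per node, pointing
# access_ip at the node's wg0 address. All values below are placeholders.
inventory_line() {
    name=$1
    public_ip=$2
    wg_ip=$3
    printf '%s ansible_host=%s access_ip=%s\n' "$name" "$public_ip" "$wg_ip"
}

inventory_line node1 203.0.113.11 10.0.1.1
inventory_line node2 203.0.113.12 10.0.1.2
inventory_line node3 203.0.113.13 10.0.1.3
```

The printed lines would go under the host list of the inventory; whether the ip variable should also point at wg0 depends on the setup and is not covered here.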
@ieugen how's progress so far? I tried a similar deployment once but got stuck due to TCP-in-UDP tunneling issues.
I set up a cluster over Wireguard. I configured calico_mtu: 1400 and calico_ipv4pool_ipip: Always to fix intra-cluster networking.
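For anyone wondering where 1400 comes from, here is a rough sketch of the arithmetic. It assumes a 1500-byte physical MTU, WireGuard's worst-case overhead of 80 bytes (which is why wg0 usually ends up at 1420), and 20 more bytes for the outer IPv4 header that IPIP adds. The function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: derive a Calico MTU from the physical link MTU.
# Assumptions: WireGuard costs up to 80 bytes (40 IPv6 + 8 UDP + 32 WireGuard),
# and IPIP encapsulation costs another 20 bytes (outer IPv4 header).
calico_mtu_over_wireguard() {
    link_mtu=$1
    wg_mtu=$((link_mtu - 80))
    echo $((wg_mtu - 20))
}

calico_mtu_over_wireguard 1500
```

If the underlying link has a smaller MTU than 1500, the same subtraction applies to that value instead.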
@thche I'm currently busy with some work and hope to get to this in about 2 weeks. I also set the MTU to 1400, as @jcassee did. I don't remember changing calico_ipv4pool_ipip; will have to check.
I recently set up a cluster on 3 Hetzner cloud machines.
I used Terraform to provision the VMs and install Wireguard.
I did modify calico_mtu: 1400 and calico_ipv4pool_ipip: Always.
Intra-cluster networking seems fine with these settings, though I think I have an issue with traffic between pods or to services using eth0 instead of wg0.
For instance, I tried deploying rook, which is creating the operator and monitor pods:
rook-ceph-system rook-ceph-operator-76cf7f88f-xdgwb 1/1 Running 0 1d 10.233.71.6 node3
rook-ceph rook-ceph-mon-a-7f64984887-v77qp 1/1 Running 0 32m 10.233.75.27 node2
rook-ceph rook-ceph-mon-b-7cdccc8f9b-h4s9f 1/1 Running 0 32m 10.233.71.22 node3
rook-ceph rook-ceph-mon-c-7bff57ddf5-cfs8g 1/1 Running 0 32m 10.233.75.28 node2
Services created:
rook-ceph rook-ceph-mon-a ClusterIP 10.233.18.215 <none> 6789/TCP 34m
rook-ceph rook-ceph-mon-b ClusterIP 10.233.55.91 <none> 6789/TCP 34m
rook-ceph rook-ceph-mon-c ClusterIP 10.233.29.239 <none> 6789/TCP 33m
The operator correctly sees the monitors:
2019-02-05 21:12:24.982473 I | op-mon: mon a running at 10.233.18.215:6789
2019-02-05 21:12:25.135325 I | op-mon: mon b running at 10.233.55.91:6789
2019-02-05 21:12:25.519774 I | op-mon: mon c running at 10.233.29.239:6789
But because of the firewall on eth0, it cannot establish communication on port 6789.
As soon as I allow this port on eth0, everything runs fine.
I tried to add routes, but I can't get it to work:
ip route add 10.233.0.0/18 dev wg0 src 10.0.1.3
ip route add 10.233.64.0/18 dev wg0 src 10.0.1.3
Any ideas?
Sorry for the hijack, apparently it's due to a bug in the ceph mon container: https://github.com/ceph/ceph-container/issues/706
On another note, I'll probably release my Terraform / Kubespray configuration, and maybe write a post or something on the whole experience.
EDIT: my issue was due to ufw. I had a rule allowing all traffic on wg0, but traffic from the calico virtual interfaces was blocked, as shown in /var/log/ufw.log:
[540847.656801] [UFW BLOCK] IN=cali927d0380d13 OUT= MAC=ee:ee:ee:ee:ee:ee:9a:39:f0:75:e0:12:08:00 SRC=10.233.71.26 DST=10.233.6.47 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=9749 DF PROTO=TCP SPT=35984 DPT=6789 WINDOW=27200 RES=0x00 SYN URGP=0 MARK=0x75140000
I added two rules:
ufw allow from 10.233.64.0/18 # Allow communication on k8s pods network
ufw allow from 10.233.0.0/18 # Allow communication on k8s services network
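In case it helps someone hitting the same thing, a small sketch for spotting which interfaces and addresses ufw is blocking. It only assumes the log format shown in the line above; summarise_ufw_blocks is a made-up name, and the sample line is a shortened copy of that log entry.

```shell
#!/bin/sh
# Hypothetical helper: reduce UFW BLOCK log lines read from stdin to
# "interface source -> destination:port". Lines without a DPT field
# (e.g. ICMP) are skipped.
summarise_ufw_blocks() {
    sed -n 's/.*\[UFW BLOCK] IN=\([^ ]*\) .* SRC=\([^ ]*\) DST=\([^ ]*\) .* DPT=\([0-9]*\) .*/\1 \2 -> \3:\4/p'
}

# Shortened sample taken from the log line quoted in the thread.
echo '[UFW BLOCK] IN=cali927d0380d13 OUT= MAC=ee:ee SRC=10.233.71.26 DST=10.233.6.47 LEN=60 PROTO=TCP SPT=35984 DPT=6789 WINDOW=27200' \
    | summarise_ufw_blocks
```

On a real node the function would be fed from the log file, e.g. summarise_ufw_blocks < /var/log/ufw.log, and an interface name starting with cali points at pod traffic rather than wg0 or eth0.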
We also have a setup with wireguard installed on Hetzner cloud. We use cilium as CNI. It seems to work so far without any special configuration (further testing required).
One apparent issue, though, is that the k8s certificates are issued only for the VPN IPs, forcing an insecure TLS login via kubectl (insecure-skip-tls-verify).
It would be nice to have it as an addon via kubespray.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
@jcassee Are you working on this?
@woopstar Well, setting calico_mtu: 1400 and calico_ipv4pool_ipip: Always allows Kubernetes to work over Wireguard. I'm not working on a plugin to install Wireguard using kubespray, if that is what you mean.
I think I removed the stale label from the wrong issue here. Maybe this issue should be closed, @ieugen?
We also dropped wireguard because cilium can now communicate encrypted.
@fentas Interesting! Setup through kubespray?
Kind of. We use cilium as CNI, but kubespray installs 1.3. Since the host setup did not change from 1.3 to 1.5, we just force-apply the current cilium version 1.5 with encryption enabled after the kubespray provisioning.
I think we can close it soon.
@fentas: It is interesting to know:
At first glance, it certainly has some advantages IMO.
1) It's quite easy to set up. Basically there are two steps to enable it (https://docs.cilium.io/en/v1.5/gettingstarted/encryption/):
   ipsec secret (you can choose from any of the supported Linux algorithms)
   ipsec secret ref.
2) We have had no in-house audit yet, but I guess someone out there did, at least the cilium community.
PR for 1.5: https://github.com/kubernetes-sigs/kubespray/pull/4714
It seems we have some options now and I think we can close this issue.
Thanks for all the feedback.
I also deploy on top of hetzner, and use wireguard.
I currently use this playbook.
I think that on cloud provider like hetzner, it is important to deploy on top of wireguard. In my setup, I actually have 3 different wireguard:
I'm also in contact with githubixx. Maybe it would make sense to move his role into kubespray?
I'll also work to add wireguard as a possible option for flannel.
Basically here my question is, do you think it makes sense to add a wireguard role to deploy kube components on top? Or should we keep it separate? I'd be happy with both options, just trying to figure out what would be the best for the community :)
@pierreozoux :
Hetzner has added private network support. It's not as secure as a VPN, but it might be good enough with some k8s features.
There are several things to take into consideration regarding wireguard. I have some mixed feeling about it after using it successfully close to a year now.
I think it would be nice to have wireguard in kubespray as a technology preview and keep it that way until wireguard makes it into the kernel. After that, once it reaches stable distributions, it should probably be a feature. This however might take some time, since it must reach a kernel and then that kernel must reach the major stable distributions (CentOS, Debian, Ubuntu, SUSE etc). This is a lengthy process. Kernel backports and such might speed things up, but you end up pulling packages from outside.
Having Wireguard in your setup does make the solution more complex with one extra layer to manage and one extra layer that can break. And an important layer at that - the network. Debugging network issues is never fun. Not having to depend on a VPN would make k8s deployments much better. However I think it can't be helped in some situations and for those it would be nice to have an easy setup.
"Hetzner has added private network support. It's not as secure as a VPN, but it might be good enough with some k8s features."
Does one need to encrypt traffic on the vswitch private network in Hetzner? I could not get to a definitive answer after a lot of reading. E.g. see https://archive.is/oxnPG (search for vswitch)
Is this a BUG REPORT or FEATURE REQUEST? (choose one): FEATURE REQUEST / Question ?
Hi,
I would like to start a discussion around deploying a k8s cluster on top of a VPN, specifically wireguard, since it is easy to set up and very performant.
This feature would make it easier / possible to deploy k8s inside infrastructure where there is no private network and other nodes are not trusted.
One example of cloud provider that does not currently provide private network is Hetzner, but it's not the only one.
The steps are more or less:
I deployed a k8s cluster with kubeadm and it worked fine, but I did have some issues with CNI plugins:
I am looking for some feedback and will try the setup myself. Would love to see if there is interest in supporting this upstream.
Regards, Eugen