k3s flannel ipsec backend no connectivity between pods on different nodes

txomon commented 4 years ago

Version: k3s version v1.17.3+k3s1 (5b17a175)

K3s arguments: --flannel-backend ipsec

Describe the bug

There is no connectivity between pods on different nodes.

To Reproduce

Start k3s with k3s server --flannel-backend ipsec

Expected behavior

I would expect to have connectivity between pods in different nodes

Actual behavior

There is no traffic being routed in the cni0 interface

Additional context / logs

Oznup commented 4 years ago

Hello, just out of curiosity, have you tried with the --docker flag?

txomon commented 4 years ago

@Oznup I'm afraid not, I don't have docker available in the nodes. Is this something that is known to make it work?

Oznup commented 4 years ago

No, I don't think so. But have you tried this? iptables -P FORWARD ACCEPT

Oznup commented 4 years ago

Hello, I have torn down and rebuilt my whole cluster with ipsec and docker, and I can confirm that docker will not make it work: I have exactly the same problem with k3s version v1.17.4+k3s1. Have you found any clue to move forward?

Oznup commented 4 years ago

Hello @txomon, I've tried again on a fresh install with k3s v1.17.4+k3s1 and 2 nodes (1 master, 1 minion), with multiple combinations:

docker and flannel with vxlan
docker and flannel with ipsec
with a database
without a database

Right after the install, I tried to:

nslookup a name with core-dns
curl the metric service

Here are my nodes:

root@k3s-master-1:/home/superadmin# kubectl get nodes
NAME           STATUS   ROLES    AGE   VERSION
k3s-master-1   Ready    master   14m   v1.17.4+k3s1
k3s-minion-1   Ready    <none>   12m   v1.17.4+k3s1

And here are my pods in the kube-system namespace:

root@k3s-master-1:/home/superadmin# kubectl get pods -n kube-system -o wide
NAMESPACE     NAME                                      READY   STATUS    RESTARTS   AGE   IP          NODE           NOMINATED NODE   READINESS GATES
kube-system   local-path-provisioner-58fb86bdfd-2n9l5   1/1     Running   0          13m   10.42.0.4   k3s-master-1   <none>           <none>
kube-system   metrics-server-6d684c7b5-gxm2w            1/1     Running   0          13m   10.42.0.2   k3s-master-1   <none>           <none>
kube-system   coredns-6c6bb68b64-8xkzc                  1/1     Running   0          13m   10.42.0.3   k3s-master-1   <none>           <none>

As you can see, all the pods are on k3s-master-1. So, from k3s-master-1:

root@k3s-master-1:/home/superadmin#  nslookup kube-dns.kube-system.svc.cluster.local. 10.42.0.3
Server:         10.42.0.3
Address:        10.42.0.3#53

Name:   kube-dns.kube-system.svc.cluster.local
Address: 10.43.0.10

root@k3s-master-1:/home/superadmin# curl -k https://10.43.240.224
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {

  },
  "code": 403
}

And from k3s-minion-1:

root@k3s-minion-1:/home/superadmin# nslookup kube-dns.kube-system.svc.cluster.local. 10.42.0.3
;; connection timed out; no servers could be reached

root@k3s-minion-1:/home/superadmin# curl -k https://10.43.240.224
curl: (7) Failed to connect to 10.43.240.224 port 443: Connection timed out

And the result is the same whatever the combination. Does this look like your symptoms?

Do you use k3s on bare metal or on a VM?

txomon commented 4 years ago

I haven't tested that extensively; my tests mainly consisted of capturing traffic with tcpdump, and there was no sign of the traffic arriving at the node. Rather than solving the issue right away, I would be happy to know how to output debug logs for flannel, as I haven't been able to find any configuration for it.

I'm using the $5 VPS on DigitalOcean (DO).

I hope I will have some time to look into the code deeper and maybe expose some other info from the flannel code.
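For anyone repeating the tcpdump test described above, here is a minimal sketch of a capture filter for the IPsec traffic between two nodes. It assumes the standard ports (UDP 500 for IKE, UDP 4500 for NAT traversal) plus plain ESP (IP protocol 50). The function name, the interface and the peer address are placeholders for illustration, not something k3s or flannel provides. The function only prints the command, so it can be reviewed before being run as root.

```sh
# Hypothetical helper: print a tcpdump command that captures IKE and ESP
# traffic exchanged with one peer node. Nothing is executed here.
#   $1 = network interface (assumed, e.g. eth0)
#   $2 = IP address of the other node (placeholder)
ipsec_capture_cmd() {
    printf "tcpdump -ni %s 'host %s and (udp port 500 or udp port 4500 or ip proto 50)'\n" "$1" "$2"
}
```

Calling ipsec_capture_cmd eth0 192.0.2.10 prints the command for that interface and peer. If the capture stays empty while a pod on one node talks to a pod on the other, the packets are probably not entering the tunnel at all, which would match the observation above.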

Oznup commented 4 years ago

OK, I don't know Go yet, so for the moment I won't be able to explore flannel's code.

However, I have tried something else: I set up a fresh install of Kubernetes (the real one, the big one!), and I made a very weird observation.

My nodes are VMs hosted at home on XCP-ng (Xen Cloud Platform; the more time passes, the more I feel like the only one using it in the k8s community...)

So, I created 2 new VMs, set them up just before the K8s installation, and created a snapshot. I installed K8s and flannel (with vxlan, but it doesn't matter). I ran the same tests as before and.... EVERYTHING WORKS MAGICALLY !!!!!!

Then I felt infinite happiness and got the champagne bottle ready. But first, I decided to reinstall it properly: I reverted my 2 VMs to their snapshots and reinstalled k8s. I ran the same tests and... SAME BEHAVIOUR AS BEFORE, BACK TO SQUARE ONE !!!!! The champagne bottle is still in the fridge...

BUT !!!!

The symptoms on my previous k3s cluster appeared after I had used VM snapshots. The fact that I get exactly the same behaviour with other components means something happens when we revert VMs to their snapshots. I don't know what; it's the same network cards / configuration / MAC addresses. Did you install k3s after reverting your VPS to a snapshot, or is it a 100% clean install?

txomon commented 4 years ago

It's a 100% vanilla install; this was the first thing I saw after installing k3s.

plockaby commented 4 years ago

I wanted to +1 this. I set up a plain vanilla Debian Buster install running in VMware Fusion. After the installation I set up iptables-legacy. Both hosts started with no firewall, with INPUT/OUTPUT/FORWARD all set to ACCEPT. I then ran this command:

curl -sfL https://get.k3s.io | sh -s - --flannel-backend=ipsec

I added a second node with this command:

curl -sfL https://get.k3s.io | K3S_URL=https://test01.XXX.XXX.XXX:6443 K3S_TOKEN=XXX sh -

I installed a simple web service pod as a service using a NodePort:

NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE    SELECTOR
kubernetes   ClusterIP   10.43.0.1       <none>        443/TCP          112m   <none>
whoami-api   NodePort    10.43.195.156   <none>        5000:32185/TCP   110m   service=whoami,type=api

All of the pods got scheduled on to test02:

NAME                          READY   STATUS    RESTARTS   AGE    IP          NODE     NOMINATED NODE   READINESS GATES
whoami-api-7979dd4cb6-9l5st   1/1     Running   0          111m   10.42.1.2   test02   <none>           <none>
whoami-api-7979dd4cb6-46p8n   1/1     Running   0          111m   10.42.1.3   test02   <none>           <none>
whoami-api-7979dd4cb6-sd4fx   1/1     Running   0          111m   10.42.1.4   test02   <none>           <none>

I can access the pods just fine with curl from test02, but trying to access them from test01 fails with "Connection timed out":

root@test01:~# time curl -L http://localhost:32185/
curl: (7) Failed to connect to localhost port 32185: Connection timed out

real    0m31.573s
user    0m0.007s
sys 0m0.003s

At the time of the error nothing appears in syslog. There are a lot of logs in syslog from when the cluster started and when the second host was added to the cluster and it's difficult to pick out what is important. But this is on the primary node:

Jun 20 19:20:03 test01 k3s[752]: 09[CFG] loaded IKE shared key for: '10.60.2.101'
Jun 20 19:20:03 test01 k3s[752]: 08[KNL] interface cni0 activated
Jun 20 19:20:03 test01 k3s[752]: 07[KNL] interface veth2b66b407 activated
Jun 20 19:20:03 test01 k3s[752]: 10[KNL] interface veth84dfb6cf activated
Jun 20 19:20:03 test01 k3s[752]: 11[KNL] interface veth0d191e24 activated
Jun 20 19:20:03 test01 k3s[752]: 09[KNL] 10.42.0.1 appeared on cni0
Jun 20 19:20:03 test01 k3s[752]: 07[KNL] fe80::6492:18ff:fe45:708e appeared on veth0d191e24
Jun 20 19:20:03 test01 k3s[752]: 13[KNL] fe80::30d4:9bff:fe06:5cb5 appeared on veth2b66b407
Jun 20 19:20:03 test01 k3s[752]: 07[KNL] fe80::90e9:55ff:feb4:58f appeared on cni0
Jun 20 19:20:03 test01 k3s[752]: 13[KNL] fe80::bc88:31ff:fedf:de07 appeared on veth84dfb6cf
Jun 20 19:20:03 test01 k3s[752]: 06[CFG] loaded IKE shared key for: '10.60.2.102'
Jun 20 19:20:03 test01 k3s[752]: 10[CFG] added vici connection: 10.60.2.101-10.42.0.0/24-10.42.1.0/24-10.60.2.102
Jun 20 19:20:03 test01 k3s[752]: 10[CFG] initiating '10.42.0.0/24-10.42.1.0/24'
Jun 20 19:20:03 test01 k3s[752]: 10[IKE] initiating IKE_SA 10.60.2.101-10.42.0.0/24-10.42.1.0/24-10.60.2.102[1] to 10.60.2.102
Jun 20 19:20:03 test01 k3s[752]: 10[ENC] generating IKE_SA_INIT request 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(FRAG_SUP) N(HASH_ALG) N(REDIR_SUP) ]
Jun 20 19:20:03 test01 k3s[752]: 10[NET] sending packet: from 10.60.2.101[500] to 10.60.2.102[500] (720 bytes)
Jun 20 19:20:03 test01 k3s[752]: 03[NET] received packet: from 10.60.2.102[500] to 10.60.2.101[500] (720 bytes)
Jun 20 19:20:03 test01 : 03[NET] received packet: from 10.60.2.102[500] to 10.60.2.101[500] (720 bytes)
Jun 20 19:20:03 test01 : 03[ENC] parsed IKE_SA_INIT request 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(FRAG_SUP) N(HASH_ALG) N(REDIR_SUP) ]
Jun 20 19:20:03 test01 : 03[IKE] 10.60.2.102 is initiating an IKE_SA
Jun 20 19:20:03 test01 : 03[CFG] selected proposal: IKE:AES_CBC_256/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_4096
Jun 20 19:20:03 test01 : 03[IKE] remote host is behind NAT
Jun 20 19:20:03 test01 : 03[ENC] generating IKE_SA_INIT response 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(FRAG_SUP) N(HASH_ALG) N(CHDLESS_SUP) N(MULT_AUTH) ]
Jun 20 19:20:03 test01 : 03[NET] sending packet: from 10.60.2.101[500] to 10.60.2.102[500] (728 bytes)
Jun 20 19:20:03 test01 : 02[NET] received packet: from 10.60.2.102[4500] to 10.60.2.101[4500] (256 bytes)
Jun 20 19:20:03 test01 : 02[ENC] parsed IKE_AUTH request 1 [ IDi AUTH SA TSi TSr N(MOBIKE_SUP) N(NO_ADD_ADDR) N(MULT_AUTH) N(EAP_ONLY) N(MSG_ID_SYN_SUP) ]
Jun 20 19:20:03 test01 : 02[CFG] looking for peer configs matching 10.60.2.101[%any]...10.60.2.102[10.60.2.102]
Jun 20 19:20:03 test01 k3s[752]: 03[ENC] parsed IKE_SA_INIT request 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(FRAG_SUP) N(HASH_ALG) N(REDIR_SUP) ]
Jun 20 19:20:03 test01 k3s[752]: 03[IKE] 10.60.2.102 is initiating an IKE_SA
Jun 20 19:20:03 test01 k3s[752]: 03[CFG] selected proposal: IKE:AES_CBC_256/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_4096
Jun 20 19:20:03 test01 k3s[752]: 03[IKE] remote host is behind NAT
Jun 20 19:20:03 test01 k3s[752]: 03[ENC] generating IKE_SA_INIT response 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(FRAG_SUP) N(HASH_ALG) N(CHDLESS_SUP) N(MULT_AUTH) ]
Jun 20 19:20:03 test01 k3s[752]: 03[NET] sending packet: from 10.60.2.101[500] to 10.60.2.102[500] (728 bytes)
Jun 20 19:20:03 test01 k3s[752]: 02[NET] received packet: from 10.60.2.102[4500] to 10.60.2.101[4500] (256 bytes)
Jun 20 19:20:03 test01 k3s[752]: 02[ENC] parsed IKE_AUTH request 1 [ IDi AUTH SA TSi TSr N(MOBIKE_SUP) N(NO_ADD_ADDR) N(MULT_AUTH) N(EAP_ONLY) N(MSG_ID_SYN_SUP) ]
Jun 20 19:20:03 test01 k3s[752]: 02[CFG] looking for peer configs matching 10.60.2.101[%any]...10.60.2.102[10.60.2.102]
Jun 20 19:20:03 test01 k3s[752]: 02[CFG] selected peer config '10.60.2.101-10.42.0.0/24-10.42.1.0/24-10.60.2.102'
Jun 20 19:20:03 test01 k3s[752]: 02[IKE] authentication of '10.60.2.102' with pre-shared key successful
Jun 20 19:20:04 test01 k3s[752]: 02[IKE] peer supports MOBIKE
Jun 20 19:20:04 test01 k3s[752]: 02[CFG] no IDr configured, fall back on IP address
Jun 20 19:20:04 test01 : 02[CFG] selected peer config '10.60.2.101-10.42.0.0/24-10.42.1.0/24-10.60.2.102'
Jun 20 19:20:04 test01 : 02[IKE] authentication of '10.60.2.102' with pre-shared key successful
Jun 20 19:20:04 test01 : 02[IKE] peer supports MOBIKE
Jun 20 19:20:04 test01 : 02[CFG] no IDr configured, fall back on IP address
Jun 20 19:20:04 test01 : 02[IKE] authentication of '10.60.2.101' (myself) with pre-shared key
Jun 20 19:20:04 test01 : 02[IKE] IKE_SA 10.60.2.101-10.42.0.0/24-10.42.1.0/24-10.60.2.102[2] established between 10.60.2.101[10.60.2.101]...10.60.2.102[10.60.2.102]
Jun 20 19:20:04 test01 : 02[CFG] selected proposal: ESP:AES_GCM_16_128/NO_EXT_SEQ
Jun 20 19:20:04 test01 kernel: [  524.580396] alg: No test for seqiv(rfc4106(gcm(aes))) (seqiv(rfc4106-gcm-aesni))
Jun 20 19:20:04 test01 kernel: [  524.584106] alg: No test for fips(ansi_cprng) (fips_ansi_cprng)
Jun 20 19:20:04 test01 : 02[IKE] CHILD_SA 10.42.0.0/24-10.42.1.0/24{1} established with SPIs c71758a6_i c94b8c16_o and TS 10.42.0.0/24 === 10.42.1.0/24
Jun 20 19:20:04 test01 : 02[ENC] generating IKE_AUTH response 1 [ IDr AUTH SA TSi TSr N(MOBIKE_SUP) N(ADD_4_ADDR) ]
Jun 20 19:20:04 test01 : 02[NET] sending packet: from 10.60.2.101[4500] to 10.60.2.102[4500] (224 bytes)
Jun 20 19:20:06 test01 : 01[IKE] retransmit 1 of request with message ID 0
Jun 20 19:20:06 test01 : 01[NET] sending packet: from 10.60.2.101[500] to 10.60.2.102[500] (720 bytes)
Jun 20 19:20:06 test01 : 08[NET] received packet: from 10.60.2.102[500] to 10.60.2.101[500] (728 bytes)
Jun 20 19:20:06 test01 : 08[ENC] parsed IKE_SA_INIT response 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(FRAG_SUP) N(HASH_ALG) N(CHDLESS_SUP) N(MULT_AUTH) ]
Jun 20 19:20:06 test01 : 08[CFG] selected proposal: IKE:AES_CBC_256/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_4096
Jun 20 19:20:07 test01 : 08[IKE] remote host is behind NAT

For what it's worth, it says that "remote host is behind NAT" and while these hosts are indeed behind a NAT, there is no NAT traversal happening between the two nodes. They are on the same L2 domain. Additionally, I set up the same cluster on two virtual machines running on KVM with only public IP addresses and the same "remote host is behind NAT" appeared. (Note: these hosts were connected directly to a network with publicly routable IP addresses. This was not on a cloud provider but in our data center.)

That said, it all works without issue if I use "vxlan" but for various reasons I'd like things to be encrypted on the wire. This may be a bug in flannel or a bug somewhere else outside of k3s but I guess I'd figure that it should work relatively out of the box given that it is advertised as an easy way to set up a cluster.

Any pointers or resolution would be very much appreciated.

plockaby commented 4 years ago

I did end up finding a solution for this in (the still unresolved issue) coreos/flannel#966. I took a look at ip route and saw this on my two nodes:

root@test01:~# ip route
default via 10.60.2.2 dev eth0 onlink
10.42.0.0/24 dev cni0 proto kernel scope link src 10.42.0.1
10.60.2.0/24 dev eth0 proto kernel scope link src 10.60.2.101

And on test02:

root@test02:~# ip route
default via 10.60.2.2 dev eth0 onlink
10.42.1.0/24 dev cni0 proto kernel scope link src 10.42.1.1
10.60.2.0/24 dev eth0 proto kernel scope link src 10.60.2.102

I ran these commands on my two nodes:

root@test01:~# ip route del 10.42.0.0/24 dev cni0
root@test01:~# ip route add 10.42.0.0/16 dev cni0 proto kernel scope link src 10.42.0.1

And on test02:

root@test02:~# ip route del 10.42.1.0/24 dev cni0
root@test02:~# ip route add 10.42.0.0/16 dev cni0 proto kernel scope link src 10.42.1.1

The network that must be removed is specific to each host and is not predictable at cluster initialization. This solution also doesn't persist across reboots so I just need to figure out how to run these automatically when flannel starts. But all that said, I'm also not feeling super confident that this is a 100% fix, either, but at least it's a step forward. It'd still be nice to have someone who is familiar with flannel confirm that this is the solution and perhaps get it implemented but that person, sadly, is not me.
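Since the /24 to remove differs per host, the two commands can be derived from the local routing table instead of being typed by hand. Below is a minimal sketch of that idea. It assumes the pod bridge is named cni0 and that the cluster CIDR is the k3s default 10.42.0.0/16; the function name is made up for illustration. It reads ip route output on stdin and only prints the del/add pair, so nothing changes until the output is reviewed and run as root. It does not address how to hook this into flannel startup.

```sh
# Hypothetical helper: read `ip route` output on stdin and print the
# route replacement used in the workaround above (dry run, nothing is applied).
#   $1 = cluster CIDR (assumption: defaults to the k3s default 10.42.0.0/16)
widen_cni0_route_cmds() {
    awk -v cidr="${1:-10.42.0.0/16}" '
        # Only look at routes of the form "<network> dev cni0 ... src <addr>".
        $2 == "dev" && $3 == "cni0" {
            if ($1 == cidr) next          # already widened, nothing to do
            src = ""
            for (i = 4; i < NF; i++) if ($i == "src") src = $(i + 1)
            if (src == "") next           # no source address, skip this line
            printf "ip route del %s dev cni0\n", $1
            printf "ip route add %s dev cni0 proto kernel scope link src %s\n", cidr, src
        }
    '
}
```

On a node this would be used as ip route | widen_cni0_route_cmds, and the printed lines piped to sh once they look right. Like the manual commands, the result does not survive a reboot.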

txomon commented 4 years ago

So from what I can see, there is no logic for routing in the ipsec backend in flannel, as opposed to the rest of the backends.

Currently I'm trying to figure out where the /24 route is coming from.

txomon commented 4 years ago

Seems like the /24 route is automatically created because the PodCIDR network of the node is /24, and the network used for the cni0 interface is /24. I'm going to read more on whether the cni0 interface should be /24 or /16 before moving into implementing the routing in flannel.
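To make the /24 versus /16 difference concrete, here is a small sketch in plain POSIX sh that checks whether an address falls inside a network. The function names are made up for illustration. It shows that a pod on the other node (10.42.1.2, as in the pod listing earlier in this thread) is not covered by the 10.42.0.0/24 route that the kernel creates for cni0, but is covered by the 10.42.0.0/16 route from the workaround.

```sh
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() (
    IFS=.
    set -- $1                      # split A.B.C.D into $1..$4
    echo "$(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))"
)

# in_cidr IP NETWORK/PREFIX: exit status 0 if IP is inside the network.
in_cidr() (
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")     # part before the slash
    bits=${2#*/}                   # prefix length after the slash
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ "$(( ip & mask ))" -eq "$(( net & mask ))" ]
)
```

For example, in_cidr 10.42.1.2 10.42.0.0/24 fails while in_cidr 10.42.1.2 10.42.0.0/16 succeeds. Whether cni0 should really carry the wider route is exactly the open question here; the sketch only illustrates the address arithmetic.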

txomon commented 4 years ago

After reading a bit more, routing happens through iptables policies and not through ip route, so the solution explained above works but does not address the underlying error.

With the agents started with:

k3s agent -v 3 --node-external-ip $K3S_EXTERNAL_IP

Can anyone confirm that they get an error like the following:

error adding ipsec policy: error adding ipsec out policy: error adding policy: {Dst: 1.1.1.1/24, Src: 2.2.2.2/24, Proto: 0, DstPort: 0, SrcPort: 0, Dir: dir out, Priority: 0, Index: 0, Mark: <nil>, Tmpls: [{Dst: 3.3.3.3, Src: 4.4.4.4, Proto: esp, Mode: tunnel, Spi: 0x0, Reqid: 0xb}]} err: file exists

IPs and nodes being:

Local Node: PodCIDR 2.2.2.2/24, Public IP 4.4.4.4
Remote Node: PodCIDR 1.1.1.1/24, Public IP 3.3.3.3

txomon commented 4 years ago

I seem to have reached a fix. Given that flannel is ignoring the errors on ipsec policy setup, I have adapted the code to ignore (log) errors when trying to set each of the policies.

txomon commented 4 years ago

Also, I have found that the xfrm policies fail to apply because they already exist, which probably makes this an issue that arises when restarting k3s without restarting the node.
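A quick way to confirm the "already exists" case on a node is to look for the policy in the output of ip xfrm policy before restarting k3s. The sketch below is only an illustration: the function name is made up, and it assumes the usual iproute2 layout where each policy starts with a "src <cidr> dst <cidr>" line followed by an indented "dir <in|out|fwd> ..." line. It reads that output on stdin and reports through its exit status whether a matching policy is present.

```sh
# Hypothetical helper: has_xfrm_policy SRC_CIDR DST_CIDR DIR
# Reads `ip xfrm policy` output on stdin (assumed iproute2 text layout).
# Exit status 0 if a policy with that selector and direction is listed.
has_xfrm_policy() {
    awk -v src="$1" -v dst="$2" -v dir="$3" '
        # Selector line: remember whether it matches the wanted src/dst pair.
        $1 == "src" && $3 == "dst" { match_sel = ($2 == src && $4 == dst); next }
        # Direction line belonging to the most recent selector.
        match_sel && $1 == "dir" && $2 == dir { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

On a node this would be run as ip xfrm policy | has_xfrm_policy 10.42.0.0/24 10.42.1.0/24 out, substituting the local and remote PodCIDRs. If it succeeds before flannel has started, a leftover policy from the previous run is the likely source of the "file exists" error.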

txomon commented 4 years ago

I have submitted coreos/flannel#1338 to fix the problem. Tested locally

Oznup commented 4 years ago

Wow, great work ! I'll try it asap ! Thanks a lot for the fix, and the time / energy spent !

txomon commented 4 years ago

So, I have bad news: this fixes pod-to-pod and pod-to-service comms, but leaves out external-to-NodePort services that use externalTrafficPolicy=Cluster. The ip route change above (for some reason) solves the routing problem when the server needs to do NAT.

I'm still digging to try to find out what exactly is going on though. I believe something goes wrong when a routing decision has to be made on a NAT-ed packet towards a service that does not reside on the same node.

plockaby commented 4 years ago

The ip route solution that I proposed matches what the vxlan backend does soooo that would be one reason to pursue it.

txomon commented 4 years ago

@plockaby indeed; however, routing in IPsec is done not through ip route but through ip xfrm policy. We would be applying the wrong solution and leaving the underlying problem unattended.

woa7 commented 4 years ago

@txomon which k3s version is a working one? Then I can test and make sure my DNS and IP host setup is correct. (I'm trying it on Ubuntu 20.04.1 LTS, with public IPv4 and IPv6 addresses, on a cloud VM at Hetzner.)

txomon commented 4 years ago

So after a long talk with Thermi in #strongswan on freenode, we have managed to find that there is a missing SNAT iptables rule that would allow externalTrafficPolicy=Cluster to work properly. There is also the problem that the current setup manually inserts ip xfrm policies instead of using the VICI interface to set them up.

@woa7 if you want to test, you can manually build https://github.com/txomon/k3s/tree/xfrm-policy-error, which should make inter-pod connectivity work.

brandond commented 4 years ago

FYI: PRs automatically kick off CI builds that can be installed using the install.sh INSTALL_K3S_COMMIT environment variable.

woa7 commented 4 years ago

@txomon I hope I can give it a try

@brandond thanks for the info.

mamiu commented 4 years ago

@brandond Is there a reason why the PR from @txomon isn't merged yet?

The reason I can't use the INSTALL_K3S_COMMIT env variable is that I already have to use it for this issue (which we talked about yesterday).

mamiu commented 4 years ago

@brandond Sorry, totally missed the point that @txomon's PR was just to get a temporary fixed build and not intended to be merged.

Since @txomon's fix is now released with coreos/flannel v0.13.0 the rancher/flannel repo can be updated to the latest release. @brandond: You mentioned @Oats87 in the PR. Is he the one who can do it?

Edit:

Maybe I should mention that I get exactly the same issue as described in this conversation if I DON'T provide the --flannel-backend ipsec argument, in contrast to all the others who get this error only when they provide this argument. Interestingly, everything works fine as soon as I add --flannel-backend ipsec to my cluster initialization command.

Not working

Node1:

curl -sfL https://get.k3s.io | bash -s - server \
  --cluster-init \
  --write-kubeconfig-mode "0644" \
  --disable=traefik,servicelb,local-storage,metrics-server \
  --kube-apiserver-arg="service-node-port-range=80-32767"

Node2:

curl -sfL https://get.k3s.io | bash -s - server \
  --server https://my-server.com:6443 \
  --token "K105439d7c65170f9f7fc5d1de3958524a834af96767d70..." \
  --write-kubeconfig-mode "0644" \
  --disable=traefik,servicelb,local-storage,metrics-server \
  --kube-apiserver-arg="service-node-port-range=80-32767"

Working

Node1:

curl -sfL https://get.k3s.io | bash -s - server \
  --cluster-init \
  --write-kubeconfig-mode "0644" \
  --disable=traefik,servicelb,local-storage,metrics-server \
  --kube-apiserver-arg="service-node-port-range=80-32767" \
  --flannel-backend ipsec

Node2:

curl -sfL https://get.k3s.io | bash -s - server \
  --server https://my-server.com:6443 \
  --token "K105439d7c65170f9f7fc5d1de3958524a834af96767d70..." \
  --write-kubeconfig-mode "0644" \
  --disable=traefik,servicelb,local-storage,metrics-server \
  --kube-apiserver-arg="service-node-port-range=80-32767" \
  --flannel-backend ipsec

txomon commented 3 years ago

To my knowledge this is still not fixed, I have upgraded to latest release and the error message is still there. @brandond do you have any insight on when the flannel pin will be updated?

https://github.com/k3s-io/k3s/blob/618b0f98bfdc0dfbfb5ee5edc19801f605871516/go.mod#L17

The fix has been in rancher's flannel fork for almost half a year already, but k3s hasn't been updated to include the fix.

Gerthum commented 3 years ago

> ## Edit:
> Maybe I should mention that I get exactly the same issue as described in this conversation if **DON'T** provide the `--flannel-backend ipsec` argument in comparison to all the others who get this error only when they provide this argument. Interestingly everything is working fine as soon as I add `--flannel-backend ipsec` to my cluster initialization command.

Getting the exact same issue with communication between nodes only working when Ipsec is enabled.

Version: v1.20.4+k3s1

@mamiu did you by any chance figure out why?

mamiu commented 3 years ago

@Gerthum Yes, I used Fedora as OS on all nodes which wasn't supported back then. I had a lot of strange issues which I couldn't resolve until I switched to Ubuntu (the recommended OS as of October 2020). With Ubuntu everything worked out of the box and still runs smoothly.

So my best advice is to check the installation requirements again.

fapatel1 commented 3 years ago

@manuelbuil would you be able to look at this issue?

manuelbuil commented 3 years ago

Hola @txomon. I'm new to this issue and I see it has several fronts. It'd be great if you could help me understand a bit better what's done and what's missing. Let me start with:

> To my knowledge this is still not fixed, I have upgraded to latest release and the error message is still there. @brandond do you have any insight on when the flannel pin will be updated?
>
> https://github.com/k3s-io/k3s/blob/618b0f98bfdc0dfbfb5ee5edc19801f605871516/go.mod#L17
>
> The fix has been in rancher's flannel fork for almost half a year already, but k3s hasn't been updated to include the fix.

k3s v1.21.1+k3s1 is already pointing at upstream flannel (version flannel-io/flannel v0.13.1-rc2). If I understand correctly, your fix is already there, so I guess that problem is fixed, right?

Then, regarding:

> So after a long talk with Thermi in #strongswan in freenode, we have managed to find that there is a missing SNAT iptables rules that would allow for externalTrafficPolicy=Cluster to work properly. There is also the problem that the current setup is manually inserting ip xfrm policies instead of using the VICI interface to set them up.

Is this a bug in flannel that still exists? Has there been a PR or an issue that addresses it? I'm also part of the flannel community so I could help

manuelbuil commented 3 years ago

> @brandond Sorry, totally missed the point that @txomon's PR was just to get a temporary fixed build and not intended to be merged.
>
> Since @txomon's fix is now released with coreos/flannel v0.13.0 the rancher/flannel repo can be updated to the latest release. @brandond: You mentioned @Oats87 in the PR. Is he the one who can do it?
>
> Edit:
>
> Maybe I should mention that I get exactly the same issue as described in this conversation if DON'T provide the --flannel-backend ipsec argument in comparison to all the others who get this error only when they provide this argument. Interestingly everything is working fine as soon as I add --flannel-backend ipsec to my cluster initialization command.

@mamiu According to your description, the error you are seeing has nothing to do with ipsec, right? When deploying k3s with default flannel config, you can't ping between pods sitting in different nodes, right? If that's the case, could you please create a different issue. That will help reduce my confusion :). Thanks!

manuelbuil commented 3 years ago

I have just read your last comment, where you specify that the problem is solved. Great! The link you provided does not show the list of supported OSs though. Maybe you meant another link?

txomon commented 3 years ago

> > So after a long talk with Thermi in #strongswan in freenode, we have managed to find that there is a missing SNAT iptables rules that would allow for externalTrafficPolicy=Cluster to work properly. There is also the problem that the current setup is manually inserting ip xfrm policies instead of using the VICI interface to set them up.
>
> Is this a bug in flannel that still exists? Has there been a PR or an issue that addresses it? I'm also part of the flannel community so I could help

I haven't reported it because I don't think I would be able to give a description at the level of expertise required to propose a solution. There is therefore no PR, but it's a bunch of work to be done, as we would need to rewrite the ipsec backend completely, add a few features to support encapsulating packets from the receiver node to the destination pod, and support the re-encapsulation of the reply.

That said, it did seem like all the features would be available if these measures were taken.

manuelbuil commented 3 years ago

Fair enough. Given that the problem you are facing now is different from the original problem this issue is describing, can you close this issue and open a new one, preferably in Flannel, describing the steps to reproduce it please?

txomon commented 3 years ago

I'm closing this issue since, as @manuelbuil pointed out, it has been fixed.

Regarding opening a new ticket upstream, I'm afraid that I cannot give it enough follow up at the moment so I would rather not open a ticket to leave it unattended.

mjrist commented 3 years ago

I am running into the issue with NodePort services using the Cluster traffic policy (with flannel ipsec) described by @txomon https://github.com/k3s-io/k3s/issues/1613#issuecomment-683622850

I am currently working around this by setting the traffic policies to local, but this is less than ideal and requires that I run the backend as a daemonset.

Does anybody have any more insight into what is going on here and what is required for a fix?

Should I open a new issue and reference this one? Or should this be logged elsewhere?
