IceManGreen opened this issue 6 months ago
@IceManGreen we would need some additional info to narrow down the problem. Can you clarify the following?
- What is the CNI that is used in your clusters?
- What is the Submariner version?
- Can you run subctl verify --context <kubeContext1> --tocontext <kubeContext2> --only connectivity --verbose? This will help us to know whether connectivity works between the Gateway nodes but fails when the client/server is on a non-Gateway node. Note: since subctl show connections is looking good, I think Gateway-to-Gateway communication is probably working in your setup, but it fails when one of the pods is on a non-Gateway node.
- Please attach the tar of subctl gather from the domain-2 and domain-3 clusters.

@sridhargaddam hello! Thanks for your answer.
What is the CNI that is used in your clusters?
I use K3S with Flannel :
$ /var/lib/rancher/k3s/data/current/bin/flannel
CNI Plugin flannel version v0.22.2 (linux/amd64) commit HEAD built on 2024-02-06T01:58:54Z
What is the Submariner version?
$ subctl show versions
Cluster "domain-2"
✓ Showing versions
COMPONENT REPOSITORY CONFIGURED RUNNING ARCH
submariner-gateway quay.io/submariner 0.17.0 release-0.17-72c0e6dd56c8 amd64
submariner-routeagent quay.io/submariner 0.17.0 release-0.17-72c0e6dd56c8 amd64
submariner-globalnet quay.io/submariner 0.17.0 release-0.17-72c0e6dd56c8 amd64
submariner-metrics-proxy quay.io/submariner 0.17.0 release-0.17-81b7e55f5306 amd64
submariner-operator quay.io/submariner 0.17.0 release-0.17-d750fbdcb610 amd64
submariner-lighthouse-agent quay.io/submariner 0.17.0 release-0.17-7ad4dd387b0b amd64
submariner-lighthouse-coredns quay.io/submariner 0.17.0 release-0.17-7ad4dd387b0b amd64
Cluster "domain-3"
✓ Showing versions
COMPONENT REPOSITORY CONFIGURED RUNNING ARCH
submariner-gateway quay.io/submariner 0.17.0 release-0.17-72c0e6dd56c8 amd64
submariner-routeagent quay.io/submariner 0.17.0 release-0.17-72c0e6dd56c8 amd64
submariner-globalnet quay.io/submariner 0.17.0 release-0.17-72c0e6dd56c8 amd64
submariner-metrics-proxy quay.io/submariner 0.17.0 release-0.17-81b7e55f5306 amd64
submariner-operator quay.io/submariner 0.17.0 release-0.17-d750fbdcb610 amd64
submariner-lighthouse-agent quay.io/submariner 0.17.0 release-0.17-7ad4dd387b0b amd64
submariner-lighthouse-coredns quay.io/submariner 0.17.0 release-0.17-7ad4dd387b0b amd64
Cluster "e2e-mgmt"
✓ Showing versions
COMPONENT REPOSITORY CONFIGURED RUNNING ARCH
submariner-operator quay.io/submariner 0.17.0 release-0.17-d750fbdcb610 amd64
Also, can you run subctl verify --context <kubeContext1> --tocontext <kubeContext2> --only connectivity --verbose ?
It seems that I have failing tests on domain-2 and domain-3 because the pods that must run the tests cannot be deployed.
The events mention :
"message": "0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.."
What affinity/selector should the nodes have so that the test pods can be deployed?
See attachment for the entire file.
Please attach the tar of subctl gather from domain-2 and domain-3 clusters.
See attachment (sorry github only supports zip files, not tar).
Also, did you check if your use-case (i.e., curl to remote cluster service) is working fine when both control-plane and data-plane were using the same interface?
To verify this, I installed HTTP servers on every domain listening on:
- enp1s0 on port 8080
- enp2s0 on port 8081
Tests for the control plane (enp1s0) and the data plane (enp2s0):
# from domain-3 to domain-2
# control-plane
curl 172.16.100.81:8080 -sSLI
HTTP/1.0 200 OK
# data plane
curl 172.16.110.81:8081 -sSLI
HTTP/1.0 200 OK
# from domain-2 to domain-3
# control plane
curl 172.16.100.84:8080 -sSLI
HTTP/1.0 200 OK
# data plane
curl 172.16.110.84:8081 -sSLI
HTTP/1.0 200 OK
Everything works fine for simple curl commands.
What affinity/selector should the nodes have so that the test pods can be deployed?
In subctl verify we deploy connectivity test pods on the GW node for some tests and on non-GW nodes for other tests; we use a kubernetes.io/hostname NodeSelector for that purpose.
Could you label (submariner.io/gateway=true) only a single node as GW and rerun subctl verify?
Hello @yboaron,
Do you mean that if more than one node is labelled with submariner.io/gateway=true, the tests cannot run?
For now, indeed, I have all of the nodes of my clusters labelled with it:
$ kubectl label --list nodes --all --context domain-2 | grep submariner
submariner.io/gateway=true
submariner.io/gateway=true
submariner.io/gateway=true
$ kubectl label --list nodes --all --context domain-3 | grep submariner
submariner.io/gateway=true
submariner.io/gateway=true
submariner.io/gateway=true
Since you labelled all 3 nodes as GWs on both clusters, tests that need to run one of the test pods on a non-GW node will fail [1], while tests that run both the client pod and the listener pod on a GW will succeed [2].
Can you label (submariner.io/gateway=true) only a single node as GW on each cluster and rerun subctl verify?
[1]
Basic TCP connectivity tests across overlapping clusters without discovery when a pod connects via TCP to the globalIP of a remote service when the pod is on a gateway and the remote service is not on a gateway [It] should have sent the expected data from the pod to the other pod [dataplane, globalnet]
github.com/submariner-io/submariner@v0.17.0/test/e2e/dataplane/tcp_gn_pod_connectivity.go:53
[FAILED] Failed to await pod ready. Pod "tcp-check-listenermxz66" is still pending: status: { "phase": "Pending", "conditions": [ { "type": "PodScheduled", "status": "False", "lastProbeTime": null, "lastTransitionTime": "2024-03-11T09:05:50Z", "reason": "Unschedulable", "message": "0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.." } ], "qosClass": "BestEffort" }
[2]
Basic TCP connectivity tests across overlapping clusters without discovery when a pod connects via TCP to the globalIP of a remote service when the pod is on a gateway and the remote service is on a gateway should have sent the expected data from the pod to the other pod [dataplane, globalnet, basic]
github.com/submariner-io/submariner@v0.17.0/test/e2e/dataplane/tcp_gn_pod_connectivity.go:53
STEP: Creating namespace objects with basename "dataplane-gn-conn-nd" @ 03/11/24 09:09:51.484
STEP: Generated namespace "e2e-tests-dataplane-gn-conn-nd-zz54k" in cluster "domain-2" to execute the tests in @ 03/11/24 09:09:51.737
STEP: Creating namespace "e2e-tests-dataplane-gn-conn-nd-zz54k" in cluster "domain-3" @ 03/11/24 09:09:51.737
STEP: Creating a listener pod in cluster "domain-3", which will wait for a handshake over TCP @ 03/11/24 09:09:52.507
STEP: Pointing a ClusterIP service to the listener pod in cluster "domain-3" @ 03/11/24 09:10:04.21
STEP: Creating a connector pod in cluster "domain-2", which will attempt the specific UUID handshake over TCP @ 03/11/24 09:10:07.866
Mar 11 09:10:21.482: INFO: ExecWithOptions &{Command:[sh -c for j in $(seq 1 50); do echo [dataplane] connector says 67dbefdb-e262-4c91-a4bd-ac2c20eb151a; done | for i in $(seq 2); do if nc -v 242.1.255.253 1234 -w 60; then break; else sleep 30; fi; done] Namespace:e2e-tests-dataplane-gn-conn-nd-zz54k PodName:customl55cl ContainerName:connector-pod Stdin:
STEP: Verifying that the listener got the connector's data and the connector got the listener's data @ 03/11/24 09:10:31.115
STEP: Verifying the output of the listener pod contains a cluster-scoped global IP [242.0.0.1 242.0.0.2 242.0.0.3 242.0.0.4 242.0.0.5 242.0.0.6 242.0.0.7 242.0.0.8] of the connector Pod @ 03/11/24 09:10:31.115
STEP: Deleting service "test-svc-tcp-check-listener" on "domain-3" @ 03/11/24 09:10:31.873
STEP: Deleting namespace "e2e-tests-dataplane-gn-conn-nd-zz54k" on cluster "domain-2" @ 03/11/24 09:10:32.83
STEP: Deleting namespace "e2e-tests-dataplane-gn-conn-nd-zz54k" on cluster "domain-3" @ 03/11/24 09:10:33.62
• [43.343 seconds]
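The relabeling requested above might look like the following sketch. Here harry is the gateway node name reported for domain-2 elsewhere in this thread, while ron is a hypothetical second node; the trailing dash in kubectl label removes a label.

```shell
# Keep the gateway label on exactly one node per cluster...
kubectl label node harry "submariner.io/gateway=true" --overwrite --context domain-2
# ...and remove it from the others (trailing "-" deletes the label).
kubectl label node ron "submariner.io/gateway-" --context domain-2
```

Repeat the same on domain-3 with its own node names, then rerun subctl verify.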
Ok so I applied the label "submariner.io/gateway=true" on only one node for domain-2 and domain-3 clusters.
But now I am confused because the domain-3 connection is down:
subctl show connections
Cluster "e2e-mgmt"
⚠ Submariner connectivity feature is not installed
Cluster "domain-2"
✓ Showing Connections
GATEWAY CLUSTER REMOTE IP NAT CABLE DRIVER SUBNETS STATUS RTT avg.
harry domain-3 172.16.100.84 no libreswan 242.1.0.0/16 connecting 0s
Cluster "domain-3"
✗ Showing Connections
✗ No connections found
So I reinstalled Submariner on domain-3 but I got the same problem.
subctl uninstall --context domain-3
subctl join broker-info.subm --clustercidr "10.42.0.0/16" --globalnet --clusterid domain-3 --context domain-3
Every pod seems fine :
kubectl get pods -n submariner-operator --context domain-3
NAME READY STATUS RESTARTS AGE
submariner-gateway-qhq6s 1/1 Running 0 25m
submariner-globalnet-b46h6 1/1 Running 0 25m
submariner-lighthouse-agent-749f576cd9-t87fw 1/1 Running 0 25m
submariner-lighthouse-coredns-86b594f7cd-fptx7 1/1 Running 0 25m
submariner-lighthouse-coredns-86b594f7cd-qgz7m 1/1 Running 0 25m
submariner-metrics-proxy-qfjlv 2/2 Running 0 25m
submariner-operator-7994fc86c5-w95w8 1/1 Running 0 26m
submariner-routeagent-6jv4v 1/1 Running 0 25m
submariner-routeagent-bkbsk 1/1 Running 0 25m
submariner-routeagent-dj7c5 1/1 Running 0 25m
But the gateway is showing an error in its logs (kubectl logs submariner-gateway-qhq6s -n submariner-operator --context domain-3):
# ...
2024-03-11T15:22:20.397Z ERR ..gine/cableengine.go:147 CableEngine Error installing cable for &natdiscovery.NATEndpointInfo{Endpoint:v1.Endpoint{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"domain-2-submariner-cable-domain-2-172-16-100-81", GenerateName:"", Namespace:"submariner-operator", SelfLink:"", UID:"57c8b6c2-6bdb-411b-8eb9-dcf6282515a1", ResourceVersion:"3416393", Generation:1, CreationTimestamp:time.Date(2024, time.March, 11, 14, 55, 19, 0, time.Local), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"submariner-io/clusterID":"domain-2"}, Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry{v1.ManagedFieldsEntry{Manager:"submariner-gateway", Operation:"Update", APIVersion:"submariner.io/v1", Time:time.Date(2024, time.March, 11, 14, 55, 19, 0, time.Local), FieldsType:"FieldsV1", FieldsV1:(*v1.FieldsV1)(0xc000204558), Subresource:""}}}, Spec:v1.EndpointSpec{ClusterID:"domain-2", CableName:"submariner-cable-domain-2-172-16-100-81", HealthCheckIP:"242.0.255.254", Hostname:"porthos", Subnets:[]string{"242.0.0.0/16"}, PrivateIP:"172.16.100.81", PublicIP:"172.16.110.81", NATEnabled:true, Backend:"libreswan", BackendConfig:map[string]string{"natt-discovery-port":"4490", "preferred-server":"false", "public-ip":"ipv4:172.16.110.81", "udp-port":"4500"}}}, UseNAT:false, UseIP:"172.16.100.81"} error="error installing Endpoint cable \"submariner-cable-domain-2-172-16-100-81\": error whacking with args [--psk --encrypt --name submariner-cable-domain-2-172-16-100-81-0-0 --id 172.16.100.84 --host 172.16.100.84 --client 242.1.0.0/16 --ikeport 4500 --to --id 172.16.100.81 --host 172.16.100.81 --client 242.0.0.0/16 --ikeport 4500 --dpdaction=hold --dpddelay 30]: exit status 20"
Could you please reinstall Submariner on both clusters and, in case you still hit the connection issue, share subctl gather from both clusters.
BTW, I can see that Submariner detects the CNI as generic and not flannel. Do you have a daemonset named flannel in the kube-system NS? Do you have a volume named flannel in this daemonset?
Hello @yboaron
Sorry for the late answer! I tested multiple things:
- I reinstalled the domain-3 cluster (K3S) and Submariner on it -> not successful
- subctl join broker-info.subm --clustercidr "10.42.0.0/16" --globalnet --cable-driver wireguard -> connections are successful but not the tests:

$ subctl show connections
Cluster "domain-3"
✓ Showing Connections
GATEWAY CLUSTER REMOTE IP NAT CABLE DRIVER SUBNETS STATUS RTT avg.
porthos domain-2 172.16.100.81 no wireguard 242.0.0.0/16 connected 1.430896ms
Cluster "e2e-mgmt"
⚠ Submariner connectivity feature is not installed
Cluster "domain-2"
✓ Showing Connections
GATEWAY CLUSTER REMOTE IP NAT CABLE DRIVER SUBNETS STATUS RTT avg.
harry domain-3 172.16.100.84 no wireguard 242.1.0.0/16 connected 1.006687ms
But in the end, the tests were not successful :
Summarizing 10 Failures:
[FAIL] Basic TCP connectivity tests across overlapping clusters without discovery when a pod connects via TCP to the globalIP of a remote
service when the pod is not on a gateway and the remote service is not on a gateway [It] should have sent the expected data from the pod to
the other pod [dataplane, globalnet, basic]
github.com/submariner-io/shipyard@v0.17.0/test/e2e/framework/network_pods.go:196
[FAIL] Basic TCP connectivity tests across overlapping clusters without discovery when a pod connects via TCP to the globalIP of a remote
service when the pod is on a gateway and the remote service is not on a gateway [It] should have sent the expected data from the pod to the
other pod [dataplane, globalnet]
github.com/submariner-io/shipyard@v0.17.0/test/e2e/framework/network_pods.go:196
[FAIL] Basic TCP connectivity tests across overlapping clusters without discovery when a pod matching an egress IP namespace selector conn
ects via TCP to the globalIP of a remote service when the pod is not on a gateway and the remote service is not on a gateway [It] should hav
e sent the expected data from the pod to the other pod [dataplane, globalnet]
github.com/submariner-io/shipyard@v0.17.0/test/e2e/framework/network_pods.go:196
[FAIL] Basic TCP connectivity tests across overlapping clusters without discovery when a pod matching an egress IP namespace selector conn
ects via TCP to the globalIP of a remote service when the pod is on a gateway and the remote service is on a gateway [It] should have sent t
he expected data from the pod to the other pod [dataplane, globalnet]
github.com/submariner-io/submariner@v0.17.0/test/e2e/framework/dataplane.go:200
[FAIL] Basic TCP connectivity tests across overlapping clusters without discovery when a pod matching an egress IP pod selector connects v
ia TCP to the globalIP of a remote service when the pod is not on a gateway and the remote service is not on a gateway [It] should have sent
the expected data from the pod to the other pod [dataplane, globalnet]
github.com/submariner-io/shipyard@v0.17.0/test/e2e/framework/network_pods.go:196
[FAIL] Basic TCP connectivity tests across overlapping clusters without discovery when a pod matching an egress IP pod selector connects v
ia TCP to the globalIP of a remote service when the pod is on a gateway and the remote service is on a gateway [It] should have sent the exp
ected data from the pod to the other pod [dataplane, globalnet]
github.com/submariner-io/submariner@v0.17.0/test/e2e/framework/dataplane.go:200
[FAIL] Basic TCP connectivity tests across overlapping clusters without discovery when a pod with HostNetworking connects via TCP to the g
lobalIP of a remote service when the pod is not on a gateway and the remote service is not on a gateway [It] should have sent the expected d
ata from the pod to the other pod [dataplane, globalnet]
github.com/submariner-io/shipyard@v0.17.0/test/e2e/framework/network_pods.go:196
[FAIL] Basic TCP connectivity tests across overlapping clusters without discovery when a pod with HostNetworking connects via TCP to the g
lobalIP of a remote service when the pod is on a gateway and the remote service is not on a gateway [It] should have sent the expected data
from the pod to the other pod [dataplane, globalnet]
github.com/submariner-io/shipyard@v0.17.0/test/e2e/framework/network_pods.go:196
[FAIL] Basic TCP connectivity tests across overlapping clusters without discovery when a pod with HostNetworking connects via TCP to the g
lobalIP of a remote service when the pod is on a gateway and the remote service is not on a gateway [It] should have sent the expected data
from the pod to the other pod [dataplane, globalnet]
github.com/submariner-io/shipyard@v0.17.0/test/e2e/framework/network_pods.go:196
[FAIL] Basic TCP connectivity tests across overlapping clusters without discovery when a pod connects via TCP to the globalIP of a remote
headless service when the pod is not on a gateway and the remote service is not on a gateway [It] should have sent the expected data from th
e pod to the other pod [dataplane, globalnet]
github.com/submariner-io/shipyard@v0.17.0/test/e2e/framework/network_pods.go:196
[FAIL] Basic TCP connectivity tests across overlapping clusters without discovery when a pod connects via TCP to the globalIP of a remote
service in reverse direction when the pod is not on a gateway and the remote service is not on a gateway [It] should have sent the expected
data from the pod to the other pod [dataplane, globalnet]
github.com/submariner-io/shipyard@v0.17.0/test/e2e/framework/network_pods.go:196
Ran 13 of 47 Specs in 1888.586 seconds
FAIL! -- 3 Passed | 10 Failed | 0 Pending | 34 Skipped
Here is the complete file : verify-202403121045.log
I think there is something important to note here: K3S supports different backends for Flannel, so I configured it with WireGuard to encrypt the inter-node communications.
I suppose that Submariner must then adapt its cable driver to wireguard.
The main concern I have is that K3S will deprecate the support of IPSec for its flannel CNI :
K3s no longer includes strongSwan swanctl and charon binaries starting with the 2022-12 releases (v1.26.0+k3s1, v1.25.5+k3s1, v1.24.9+k3s1, v1.23.15+k3s1). Please install the correct packages on your node before upgrading to or installing these releases if you want to use the ipsec backend.
Q: can you confirm that wireguard as the cable driver is not supposed to imply more modifications during the installation/runtime of Submariner than --cable-driver wireguard?
BTW, I can see that Submariner detects the CNI as generic and not flannel , do you have daemonset named flannel in kube-system NS ? do you have volume named flannel in this daemonset ?
Indeed you are right, the CNI is detected as generic (for both of the clusters). Example during the installation on domain-2:
$ subctl join broker-info.subm --clustercidr "10.42.0.0/16" --globalnet --cable-driver wireguard --clusterid domain-2 --context domain-2
✓ broker-info.subm indicates broker is at https://172.16.100.99:6443
✓ Discovering network details
Network plugin: generic # <--- HERE
Service CIDRs: [10.43.0.0/16]
Cluster CIDRs: []
There are 1 node(s) labeled as gateways:
- porthos
But in K3S there is no daemonset created for Flannel. Is Submariner supposed to base this detection on a daemonset?
Q: can you confirm that wireguard as the cable driver is not supposed to change/imply more modifications during the installation/runtime of Submariner than --cable-driver wireguard ?
To enable the WireGuard cable driver, you need to install WireGuard on the GW node (search for WireGuard here).
But in K3S there is no daemonset created for Flannel. Is Submariner supposed to base this detection on a deamonset ?
Yep, but we can work around pod CIDR discovery by specifying the --clustercidr flag in the join command (as you did).
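To check what the flannel detection asks about above (a daemonset named flannel in kube-system carrying a volume named flannel), something like this sketch should show whether either exists; on K3S, where Flannel is embedded in the k3s binary, it will not:

```shell
# List the volume names of a kube-system daemonset called "flannel";
# on K3S this daemonset does not exist, so the command errors out.
kubectl -n kube-system get daemonset flannel \
  -o jsonpath='{range .spec.template.spec.volumes[*]}{.name}{"\n"}{end}'
```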
Can you upload subctl gather and subctl diagnose all from both clusters ?
BTW, did you install clusters with overlapping CIDRs on purpose? If it isn't mandatory, maybe we can start with non-overlapping CIDRs and eliminate the Globalnet complexity.
Ok so I made some tests following your recommendations:
- pods in 10.44.0.0/16 and services in 10.45.0.0/16 for cluster domain-2 (--cluster-cidr=10.44.0.0/16 --service-cidr=10.45.0.0/16)
- pods in 10.46.0.0/16 and services in 10.47.0.0/16 for cluster domain-3 (--cluster-cidr=10.46.0.0/16 --service-cidr=10.47.0.0/16)
- I updated the join commands and removed every --globalnet option from them. Something like:

subctl deploy-broker --context e2e-mgmt
subctl join broker-info.subm --clustercidr "10.44.0.0/16" --servicecidr "10.45.0.0/16" --cable-driver wireguard --clusterid domain-2 --context domain-2
subctl join broker-info.subm --clustercidr "10.46.0.0/16" --servicecidr "10.47.0.0/16" --cable-driver wireguard --clusterid domain-3 --context domain-3
But again, the tests fail.
Here are the results of subctl gather and subctl diagnose all:
diagnose-202403121620.log submariner-gather-202403121620.zip
Sorry for the late answer,
I checked the logs and they look fine; the WireGuard connection is UP between the clusters. I assume it's some datapath issue with this combination of Flannel CNI, WireGuard, and K3S, which needs further investigation.
Since the WireGuard connection is up, I would expect at least the tests between pod@gw_node in domain2 and pod@gw_node in domain3 to pass.
What is the latest output of subctl verify --context <kubeContext1> --tocontext <kubeContext2> --only connectivity --verbose?
Can you also check subctl diagnose firewall intra-cluster --kubeconfig <cluster_kubeconfig>?
Hello @yboaron, yes I agree that the combination of the three is suspicious. I will execute the tests you ask for, and I was thinking about running the same tests with Cilium instead of Flannel, just in case.
Unfortunately, due to KubeCon EU, I will not be able to test these this week. I will keep you posted as soon as I can.
Thanks again !
Hello @yboaron! Sorry for the very long delay, I was busy trying new things after KubeCon!
Regarding this issue, I had to deploy the clusters from scratch using Cilium, but it is still failing. Sorry, I changed one or two things but nothing big:
- e2e-mgmt is the broker; domain-1 and domain-2 are the participating clusters
- --cluster-cidr=10.10.0.0/16 --service-cidr=10.11.0.0/16 for domain-1
- --cluster-cidr=10.12.0.0/16 --service-cidr=10.13.0.0/16 for domain-2

I deployed the Submariner broker as usual. Then for the participating clusters domain-1 and domain-2:
# labels and annotations
kubectl label node porthos "submariner.io/gateway=true" --context domain-1
kubectl label node harry "submariner.io/gateway=true" --context domain-2
Joining domain-1 and domain-2:
subctl join broker-info.subm --clustercidr "10.10.0.0/16" --servicecidr "10.11.0.0/16" --cable-driver wireguard --clusterid domain-1 --context domain-1
subctl join broker-info.subm --clustercidr "10.12.0.0/16" --servicecidr "10.13.0.0/16" --cable-driver wireguard --clusterid domain-2 --context domain-2
I deployed this in domain-2 in the namespace federation-1:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rebel-base
spec:
  selector:
    matchLabels:
      name: rebel-base
  replicas: 1
  template:
    metadata:
      labels:
        name: rebel-base
    spec:
      containers:
        - name: rebel-base
          image: docker.io/nginx:1.15.8
          ports:
            - containerPort: 80
              name: http
          volumeMounts:
            - name: html
              mountPath: /usr/share/nginx/html/
      volumes:
        - name: html
          configMap:
            name: rebel-base-response
            items:
              - key: message
                path: index.html
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: rebel-base-response
data:
  message: "hello federation-1 from domain-2\n"
---
apiVersion: v1
kind: Service
metadata:
  name: rebel-base-svc
spec:
  type: ClusterIP
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: http
  selector:
    name: rebel-base
Now I export the service :
subctl export service rebel-base-svc -n federation-1 --context domain-2
kubectl get serviceimport -n federation-1 --context domain-1
NAME TYPE IP AGE
rebel-base-svc ClusterSetIP 96s
Local requests work (domain-2 to domain-2):
$ kubectl run x-wing --rm -it --image nicolaka/netshoot --context domain-2 -- \
    curl rebel-base-svc.federation-1.svc.clusterset.local.
hello federation-1 from domain-2
But requests from domain-1 continue to fail:
$ kubectl run x-wing --rm -it --image nicolaka/netshoot --context domain-1 -- \
curl rebel-base-svc.federation-1.svc.clusterset.local. --connect-timeout 3
curl: (28) Failed to connect to rebel-base-svc.federation-1.svc.clusterset.local. port 80 after 3002 ms: Timeout was reached
This time, I used Cilium Hubble to monitor the network. I found something strange:
- domain-1 reaches the service in domain-2
- domain-2 drops the reply intended for domain-1 (federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) <> 10.10.2.95:40152 (world) Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK))

Observing the pod x-wing in domain-1 during the curl request:
Apr 9 13:35:59.989: default/x-wing:56847 (ID:84471) -> kube-system/coredns-6799fbcd5-vgsnf:53 (ID:110854) to-endpoint FORWARDED (UDP)
Apr 9 13:35:59.991: default/x-wing:56847 (ID:84471) <- kube-system/coredns-6799fbcd5-vgsnf:53 (ID:110854) to-endpoint FORWARDED (UDP)
Apr 9 13:35:59.992: default/x-wing:48924 (ID:84471) -> 10.13.178.142:80 (world) to-stack FORWARDED (TCP Flags: SYN)
# ends here
10.13.178.142:80 is indeed the rebel-base-svc in domain-2:
kubectl get svc -n federation-1 --context domain-2 -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
rebel-base-svc ClusterIP 10.13.178.142 <none> 80/TCP 44m name=rebel-base
Observing the pod rebel-base in domain-2 during the curl request:
Apr 9 13:36:28.812: 10.10.2.95:40152 (world) -> federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) to-endpoint FORWARDED (TCP Flags: SYN)
Apr 9 13:36:28.812: 10.10.2.95:40152 (world) <- federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) to-stack FORWARDED (TCP Flags: SYN, ACK)
Apr 9 13:36:28.812: federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) <> 10.10.2.95:40152 (world) Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK)
Apr 9 13:36:29.041: 10.10.2.95:40152 (world) <> federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) to-overlay FORWARDED (TCP Flags: SYN)
Apr 9 13:36:29.826: federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) <> 10.10.2.95:40152 (world) Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK)
Apr 9 13:36:29.842: federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) <> 10.10.2.95:40152 (world) Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK)
Apr 9 13:36:30.071: 10.10.2.95:40152 (world) <> federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) to-overlay FORWARDED (TCP Flags: SYN)
Apr 9 13:36:31.842: federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) <> 10.10.2.95:40152 (world) Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK)
Apr 9 13:36:36.066: 10.10.2.95:40152 (world) <- federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) to-stack FORWARDED (TCP Flags: SYN, ACK)
Apr 9 13:36:36.067: federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) <> 10.10.2.95:40152 (world) Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK)
Apr 9 13:36:44.259: 10.10.2.95:40152 (world) <- federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) to-stack FORWARDED (TCP Flags: SYN, ACK)
Apr 9 13:36:44.259: federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) <> 10.10.2.95:40152 (world) Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK)
Apr 9 13:37:00.386: 10.10.2.95:40152 (world) <- federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) to-stack FORWARDED (TCP Flags: SYN, ACK)
Apr 9 13:37:00.386: federation-1/rebel-base-596bc7d8ff-45tpg:80 (ID:142026) <> 10.10.2.95:40152 (world) Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK)
You can notice the error messages: Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK).
10.10.2.95:40152 is indeed the x-wing pod in domain-1:
kubectl get pod -n default --context domain-1 -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
x-wing 1/1 Running 0 46m 10.10.2.95 athos <none> <none>
I have no idea why the packets are not correctly routed on either side.
Hi @IceManGreen ,
A. We have the following configuration: K3S, Cilium as the CNI, and wireguard as the cable driver.
B. You can resolve the domain-2/rebel-base-svc IP from domain-1, which means that the Submariner inter-cluster WireGuard tunnels are up and multi-cluster service discovery looks fine.
C. So, we need to understand why the SYN/ACK packet is being dropped. Submariner implements the egress datapath between clusters, lets the CNI handle ingress, and sets rp_filter to loose mode on the CNI network interfaces to allow asymmetric traffic; check this for more details.
It could be that Submariner failed to detect the CNI network interface and update rp_filter (Submariner looks for a network interface with an IP address from the clustercidr range) and packets are dropped by the kernel, or that some firewall/infra SG blocks inter-cluster traffic.
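A quick way to check the rp_filter theory on a node could look like this sketch, assuming the CNI interface is cilium_host as reported in this thread, and that 2 (loose mode) is the value Submariner is expected to set:

```shell
# Show reverse-path filtering on the CNI interface; Submariner's route
# agent should have set this to 2 (loose mode) to allow asymmetric traffic.
sysctl net.ipv4.conf.cilium_host.rp_filter
# If it is still 1 (strict), loosening it manually helps confirm the theory:
# sysctl -w net.ipv4.conf.cilium_host.rp_filter=2
```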
Hi @yboaron
Thanks for the summary !
Here is the result of subctl verify and subctl gather for (and between) domain-1 and domain-2:
According to your words about the CNI detection by Submariner, I ran the following test :
$ subctl diagnose cni --context domain-1
⚠ Checking Submariner support for the CNI network plugin
⚠ Submariner could not detect the CNI network plugin and is using ("generic") plugin. It may or may not work.
Indeed, Submariner does not detect Cilium as the cluster CNI. Same thing in domain-2.
Do I have a way to force Submariner to consider the cluster's CNI as Cilium ? I do not know if such a question makes sense.
Hi @IceManGreen , Sorry for the late response,
In your case Submariner failed to detect Cilium and used the "generic" CNI handling; Submariner treats a generic CNI as a kube-proxy/iptables-based CNI.
I checked the Submariner logs from both clusters and found no errors. The Submariner route-agent detects the 'cilium_host' interface as the CNI interface (the interface with an IP address from the clusterCIDR) and updates its rp_filter to '2'.
From the subctl verify output, it appears that all the connectivity tests in which either the client or the server pod runs on a non-GW node are failing, so further datapath investigation is needed here.
I think I understand why the TCP SYN/ACK packet is wrongly handled by Cilium (it should be handled by Submariner). The x-wing pod IP is 10.10.2.95, and checking the domain-2/hermione node ip-routing table I can see:
10.10.0.0/16 via 240.19.112.84 dev vx-submariner proto static
10.10.1.0/24 via 10.12.1.86 dev cilium_host proto kernel src 10.12.1.86 mtu 1370
10.10.2.0/24 via 10.12.1.86 dev cilium_host proto kernel src 10.12.1.86 mtu 1370
So, the packet should be routed by Submariner according to
10.10.0.0/16 via 240.19.112.84 dev vx-submariner proto static
but some component added routes that fall inside the remote cluster CIDR, like:
10.10.2.0/24 via 10.12.1.86 dev cilium_host proto kernel src 10.12.1.86 mtu 1370
In this case the kernel (longest prefix match) will wrongly route the packet using 10.10.2.0/24 via 10.12.1.86 dev cilium_host proto kernel src 10.12.1.86 mtu 1370 and not via vx-submariner.
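The longest-prefix-match behaviour described above can be reproduced with a small self-contained sketch; the destination IP and the two prefixes are taken from the route table quoted in this thread:

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# in_subnet IP NETWORK PREFIXLEN -> exit status 0 if IP is in NETWORK/PREFIXLEN
in_subnet() {
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "$2")
  mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Both routes match the x-wing pod IP; the kernel always picks the most
# specific prefix, so traffic goes via cilium_host instead of vx-submariner.
dst=10.10.2.95
in_subnet "$dst" 10.10.0.0 16 && echo "matches 10.10.0.0/16 -> vx-submariner"
in_subnet "$dst" 10.10.2.0 24 && echo "matches 10.10.2.0/24 -> cilium_host (more specific, wins)"
```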
I can see similar routes also on the domain-1 cluster, for example on the domain-1/aramis node:
$ ip route show
default via 172.16.110.1 dev enp2s0
10.10.0.0/24 via 10.10.0.46 dev cilium_host proto kernel src 10.10.0.46
10.10.0.46 dev cilium_host proto kernel scope link
10.10.1.0/24 via 10.10.0.46 dev cilium_host proto kernel src 10.10.0.46 mtu 1450
10.10.2.0/24 via 10.10.0.46 dev cilium_host proto kernel src 10.10.0.46 mtu 1450
10.12.0.0/24 via 10.10.0.46 dev cilium_host proto kernel src 10.10.0.46 mtu 1370
10.12.0.0/16 via 240.19.112.81 dev vx-submariner proto static
10.12.1.0/24 via 10.10.0.46 dev cilium_host proto kernel src 10.10.0.46 mtu 1370
10.12.2.0/24 via 10.10.0.46 dev cilium_host proto kernel src 10.10.0.46 mtu 1370
10.13.0.0/16 via 240.19.112.81 dev vx-submariner proto static
10.14.0.0/24 via 10.10.0.46 dev cilium_host proto kernel src 10.10.0.46 mtu 1370
10.14.1.0/24 via 10.10.0.46 dev cilium_host proto kernel src 10.10.0.46 mtu 1370
10.14.2.0/24 via 10.10.0.46 dev cilium_host proto kernel src 10.10.0.46 mtu 1370
172.16.100.0/22 dev enp1s0 proto kernel scope link src 172.16.100.82
172.16.110.0/22 dev enp2s0 proto kernel scope link src 172.16.110.82
240.0.0.0/8 dev vx-submariner proto kernel scope link src 240.19.116.82
You should first address this routing issue.
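As a quick sanity check, one can scan a node's routing table for entries that fall inside a remote cluster CIDR but do not egress via `vx-submariner`. This is a hypothetical helper, not a Submariner tool, using two of the lines from the domain-1/aramis table above:

```python
import ipaddress

# Remote cluster CIDRs as seen from domain-1 (per the discussion above).
REMOTE_CIDRS = [ipaddress.ip_network(c) for c in ("10.12.0.0/16", "10.13.0.0/16")]

def conflicting_routes(route_lines):
    """Yield `ip route` lines whose prefix lies inside a remote cluster
    CIDR but whose egress device is not vx-submariner."""
    for line in route_lines:
        fields = line.split()
        if not fields or "/" not in fields[0]:
            continue  # skip default and host routes
        prefix = ipaddress.ip_network(fields[0])
        dev = fields[fields.index("dev") + 1] if "dev" in fields else ""
        for cidr in REMOTE_CIDRS:
            if prefix.subnet_of(cidr) and dev != "vx-submariner":
                yield line  # this route will hijack cross-cluster traffic
                break

# Sample lines taken from the routing table above.
table = [
    "10.12.0.0/24 via 10.10.0.46 dev cilium_host proto kernel src 10.10.0.46 mtu 1370",
    "10.12.0.0/16 via 240.19.112.81 dev vx-submariner proto static",
]
print(list(conflicting_routes(table)))  # flags only the cilium_host /24
```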
Hi @IceManGreen, any update? Can we close this issue?
Hello @yboaron, sorry for the very late answer too!
Are `10.10.0.0/16` and `10.11.0.0/16` for domain-1, and `10.12.0.0/16` and `10.13.0.0/16` for domain-2, the cluster CIDRs?
Yes, they are!
Regarding the rest of your answer, thank you so much for guiding me through so many details, it helps a lot. Unfortunately, I did not succeed in managing the routes and making things work in the end. Since Cilium adds its own routing logic that conflicts with Submariner, the solution becomes very hard to manage, and I was planning to scale things up in terms of machines and clusters... So I will probably look for another solution, like Calico + Submariner if they fit together. This is what I recommend for anyone who faces the same kind of trouble with Cilium and Submariner.
Thanks again for your help! I think we are good with this issue!
@IceManGreen, I'm going to close this topic; feel free to reopen it if it's still relevant.
@yboaron @IceManGreen I have encountered the same problem in a Flannel environment. The root cause is that the MAC address of the vx-submariner interface created by the route agent is the same on every node. As a result, services in another cluster can only be reached from the gateway node, because traffic from a non-gateway node to the gateway node (through the vx-submariner interface) does not get delivered. So I will read the route-agent code that creates the vx-submariner interface.
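To confirm the duplicate-MAC theory, it is enough to collect the `link/ether` line of `ip link show vx-submariner` from each node and compare them. A small sketch of the comparison, with made-up node names and MACs for illustration:

```python
from collections import defaultdict

def duplicate_macs(node_macs):
    """Group nodes by the MAC of their vx-submariner interface and
    return only the MACs shared by more than one node."""
    by_mac = defaultdict(list)
    for node, mac in node_macs.items():
        by_mac[mac.lower()].append(node)
    return {mac: nodes for mac, nodes in by_mac.items() if len(nodes) > 1}

# Hypothetical values gathered from `ip link show vx-submariner` per node.
sample = {
    "worker-1": "f2:2c:19:3f:01:aa",
    "worker-2": "f2:2c:19:3f:01:aa",  # same MAC as worker-1 -> suspicious
    "gateway-1": "f2:2c:19:3f:01:bb",
}
print(duplicate_macs(sample))
```

If the result is non-empty, VXLAN frames between those nodes can be misdelivered, which matches the non-gateway-to-gateway failure described above.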
@yboaron I think we need to reopen this issue.
@huangjiasingle OK, let's reopen it.
Could you please specify what issue you encountered? Also, please elaborate on your environment (CNI, platform, Submariner version, cable driver, etc.).
@yboaron my env:
Btw, I read the route-agent code that creates the vx-submariner interface. It doesn't define the MAC address when creating the interface, so I want to know what sets the MAC address of vx-submariner.
Hi @huangjiasingle , thanks for reaching out.
This looks like a data-path issue that needs further investigation. Please attach the output of `subctl diagnose all --kubeconfig`.
Hello everyone,
I have an issue with a Submariner use case where I want to separate the control-plane network (used by the broker to communicate with the participating clusters) from the data-plane network (used by participating clusters to connect my applications through the gateways).
I have deployed 3 clusters:

- `e2e-mgmt`
- `domain-2`
- `domain-3`
Note that each deployment cluster has 3 nodes, and each node has 2 interfaces:
The nodes are actually virtual machines. Some tests showed that each VM can reach the others through the control plane or the data plane (ex: from `172.16.100.10` to `172.16.100.11`, but not from `172.16.100.10` to `172.16.110.11`).

I labeled and annotated all my nodes in `domain-2` and `domain-3` like:

Because I want the VPN tunnels to communicate through `172.16.110.0/24` (data plane) even if the Kubernetes APIs are listening on `172.16.100.0/24` (control plane).

I created the broker and joined the clusters using:
Indeed, the clusters joined the broker:

The connections seem good:

Or maybe the remote IPs should be on `172.16.110.0/24`?

The gateways seem good:
I created an Nginx service in `domain-2`, in namespace `hello-domain-2`, called `hello-world-svc`.

Using netshoot, I can tell that the requests work locally from `domain-2`.
I exported the service, and `domain-3` properly consumed the `ServiceImport` from the Broker:

However, the same test with netshoot from `domain-3` does not work:

Even if `dig` resolves it properly:

What did I do wrong?
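For reference, Lighthouse resolves exported services under the `clusterset.local` zone defined by the Kubernetes Multi-Cluster Services API; a small sketch of how the name the netshoot pod should query is derived from the service and namespace used in this thread:

```python
def clusterset_dns_name(service, namespace, cluster=None):
    """Build the MCS DNS name Lighthouse serves for an exported service.
    With `cluster` set, the name targets that specific cluster's endpoints."""
    base = f"{service}.{namespace}.svc.clusterset.local"
    return f"{cluster}.{base}" if cluster else base

# Names from the thread: service hello-world-svc in namespace hello-domain-2.
print(clusterset_dns_name("hello-world-svc", "hello-domain-2"))
# -> hello-world-svc.hello-domain-2.svc.clusterset.local
```

Since `dig` already resolves this name, DNS is fine and the failure is purely in the data path, consistent with the routing issues discussed above.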