Closed: ccwalterhk closed this issue 2 years ago.
Additional info:
walter@uat6-server:~$ export KUBECONFIG=kubeconfig.cluster-a
walter@uat6-server:~$ kubectl config use-context cluster-a
Switched to context "cluster-a".
walter@uat6-server:~$ kubectl get namespace submariner-k8s-broker
NAME                    STATUS   AGE
submariner-k8s-broker   Active   15h
walter@uat6-server:~$ kubectl get crds | grep -iE 'submariner|multicluster.x-k8s.io'
submariners.submariner.io              2021-10-30T12:43:16Z
servicediscoveries.submariner.io       2021-10-30T12:43:16Z
brokers.submariner.io                  2021-10-30T12:43:16Z
serviceimports.multicluster.x-k8s.io   2021-10-30T12:43:31Z
serviceexports.multicluster.x-k8s.io   2021-10-30T12:43:31Z
clusters.submariner.io                 2021-10-30T12:43:33Z
endpoints.submariner.io                2021-10-30T12:43:33Z
gateways.submariner.io                 2021-10-30T12:43:33Z
clusterglobalegressips.submariner.io   2021-10-30T12:43:33Z
globalegressips.submariner.io          2021-10-30T12:43:33Z
globalingressips.submariner.io         2021-10-30T12:43:33Z
walter@uat6-server:~$ kubectl -n submariner-k8s-broker get clusters.submariner.io
No resources found in submariner-k8s-broker namespace.
walter@uat6-server:~$ kubectl get pod -n submariner-operator
NAME                                             READY   STATUS             RESTARTS   AGE
submariner-operator-745d8c89d8-lh9v8             1/1     Running            2          15h
submariner-routeagent-5vnnb                      1/1     Running            0          52m
submariner-routeagent-4fpfk                      1/1     Running            0          52m
submariner-lighthouse-agent-78cb477567-9g4p7     1/1     Running            0          52m
submariner-lighthouse-coredns-7744cbd5b7-lfw2w   1/1     Running            0          52m
submariner-lighthouse-coredns-7744cbd5b7-s6f6w   1/1     Running            0          52m
submariner-gateway-94wms                         0/1     CrashLoopBackOff   15         52m
walter@uat6-server:~$ kubectl describe pod submariner-gateway-94wms -n submariner-operator
Name: submariner-gateway-94wms
Namespace: submariner-operator
Priority: 0
Node: uat9-server/192.168.1.72
Start Time: Sun, 31 Oct 2021 03:48:13 +0000
Labels: app=submariner-gateway
controller-revision-hash=86c987c55f
pod-template-generation=1
Annotations:
SUBMARINER_CLUSTERID: cluster-a
SUBMARINER_COLORCODES: blue
SUBMARINER_DEBUG: false
SUBMARINER_NATENABLED: false
SUBMARINER_BROKER: k8s
SUBMARINER_CABLEDRIVER:
BROKER_K8S_APISERVER: 192.168.1.38:6443
BROKER_K8S_APISERVERTOKEN: eyJhbGciOiJSUzI1NiIsImtpZCI6InZnN1pkcWxIVzdoejBBS1VKcFBGSWVySlppNHJ5RFcwTmFQcXpOZng1LVEifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJzdWJtYXJpbmVyLWs4cy1icm9rZXIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlY3JldC5uYW1lIjoiY2x1c3Rlci1jbHVzdGVyLWEtdG9rZW4tdzRnNWciLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiY2x1c3Rlci1jbHVzdGVyLWEiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiI3YWQxZTJlMS1iNDBkLTQ0N2EtYjhlYy0wMDhiZTVjMmFiZGYiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6c3VibWFyaW5lci1rOHMtYnJva2VyOmNsdXN0ZXItY2x1c3Rlci1hIn0.LMZ1Ju1MyQBXE_BhWRqFMKHrWjXGr0V6nFOslh_eJJSSmpfMJGiCQUe8RTQBnQ-9uqqXbd8PWnmgJuYYMJwH7wE50aW1dcdIt4cZ1eBeCOgC43JqmoMv88AUP4AezcIDxRDHzX4wTVUgZYKN97NBL5hUcB7XYCnqQOZ25D0A_xHG-8dGG9omeI3C9s_78VQl6-wPycttsZRbqnEfketgWyxvWJ0tAhW-hgS7_aNZcII3kp5Wm4HVHwVsFqNcp4T8NJZTdXkFOBmm_aZFCBcQULxtp8KS4Tlexj9zuSToSt5hdO7YeDqBkcdZW1LpiJoQzJT9hJGkqd-Xo8QA788fTA
BROKER_K8S_REMOTENAMESPACE: submariner-k8s-broker
BROKER_K8S_CA: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJkakNDQVIyZ0F3SUJBZ0lCQURBS0JnZ3Foa2pPUFFRREFqQWpNU0V3SHdZRFZRUUREQmhyTTNNdGMyVnkKZG1WeUxXTmhRREUyTXpVMU56TTFOakF3SGhjTk1qRXhNRE13TURVMU9USXdXaGNOTXpFeE1ESTRNRFUxT1RJdwpXakFqTVNFd0h3WURWUVFEREJock0zTXRjMlZ5ZG1WeUxXTmhRREUyTXpVMU56TTFOakF3V1RBVEJnY3Foa2pPClBRSUJCZ2dxaGtqT1BRTUJCd05DQUFTUHF4K3F6am42Z3FrN2JLcGFQSlhVY2EycWZMMXgrRTB1OFJ3c1cyUzUKZFhkUVVmNGxBZStwUmcwQzZRell1OUZtaEVKb0prdU5uRTRCTmhsYTNiaXJvMEl3UURBT0JnTlZIUThCQWY4RQpCQU1DQXFRd0R3WURWUjBUQVFIL0JBVXdBd0VCL3pBZEJnTlZIUTRFRmdRVXhlQXhXVTVIR3E4ekNYdDBndVVjCjcvS2p3UVF3Q2dZSUtvWkl6ajBFQXdJRFJ3QXdSQUlnSnlDOXVSRFdwNzBUM1J3WWFDT3BpWHBFWndpMFBIWVEKUkQwdDBrS293RTBDSUM2RlY0UTRWM0hxczBKMTFHcGcyOXRvL2JGUEwzaEVYUm5Qb0hZeUM1bEUKLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
CE_IPSEC_PSK: GrDUpPL9nHLVevwxmHtkPvYCXdVz19E7cX728FjXRht1FD940VbgoTf11/6/+4qZ
CE_IPSEC_DEBUG: false
SUBMARINER_HEALTHCHECKENABLED: true
SUBMARINER_HEALTHCHECKINTERVAL: 1
SUBMARINER_HEALTHCHECKMAXPACKETLOSSCOUNT: 5
NODE_NAME: (v1:spec.nodeName)
POD_NAME: submariner-gateway-94wms (v1:metadata.name)
CE_IPSEC_IKEPORT: 500
CE_IPSEC_NATTPORT: 4500
CE_IPSEC_PREFERREDSERVER: false
CE_IPSEC_FORCEENCAPS: false
Mounts:
/etc/ipsec.d from ipsecd (rw)
/lib/modules from libmodules (ro)
/var/lib/ipsec/nss from ipsecnss (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hnwdm (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
ipsecd:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:
kube-api-access-hnwdm:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
Events:
  Type     Reason       Age                    From               Message
  Normal   Scheduled    54m                    default-scheduler  Successfully assigned submariner-operator/submariner-gateway-94wms to uat9-server
  Warning  FailedMount  54m                    kubelet            MountVolume.SetUp failed for volume "kube-api-access-hnwdm" : failed to sync configmap cache: timed out waiting for the condition
  Normal   Pulled       52m (x5 over 54m)      kubelet            Container image "quay.io/submariner/submariner-gateway:0.11.0" already present on machine
  Normal   Created      52m (x5 over 54m)      kubelet            Created container submariner-gateway
  Normal   Started      52m (x5 over 54m)      kubelet            Started container submariner-gateway
  Warning  BackOff      3m52s (x230 over 53m)  kubelet            Back-off restarting failed container
walter@uat6-server:~$ sudo kubectl logs submariner-gateway-gkfgw -n submariner-operator --kubeconfig=kubeconfig.cluster-b
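For a pod stuck in CrashLoopBackOff, the logs of the previous (crashed) container instance are usually more telling than the current one; a minimal sketch, reusing the cluster-a gateway pod name from the describe output above:
# previous (crashed) container's logs for the cluster-a gateway pod
sudo kubectl logs submariner-gateway-94wms -n submariner-operator --previous --kubeconfig=kubeconfig.cluster-a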
Hello, I ran into the same problem. Did you solve it?
Submariner does a periodic health check between the gateway nodes, and it uses the CNI interface IP on the host for this. It looks like the CNI in the K3s environment is not creating an interface on the host that has an IP from the PodCIDR.
You can disable Submariner health-check support in your deployment to avoid this issue.
Please re-run the subctl join ... command once again on your clusters and include --health-check=false as an argument.
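A minimal sketch of what that re-join could look like, assuming the broker-info.subm file produced by subctl deploy-broker and the cluster IDs used earlier:
# re-join each cluster with the gateway health check disabled
subctl join broker-info.subm --kubeconfig kubeconfig.cluster-a --clusterid cluster-a --health-check=false
subctl join broker-info.subm --kubeconfig kubeconfig.cluster-b --clusterid cluster-b --health-check=false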
Can we close this?
With --health-check=false, the gateway is running successfully. No more crashes.
walter@k3-subctl-m1:~$ subctl show all --kubeconfig kubeconfig.cluster-a
Cluster "default"
✓ Showing Connections
GATEWAY CLUSTER REMOTE IP NAT CABLE DRIVER SUBNETS STATUS RTT avg.
k3-subctl-w2 cluster-b 192.168.1.93 no libreswan 10.145.0.0/16, 10.144.0.0/24 connected
✓ Showing Endpoints
CLUSTER ID ENDPOINT IP PUBLIC IP CABLE DRIVER TYPE
cluster-a 192.168.1.92 <my public IP> libreswan local
cluster-b 192.168.1.93 <my public IP> libreswan remote
✓ Showing Gateways
NODE HA STATUS SUMMARY
k3-subctl-w1 active All connections (1) are established
Discovered network details via Submariner:
Network plugin: generic
Service CIDRs: [10.45.0.0/16]
Cluster CIDRs: [10.44.0.0/24]
✓ Showing Network details
COMPONENT REPOSITORY VERSION
submariner quay.io/submariner 0.11.0
submariner-operator quay.io/submariner 0.11.0
service-discovery quay.io/submariner 0.11.0
✓ Showing versions
walter@k3-subctl-m1:~$
walter@k3-subctl-m1:~$ subctl show all --kubeconfig kubeconfig.cluster-b
Cluster "default"
✓ Showing Connections
GATEWAY CLUSTER REMOTE IP NAT CABLE DRIVER SUBNETS STATUS RTT avg.
k3-subctl-w1 cluster-a 192.168.1.92 no libreswan 10.45.0.0/16, 10.44.0.0/24 connected
✓ Showing Endpoints
CLUSTER ID ENDPOINT IP PUBLIC IP CABLE DRIVER TYPE
cluster-b 192.168.1.93 <my public IP> libreswan local
cluster-a 192.168.1.92 <my public IP> libreswan remote
✓ Showing Gateways
NODE HA STATUS SUMMARY
k3-subctl-w2 active All connections (1) are established
Discovered network details via Submariner:
Network plugin: generic
Service CIDRs: [10.145.0.0/16]
Cluster CIDRs: [10.144.0.0/24]
✓ Showing Network details
COMPONENT REPOSITORY VERSION
submariner quay.io/submariner 0.11.0
submariner-operator quay.io/submariner 0.11.0
service-discovery quay.io/submariner 0.11.0
✓ Showing versions
However, when I try to ping from a pod in cluster A to a pod in cluster B, the ping does not go through.
walter@k3-subctl-m1:~$ sudo kubectl --kubeconfig kubeconfig.cluster-b run tmp-shell --rm -i --tty --image nicolaka/netshoot -- /bin/bash
[sudo] password for walter:
If you don't see a command prompt, try pressing enter.
bash-5.1# ifconfig
eth0 Link encap:Ethernet HWaddr BE:46:42:C6:44:09
inet addr:10.144.1.9 Bcast:10.144.1.255 Mask:255.255.255.0
inet6 addr: fe80::bc46:42ff:fec6:4409/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1
RX packets:13 errors:0 dropped:0 overruns:0 frame:0
TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1074 (1.0 KiB) TX bytes:628 (628.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
bash-5.1#
walter@k3-subctl-m1:~$ sudo kubectl --kubeconfig kubeconfig.cluster-a run tmp-shell --rm -i --tty --image nicolaka/netshoot -- /bin/bash
If you don't see a command prompt, try pressing enter.
bash-5.1# ifconfig
eth0 Link encap:Ethernet HWaddr DA:E6:09:76:D1:8E
inet addr:10.44.1.14 Bcast:10.44.1.255 Mask:255.255.255.0
inet6 addr: fe80::d8e6:9ff:fe76:d18e/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1
RX packets:12 errors:0 dropped:0 overruns:0 frame:0
TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:984 (984.0 B) TX bytes:628 (628.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
bash-5.1# ping 10.144.19
PING 10.144.19 (10.144.0.19) 56(84) bytes of data.
^C
--- 10.144.19 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1029ms
bash-5.1# ping 10.44.1.14
PING 10.44.1.14 (10.44.1.14) 56(84) bytes of data.
64 bytes from 10.44.1.14: icmp_seq=1 ttl=64 time=0.089 ms
64 bytes from 10.44.1.14: icmp_seq=2 ttl=64 time=0.026 ms
^C
--- 10.44.1.14 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1014ms
rtt min/avg/max/mdev = 0.026/0.057/0.089/0.031 ms
bash-5.1#
When I use the command below to verify the deployment, I get the following output.
walter@k3-subctl-m1:~$ KUBECONFIG=kubeconfig.cluster-a:kubeconfig.cluster-b subctl verify --kubecontexts cluster-a,cluster-b --only service-discovery,connectivity --verbose |more
Performing the following verifications: service-discovery, connectivity
Running Suite: Submariner E2E suite
===================================
Random Seed: 1640491770
Will run 37 of 41 specs
STEP: Creating kubernetes clients
STEP: Setting new cluster ID "cluster-b", previous cluster ID was "cluster-a"
STEP: Creating lighthouse clients
STEP: Creating submariner clients
[dataplane-globalnet] Basic TCP connectivity tests across overlapping clusters without discovery when a pod connects via TCP to the globalIP of a remote service when the pod is not on a gateway and the remote service is not on a gateway
should have sent the expected data from the pod to the other pod
github.com/submariner-io/submariner@v0.11.0/test/e2e/dataplane/tcp_gn_pod_connectivity.go:35
STEP: Creating namespace objects with basename "dataplane-gn-conn-nd"
STEP: Generated namespace "e2e-tests-dataplane-gn-conn-nd-svx2r" in cluster "cluster-b" to execute the tests in
STEP: Creating namespace "e2e-tests-dataplane-gn-conn-nd-svx2r" in cluster "cluster-b"
STEP: Deleting namespace "e2e-tests-dataplane-gn-conn-nd-svx2r" on cluster "cluster-b"
STEP: Deleting namespace "e2e-tests-dataplane-gn-conn-nd-svx2r" on cluster "cluster-b"
• Failure in Spec Setup (BeforeEach) [0.043 seconds]
[dataplane-globalnet] Basic TCP connectivity tests across overlapping clusters without discovery
github.com/submariner-io/submariner@v0.11.0/test/e2e/dataplane/tcp_gn_pod_connectivity.go:28
when a pod connects via TCP to the globalIP of a remote service [BeforeEach]
github.com/submariner-io/submariner@v0.11.0/test/e2e/dataplane/tcp_gn_pod_connectivity.go:53
when the pod is not on a gateway and the remote service is not on a gateway
github.com/submariner-io/submariner@v0.11.0/test/e2e/dataplane/tcp_gn_pod_connectivity.go:60
should have sent the expected data from the pod to the other pod
github.com/submariner-io/submariner@v0.11.0/test/e2e/dataplane/tcp_gn_pod_connectivity.go:35
Error creating namespace &Namespace{ObjectMeta:{e2e-tests-dataplane-gn-conn-nd-svx2r 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[e2e-framework:dataplane-gn-conn-nd] map[] [] [] []},Spec:NamespaceSpec{Finalizers:[],},Status:NamespaceStatus{Phase:,Conditions:[]NamespaceCondition{}
,},}
Unexpected error:
<*errors.StatusError | 0xc0005081e0>: {
ErrStatus: {
TypeMeta: {Kind: "", APIVersion: ""},
ListMeta: {
SelfLink: "",
ResourceVersion: "",
Continue: "",
RemainingItemCount: nil,
},
Status: "Failure",
Closing old issues. Please re-open if this is still relevant.
There is some improvement after using --health-check=false: the gateway is up and running now. But the issue has not been fixed; the pod in cluster A cannot ping the pod in cluster B.
@cwalterhk are you following https://submariner.io/getting-started/quickstart/k3s/?
Yes, I follow the exact steps at that URL. I even use the same pod and service CIDRs.
BTW, all the worker and master nodes are on the same segment, although I don't think this is a concern.
walter@k3-subctl-m1:~ sudo kubectl --kubeconfig kubeconfig.cluster-a get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k3-subctl-m1 Ready control-plane,master 46h v1.22.5+k3s1 192.168.1.90 <none> Ubuntu 20.04.3 LTS 5.4.0-91-generic containerd://1.5.8-k3s1
k3-subctl-w1 Ready <none> 44h v1.22.5+k3s1 192.168.1.92 <none> Ubuntu 20.04.3 LTS 5.4.0-91-generic containerd://1.5.8-k3s1
walter@k3-subctl-m1:~ sudo kubectl --kubeconfig kubeconfig.cluster-b get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k3-subctl-m2 Ready control-plane,master 46h v1.22.5+k3s1 192.168.1.91 <none> Ubuntu 20.04.3 LTS 5.4.0-91-generic containerd://1.5.8-k3s1
k3-subctl-w2 Ready <none> 44h v1.22.5+k3s1 192.168.1.93 <none> Ubuntu 20.04.3 LTS 5.4.0-91-generic containerd://1.5.8-k3s1
walter@k3-subctl-m1:~$ subctl show all --kubeconfig kubeconfig.cluster-a
Cluster "default"
✓ Showing Connections
GATEWAY CLUSTER REMOTE IP NAT CABLE DRIVER SUBNETS STATUS RTT avg.
k3-subctl-w2 cluster-b 192.168.1.93 no libreswan 10.145.0.0/16, 10.144.0.0/24 connected
✓ Showing Endpoints
CLUSTER ID ENDPOINT IP PUBLIC IP CABLE DRIVER TYPE
cluster-a 192.168.1.92 58.182.182.152 libreswan local
cluster-b 192.168.1.93 58.182.182.152 libreswan remote
✓ Showing Gateways
NODE HA STATUS SUMMARY
k3-subctl-w1 active All connections (1) are established
Discovered network details via Submariner:
Network plugin: generic
Service CIDRs: [10.45.0.0/16]
Cluster CIDRs: [10.44.0.0/24]
✓ Showing Network details
COMPONENT REPOSITORY VERSION
submariner quay.io/submariner 0.11.0
submariner-operator quay.io/submariner 0.11.0
service-discovery quay.io/submariner 0.11.0
✓ Showing versions
walter@k3-subctl-m1:~$ subctl show all --kubeconfig kubeconfig.cluster-b
Cluster "default"
✓ Showing Connections
GATEWAY CLUSTER REMOTE IP NAT CABLE DRIVER SUBNETS STATUS RTT avg.
k3-subctl-w1 cluster-a 192.168.1.92 no libreswan 10.45.0.0/16, 10.44.0.0/24 connected
✓ Showing Endpoints
CLUSTER ID ENDPOINT IP PUBLIC IP CABLE DRIVER TYPE
cluster-b 192.168.1.93 58.182.182.152 libreswan local
cluster-a 192.168.1.92 58.182.182.152 libreswan remote
✓ Showing Gateways
NODE HA STATUS SUMMARY
k3-subctl-w2 active All connections (1) are established
Discovered network details via Submariner:
Network plugin: generic
Service CIDRs: [10.145.0.0/16]
Cluster CIDRs: [10.144.0.0/24]
✓ Showing Network details
COMPONENT REPOSITORY VERSION
submariner quay.io/submariner 0.11.0
submariner-operator quay.io/submariner 0.11.0
service-discovery quay.io/submariner 0.11.0
✓ Showing versions
walter@k3-subctl-m1:~$
Output of subctl diagnose all:
walter@k3-subctl-m1:~$ subctl diagnose all --kubeconfig kubeconfig.cluster-a
Cluster "default"
✓ Checking Submariner support for the Kubernetes version
✓ Kubernetes version "v1.22.5+k3s1" is supported
✓ Checking Submariner support for the CNI network plugin
✓ The detected CNI network plugin ("generic") is supported
✓ Checking gateway connections
✓ All connections are established
✓ Checking Submariner pods
✓ All Submariner pods are up and running
✓ Non-Globalnet deployment detected - checking if cluster CIDRs overlap
✓ Clusters do not have overlapping CIDRs
✓ Checking Submariner support for the kube-proxy mode
✓ The kube-proxy mode is supported
✗ Checking the firewall configuration to determine if the metrics port (8080) is allowed
✗ The tcpdump output from the sniffer pod does not contain the client pod HostIP. Please check that your firewall configuration allows TCP/8080 traffic on the "k3-subctl-w1" node.
✓ Checking the firewall configuration to determine if VXLAN traffic is allowed
✓ The firewall configuration allows VXLAN traffic
Skipping inter-cluster firewall check as it requires two kubeconfigs. Please run "subctl diagnose firewall inter-cluster" command manually.
walter@k3-subctl-m1:~$
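Regarding the failed metrics-port check above, a rough sketch of how that port could be opened on the gateway node, assuming plain iptables is managing the firewall there (adjust to whatever firewall tooling the node actually uses):
# allow the Submariner metrics port flagged by subctl diagnose
sudo iptables -I INPUT -p tcp --dport 8080 -j ACCEPT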
Output of subctl diagnose firewall inter-cluster:
walter@k3-subctl-m1:~$ subctl diagnose firewall inter-cluster kubeconfig.cluster-a kubeconfig.cluster-b
✓ Checking if tunnels can be setup on the gateway node of cluster "cluster-a"
✓ Tunnels can be established on the gateway node
walter@k3-subctl-m1:~$ subctl diagnose firewall inter-cluster kubeconfig.cluster-b kubeconfig.cluster-a
✓ Checking if tunnels can be setup on the gateway node of cluster "cluster-b"
✓ Tunnels can be established on the gateway node
Uploaded subctl gather for cluster a and b.
submariner-cluster-B-20211227124346.tar.gz submariner-cluster-A-20211227124333.tar.gz
It looks like the Submariner Operator was not able to detect the CNI in the cluster and is using the "generic" route agent. May I know which CNI you are using in the cluster?
Also, I see that the tunnels/connections are properly established between the gateway nodes. So is the connectivity issue you are seeing only when the source or destination pod is on a non-gateway node?
I did not specify any CNI during the installation of K3s and Submariner. I believe it is using the default for K3s, which is "--flannel-backend=vxlan". Is there any command to confirm which CNI is being used?
I only have 1 worker node per cluster, so the pod is also located on the gateway node.
Looking at the logs, yes, you seem to be using the flannel CNI. So I think there is some (non-fatal) issue in the submariner-operator code which is unable to detect the flannel CNI on K3s. Can you please report an issue on the "submariner-operator" repo?
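For the earlier question about confirming the CNI, a rough check assuming K3s' default embedded flannel backend (the annotation keys and the CNI config path below are the usual flannel/K3s ones, not Submariner-specific):
# flannel publishes backend info as node annotations
kubectl get node k3-subctl-w1 -o jsonpath='{.metadata.annotations}' | tr ',' '\n' | grep -i flannel
# the CNI config K3s writes on each node
sudo ls /var/lib/rancher/k3s/agent/etc/cni/net.d/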
However, when I try to ping from pod in cluster a to pod in cluster b, I cannot ping thru.
This is the pod in cluster a:
walter@k3-subctl-m1:~$ sudo kubectl --kubeconfig kubeconfig.cluster-a run tmp-shell --rm -i --tty --image nicolaka/netshoot -- /bin/bash
If you don't see a command prompt, try pressing enter.
bash-5.1# ifconfig
eth0      Link encap:Ethernet  HWaddr DA:E6:09:76:D1:8E
          inet addr:10.44.1.14  Bcast:10.44.1.255  Mask:255.255.255.0
          inet6 addr: fe80::d8e6:9ff:fe76:d18e/64  Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:12 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:984 (984.0 B)  TX bytes:628 (628.0 B)
bash-5.1# ping 10.144.19
PING 10.144.19 (10.144.0.19) 56(84) bytes of data.
^C
--- 10.144.19 ping statistics ---
The IP address of the pod in cluster-b is 10.144.1.9, but it looks like you are trying to ping 10.144.0.19. Are you sure the IP address you tried to ping belongs to a running pod in cluster-b?
Error creating namespace &Namespace{ObjectMeta:{e2e-tests-dataplane-gn-conn-nd-svx2r 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[e2e-framework:dataplane-gn-conn-nd] map[] [] [] []},Spec:NamespaceSpec{Finalizers:[],},Status:NamespaceStatus{Phase:,Conditions:[]NamespaceCondition{}
,},}
The error does not seem to be related to Submariner; the e2e test code is getting failures when it tries to create a namespace. Please check that you are able to create namespaces in your cluster with the user account you are using to run the e2e tests.
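A quick way to check that, assuming the same kubeconfig/context used for the verify run:
# returns "yes" if the current user may create namespaces
kubectl auth can-i create namespaces --kubeconfig kubeconfig.cluster-b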
Sorry, there was a typo. But I did this multiple times; below is another test.
Pod in cluster b:
walter@k3-subctl-m1:~$ sudo kubectl --kubeconfig kubeconfig.cluster-b run tmp-shell --rm -i --tty --image nicolaka/netshoot -- /bin/bash
[sudo] password for walter:
If you don't see a command prompt, try pressing enter.
bash-5.1# ifconfig
eth0 Link encap:Ethernet HWaddr 2A:32:00:05:EF:C9
inet addr:10.144.1.24 Bcast:10.144.1.255 Mask:255.255.255.0
inet6 addr: fe80::2832:ff:fe05:efc9/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1
RX packets:13 errors:0 dropped:0 overruns:0 frame:0
TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1074 (1.0 KiB) TX bytes:628 (628.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
Pod in cluster a:
walter@k3-subctl-m1:~$ sudo kubectl --kubeconfig kubeconfig.cluster-a run tmp-shell --rm -i --tty --image nicolaka/netshoot -- /bin/bash
[sudo] password for walter:
If you don't see a command prompt, try pressing enter.
bash-5.1# ifconfig
eth0 Link encap:Ethernet HWaddr 7A:E6:8F:72:58:A8
inet addr:10.44.1.46 Bcast:10.44.1.255 Mask:255.255.255.0
inet6 addr: fe80::78e6:8fff:fe72:58a8/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1
RX packets:15 errors:0 dropped:0 overruns:0 frame:0
TX packets:10 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1214 (1.1 KiB) TX bytes:768 (768.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
Cannot ping from cluster a to cluster b:
bash-5.1# ping 10.144.1.24
PING 10.144.1.24 (10.144.1.24) 56(84) bytes of data.
--- 10.144.1.24 ping statistics ---
8 packets transmitted, 0 received, 100% packet loss, time 7148ms
Looking at the logs, yes, you seem to be using the flannel CNI. So I think there is some (non-fatal) issue in the submariner-operator code which is unable to detect the flannel CNI on K3s. Can you please report an issue on the "submariner-operator" repo?
Thanks. I can report an issue on the "submariner-operator" repo. For now, can you advise how to work around it?
Cannot ping from cluster a to cluster b:
bash-5.1# ping 10.144.1.24
PING 10.144.1.24 (10.144.1.24) 56(84) bytes of data.
--- 10.144.1.24 ping statistics ---
8 packets transmitted, 0 received, 100% packet loss, time 7148ms
While the ping is in progress, can you run "tcpdump -len -i any" in the netshoot pod of cluster-b to check if the packets are received on the destination cluster or if the icmp ping itself is not reaching the pod in cluster-b.
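If the plain capture is too noisy, it could be narrowed to ICMP involving the cluster-a pod, for example (the source IP is the cluster-a pod from the test above):
# capture only ICMP to/from the pinging pod
tcpdump -len -i any icmp and host 10.44.1.46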
For flannel, even the generic route-agent would work, hence I mentioned that it is "non-fatal" in my message.
Looking at the logs of the Submariner pods, I could not find any errors, so we have to debug this problem with tcpdump to figure out the issue.
One more observation from the subctl gather output logs: normally when subctl gather ... is run on a cluster, it also collects logs related to ipsec-status, ip-xfrm-state, etc.
I'm not able to find these logs in the subctl gather output that was shared. Did you get any errors while running the "subctl gather ..." command?
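A manual way to collect roughly the same information, as a sketch (the gateway pod name placeholder needs to be replaced with the actual pod, and this assumes the libreswan-based gateway image ships the ipsec and ip binaries):
# IPsec and kernel xfrm state as seen from the active gateway pod
kubectl -n submariner-operator exec <submariner-gateway-pod> -- ipsec status
kubectl -n submariner-operator exec <submariner-gateway-pod> -- ip xfrm state
kubectl -n submariner-operator exec <submariner-gateway-pod> -- ip xfrm policy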
I kept watching the RX count in the pod in cluster-b, and I don't think the packets are being received in cluster B.
bash-5.1# ifconfig
eth0 Link encap:Ethernet HWaddr D2:D3:32:5C:81:F6
inet addr:10.144.1.25 Bcast:10.144.1.255 Mask:255.255.255.0
inet6 addr: fe80::d0d3:32ff:fe5c:81f6/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1
RX packets:**18** errors:0 dropped:0 overruns:0 frame:0
TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1424 (1.3 KiB) TX bytes:908 (908.0 B)
bash-5.1# ifconfig
eth0 Link encap:Ethernet HWaddr D2:D3:32:5C:81:F6
inet addr:10.144.1.25 Bcast:10.144.1.255 Mask:255.255.255.0
inet6 addr: fe80::d0d3:32ff:fe5c:81f6/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1
RX packets:**18** errors:0 dropped:0 overruns:0 frame:0
TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1424 (1.3 KiB) TX bytes:908 (908.0 B)
bash-5.1# ifconfig
eth0 Link encap:Ethernet HWaddr D2:D3:32:5C:81:F6
inet addr:10.144.1.25 Bcast:10.144.1.255 Mask:255.255.255.0
inet6 addr: fe80::d0d3:32ff:fe5c:81f6/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1
RX packets:**18** errors:0 dropped:0 overruns:0 frame:0
TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1424 (1.3 KiB) TX bytes:908 (908.0 B)
I used the syntax below to generate the log. I don't get any errors at all.
subctl gather --kubeconfig kubeconfig.cluster-a
Confirmed with TCPDUMP. Not getting anything.
@vthapar FYI
Confirmed with TCPDUMP. Not getting anything.
Okay, so this confirms that the packets are not even reaching the destination cluster and are probably getting dropped on the source gateway node itself. Please try to check where they are getting dropped on the gateway node.
I'm a bit busy this week and the other Submariner developers are off for the holidays. If you are unable to figure out the issue by next week, I can provide some support in investigating it.
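A few generic things that could be run on the source gateway node (k3-subctl-w1) while the ping is in progress, just as a sketch for locating the drop; only standard iproute2/iptables commands are assumed:
# does the ICMP traffic even reach the gateway node?
sudo tcpdump -ni any icmp
# are there policy rules and routes covering the remote pod CIDR?
ip rule list
ip route show table all | grep 10.144
# are Submariner's iptables chains present?
sudo iptables-save | grep -i submariner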
Actually, thank you very much for the help. I found something interesting: the traffic goes out to the internet, and I don't know who 10.44.1.1 is either.
walter@k3-subctl-m1:~$ sudo kubectl --kubeconfig kubeconfig.cluster-a run tmp-shell --rm -i --tty --image nicolaka/netshoot -- /bin/bash
[sudo] password for walter:
If you don't see a command prompt, try pressing enter.
bash-5.1# ping 10.144.1.27
PING 10.144.1.27 (10.144.1.27) 56(84) bytes of data.
^C
--- 10.144.1.27 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
bash-5.1# traceroute 10.144.1.27
traceroute to 10.144.1.27 (10.144.1.27), 30 hops max, 46 byte packets
1 10.44.1.1 (10.44.1.1) 0.005 ms 0.004 ms 0.003 ms
2 gateway.home.local (192.168.1.5) 0.631 ms 0.294 ms 0.343 ms
3 183.90.58.1 (183.90.58.1) 2.483 ms 2.190 ms 2.368 ms
4 183.90.44.101 (183.90.44.101) 2.295 ms 2.368 ms 2.339 ms
5 * * *
6 *^C
bash-5.1#
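A quick way to see which route the node actually picks for that destination (run on the cluster-a gateway node; the address is the cluster-b pod IP from the test above):
# shows the selected route, source address and output device
ip route get 10.144.1.27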
I think the problem is that the cluster CIDR configured in Submariner does not match the actual value for your cluster.
For example, from the subctl show ... output, I can see the following:
For Cluster A:
Discovered network details via Submariner:
Network plugin: generic
Service CIDRs: [10.45.0.0/16]
Cluster CIDRs: [10.44.0.0/24] <---- This is the issue
For Cluster B:
Discovered network details via Submariner:
Network plugin: generic
Service CIDRs: [10.145.0.0/16]
Cluster CIDRs: [10.144.0.0/24] <---- This is the issue
So, cluster-b is advertising its local clusterCIDR as 10.144.0.0/24 (i.e., IPs falling in this CIDR are 10.144.0.1 to 10.144.0.255). But when you scheduled a pod in cluster-b, it got an IP address like 10.144.1.x, hence it is not going over the Submariner IPsec tunnel. A similar thing is happening for cluster-a.
Did you explicitly specify --clustercidr during the subctl join ... operation, or was it auto-discovered by "submariner-operator"?
In case it was auto-discovered, it means it's a bug in the submariner-operator autodiscovery code. You can do ONE of the following until it's fixed in the submariner-operator code:
1. Re-run "subctl join --clustercidr ..." on both the clusters, or
2. Modify the submariner CRD on both the clusters and edit the value of clusterCIDR to point to the correct CIDR in both clusters.
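As a concrete sketch of option 1, using the pod CIDRs that match the pod IPs actually seen above (broker-info.subm and the cluster IDs are assumptions carried over from the earlier join, and --health-check=false is kept as discussed):
# cluster-a pods are getting 10.44.x.x addresses
subctl join broker-info.subm --kubeconfig kubeconfig.cluster-a --clusterid cluster-a --clustercidr 10.44.0.0/16 --health-check=false
# cluster-b pods are getting 10.144.x.x addresses
subctl join broker-info.subm --kubeconfig kubeconfig.cluster-b --clusterid cluster-b --clustercidr 10.144.0.0/16 --health-check=false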
Thanks. You are right. After I used method 1 you proposed, it is pingable now. Very good.
Just FYI, I still need to use --health-check=false. If I remove --health-check=false, there is an error in the subctl show output.
walter@k3-subctl-m1:~$ subctl show all --kubeconfig kubeconfig.cluster-a
Cluster "default"
✓ Showing Connections
GATEWAY CLUSTER REMOTE IP NAT CABLE DRIVER SUBNETS STATUS RTT avg.
k3-subctl-w2 cluster-b 192.168.1.93 no libreswan 10.145.0.0/16, 10.144.0.0/24 error 0s
✓ Showing Endpoints
CLUSTER ID ENDPOINT IP PUBLIC IP CABLE DRIVER TYPE
cluster-a 192.168.1.92 58.182.182.154 libreswan local
cluster-b 192.168.1.93 58.182.182.154 libreswan remote
✓ Showing Gateways
NODE HA STATUS SUMMARY
k3-subctl-w1 active 0 connections out of 1 are established
Discovered network details via Submariner:
Network plugin: generic
Service CIDRs: [10.45.0.0/16]
Cluster CIDRs: [10.44.0.0/16]
✓ Showing Network details
COMPONENT REPOSITORY VERSION
submariner quay.io/submariner 0.11.0
submariner-operator quay.io/submariner 0.11.0
service-discovery quay.io/submariner 0.11.0
✓ Showing versions
### This is output if I use --health-check=false
walter@k3-subctl-m1:~$ subctl show all --kubeconfig kubeconfig.cluster-a
Cluster "default"
✓ Showing Connections
GATEWAY CLUSTER REMOTE IP NAT CABLE DRIVER SUBNETS STATUS RTT avg.
k3-subctl-w2 cluster-b 192.168.1.93 no libreswan 10.145.0.0/16, 10.144.0.0/16 connected
✓ Showing Endpoints
CLUSTER ID ENDPOINT IP PUBLIC IP CABLE DRIVER TYPE
cluster-a 192.168.1.92 58.182.182.154 libreswan local
cluster-b 192.168.1.93 58.182.182.154 libreswan remote
✓ Showing Gateways
NODE HA STATUS SUMMARY
k3-subctl-w1 active All connections (1) are established
Discovered network details via Submariner:
Network plugin: generic
Service CIDRs: [10.45.0.0/16]
Cluster CIDRs: [10.44.0.0/16]
✓ Showing Network details
COMPONENT REPOSITORY VERSION
submariner quay.io/submariner 0.11.0
submariner-operator quay.io/submariner 0.11.0
service-discovery quay.io/submariner 0.11.0
✓ Showing versions
walter@k3-subctl-m1:~$ subctl show all --kubeconfig kubeconfig.cluster-b
Cluster "default"
✓ Showing Connections
GATEWAY CLUSTER REMOTE IP NAT CABLE DRIVER SUBNETS STATUS RTT avg.
k3-subctl-w1 cluster-a 192.168.1.92 no libreswan 10.45.0.0/16, 10.44.0.0/16 connected
✓ Showing Endpoints
CLUSTER ID ENDPOINT IP PUBLIC IP CABLE DRIVER TYPE
cluster-b 192.168.1.93 58.182.182.154 libreswan local
cluster-a 192.168.1.92 58.182.182.154 libreswan remote
✓ Showing Gateways
NODE HA STATUS SUMMARY
k3-subctl-w2 active All connections (1) are established
Discovered network details via Submariner:
Network plugin: generic
Service CIDRs: [10.145.0.0/16]
Cluster CIDRs: [10.144.0.0/16]
✓ Showing Network details
COMPONENT REPOSITORY VERSION
submariner quay.io/submariner 0.11.0
submariner-operator quay.io/submariner 0.11.0
service-discovery quay.io/submariner 0.11.0
✓ Showing versions
walter@k3-subctl-m1:~$
walter@k3-subctl-m1:~$
I suggest updating the https://submariner.io/getting-started/quickstart/k3s/ page to include the cluster CIDR option in the join command.
Thank you, @sridhargaddam. By going through this troubleshooting session, I learned a lot about Submariner and Kubernetes.
Glad to hear that. We welcome PRs :) In case you want to propose a PR, please feel free to send it to the following repo: https://github.com/submariner-io/submariner-website
Also, can you please report a bug on the "submariner-operator" repo with the following info? Thanks.
- Auto-discovery of CNI is failing on K3s
- Issue with discovery of ClusterCIDRs on K3s
Did you explicitly specify the --clustercidr during the subctl join ... operation, or was it auto-discovered by "submariner-operator"?
Glad to hear that it's working now. May I know if the original issue was a wrong configuration or an issue with the submariner-operator autodiscovery code?
I will report both issues. I will also take a look at how to do the PR.
Thanks @cwalterhk for the feedback and @sridhargaddam for your support!
Hi, I followed the instructions on submariner.io to test this. I cannot establish connectivity between two K3s clusters. When I run the connectivity verification, it gives me the output below (output-1).
output-1.txt
When I use the subctl show command as below, I get the output below (output-2).
output -2.txt
Below is my get node output (output-3).
output-3.txt