Closed: ishantanu closed this issue 4 years ago.
That message is actually OK. Is Submariner functional? The next release will include logic so that it won't constantly try to add a new route.
Nope, it is not. I tried creating an nginx pod in one cluster and pinged the pod IP from another cluster. It did not work.
Did you disable strict source/destination checking on the AWS nodes?
Yes, I did.
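For reference, the check can be disabled per instance with the AWS CLI; a minimal sketch, assuming hypothetical instance IDs for the worker nodes:

```shell
# Sketch: disable EC2 source/destination checking on each AWS worker node so
# forwarded cross-cluster traffic is not dropped. Instance IDs are hypothetical.
if command -v aws >/dev/null 2>&1; then
  for id in i-0123456789abcdef0 i-0fedcba9876543210; do
    aws ec2 modify-instance-attribute --instance-id "$id" --no-source-dest-check
  done
  status=done
else
  echo "aws CLI not found; skipping"
  status=skipped
fi
```

The same setting is also reachable from the EC2 console per instance (Networking, "Change source/destination check").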
For reference, here are the details of the three clusters:
Bare-metal cluster (acting as broker) created via RKE CLI:
domain: cluster.local
cluster CIDR: 10.40.0.0/16
service CIDR: 10.41.0.0/16
Bare-metal cluster created from Rancher GUI:
domain: xyz.cluster.local
cluster CIDR: 10.53.0.0/16
service CIDR: 10.54.0.0/16
AWS cluster created from Rancher GUI:
domain: aws.cluster.local
cluster CIDR: 10.51.0.0/16
service CIDR: 10.52.0.0/16
Networking plugin used on all clusters: Flannel.
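Since Submariner requires non-overlapping pod and service CIDRs across the connected clusters, a quick sanity check of the six /16 prefixes above can rule that out (for /16 ranges, two prefixes overlap exactly when their first two octets match):

```shell
# Sketch: check the six /16 CIDRs listed above for overlaps. For /16
# prefixes, an overlap means an identical first two octets.
cidrs="10.40.0.0/16 10.41.0.0/16 10.53.0.0/16 10.54.0.0/16 10.51.0.0/16 10.52.0.0/16"
dupes=$(for c in $cidrs; do echo "${c%.*.*}"; done | sort | uniq -d)
if [ -n "$dupes" ]; then
  echo "overlapping /16 prefixes: $dupes"
else
  echo "no overlaps"
fi
```

For the CIDRs in this setup the check reports no overlaps, so an address clash is not the cause here.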
OK. I have previously had to reboot my nodes after an unsuccessful HA failover, and this feels similar. Is it possible for you to reboot your worker nodes? I know that's a bit of a big hammer for the situation, but it should help. There is an issue with updating the iptables/IPsec rules that still isn't fully debugged at this point.
Well, I already rebooted all the worker nodes of the non-broker bare-metal cluster, and it did not help at the time. But let me try it one more time.
The worker nodes have been rebooted as well. It still does not work.
On the elected gateway nodes, can you run ip xfrm state to see whether the IPsec tunnels are established?
It returns nothing:
ubuntu@gateway:~$ sudo ip xfrm state
ubuntu@gateway:~$
This indicates the IPsec tunnels are not being established properly. Are you able to restart the Submariner gateway pod and take a look at the logs for the StrongSwan messages related to establishing the tunnel?
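For reference, restarting the gateway pod and pulling the new pod's logs might look like the sketch below; the namespace and the app=submariner-engine label are assumptions for this deployment and may need adjusting:

```shell
# Sketch: delete the gateway pod so its controller recreates it, then dump
# the new pod's log and filter for StrongSwan/charon messages.
# Namespace and label selector are assumptions; adjust for your install.
if command -v kubectl >/dev/null 2>&1; then
  kubectl -n submariner delete pod -l app=submariner-engine
  kubectl -n submariner logs -l app=submariner-engine | grep -iE 'charon|ike|tunnel'
  gw_status=done
else
  echo "kubectl not available; skipping"
  gw_status=skipped
fi
```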
Okay. So I restarted the gateway pods on both clusters, and the logs show StrongSwan messages with errors:
00[DMN] Starting IKE charon daemon (strongSwan 5.5.1, Linux 4.15.0-47-generic, x86_64)
00[KNL] unable to create IPv4 routing table rule
00[KNL] unable to create IPv6 routing table rule
00[CFG] loading ca certificates from '/usr/local/etc/ipsec.d/cacerts'
00[LIB] opening directory '/usr/local/etc/ipsec.d/cacerts' failed: No such file or directory
00[CFG] reading directory failed
00[CFG] loading aa certificates from '/usr/local/etc/ipsec.d/aacerts'
00[LIB] opening directory '/usr/local/etc/ipsec.d/aacerts' failed: No such file or directory
00[CFG] reading directory failed
00[CFG] loading ocsp signer certificates from '/usr/local/etc/ipsec.d/ocspcerts'
00[LIB] opening directory '/usr/local/etc/ipsec.d/ocspcerts' failed: No such file or directory
00[CFG] reading directory failed
00[CFG] loading attribute certificates from '/usr/local/etc/ipsec.d/acerts'
00[LIB] opening directory '/usr/local/etc/ipsec.d/acerts' failed: No such file or directory
00[CFG] reading directory failed
00[CFG] loading crls from '/usr/local/etc/ipsec.d/crls'
00[LIB] opening directory '/usr/local/etc/ipsec.d/crls' failed: No such file or directory
00[CFG] reading directory failed
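The missing ipsec.d directories are usually harmless, but the "unable to create IPv4/IPv6 routing table rule" lines suggest charon cannot manage policy-routing rules (strongSwan installs its routes in table 220 by default), which typically points at the gateway pod lacking CAP_NET_ADMIN. A rough probe, runnable from inside the gateway pod's network context (the rule priority here is an arbitrary free slot):

```shell
# Sketch: test whether this context can add and remove a policy-routing
# rule, which is what charon needs for its table-220 routes.
# Priority 32100 is an arbitrary value chosen for the probe.
if ip rule add table 220 priority 32100 2>/dev/null; then
  ip rule del table 220 priority 32100
  result="can manage ip rules"
else
  result="cannot manage ip rules (likely missing CAP_NET_ADMIN)"
fi
echo "$result"
```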
@Oats87 Any idea what might be wrong? I tried recreating the clusters with new instances, and somehow the error stays the same.
This issue has been automatically marked as stale because it has not had activity for 120 days. It will be closed if no further activity occurs. Please make a comment if this issue/pr is still valid. Thank you for your contributions.
Did you ever resolve the issue? I am facing the same one.
Hi,
Here is the setup with which I am trying Submariner:
So, the thing is:
After that, the Submariner route pods on both clusters show this error:
So I deleted the complete Submariner installation on all clusters and retried it, but I still see the same error.
Where are these things stored? How do I make Submariner work again?
UPDATE: I tried the whole sequence one more time, deleting the Helm releases and recreating everything. It still shows the same error.
I even recreated the clusters with different CIDRs, and they all still show the same error message; only the CIDR values change.