submariner-io / submariner-operator

Operator that deploys the various Submariner components.
Apache License 2.0

Network tunnel does not come up #1023

Closed: nyechiel closed this issue 3 years ago

nyechiel commented 3 years ago

Originally reported by @raffaelespazzoli

Submariner 0.8 was deployed with OCP 4.6.4 on AWS. The tunnels do not come up, and the gateway log shows:

I0111 16:13:14.748922       1 libreswan.go:181] Connection "submariner-cable-cluster2-10-0-13-107-2-1" not found in active connections obtained from whack: map[submariner-cable-cluster3-10-0-80-189-2-2:0], map[submariner-cable-cluster3-10-0-80-189-2-2:241296]
I0111 16:13:14.748935       1 libreswan.go:181] Connection "submariner-cable-cluster2-10-0-13-107-2-2" not found in active connections obtained from whack: map[submariner-cable-cluster3-10-0-80-189-2-2:0], map[submariner-cable-cluster3-10-0-80-189-2-2:241296]
I0111 16:13:14.748967       1 libreswan.go:195] Connection "submariner-cable-cluster2-10-0-13-107" not found in active connections obtained from whack: map[submariner-cable-cluster3-10-0-80-189-2-2:0], map[submariner-cable-cluster3-10-0-80-189-2-2:241296]
I0111 16:13:19.577774       1 pinger.go:142] Pinger for IP "10.134.2.1" stopped
I0111 16:13:19.577799       1 pinger.go:87] Starting pinger for IP "10.134.2.1"
I0111 16:13:19.577779       1 pinger.go:142] Pinger for IP "10.138.2.1" stopped
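
To make the log lines above easier to scan, here is a minimal Python sketch that extracts which connection is reported missing and which connections whack reports as active. The message format is inferred only from the sample lines in this report, not from the Submariner source, and the name `missing_connections` is hypothetical.

```python
import re

# Pattern inferred from the sample gateway log lines in this issue only;
# it is an assumption, not taken from the Submariner source code.
LINE_RE = re.compile(
    r'Connection "(?P<name>[^"]+)" not found in active connections '
    r'obtained from whack: map\[(?P<active>[^\]]*)\]'
)


def missing_connections(log_text):
    """Map each missing connection name to the sorted active connection names.

    Only the first map[...] printed on each line is read; in the sample
    lines both maps list the same connection names.
    """
    result = {}
    for match in LINE_RE.finditer(log_text):
        entries = match.group("active").split()
        # Each entry looks like "<connection-name>:<counter>".
        active = sorted(entry.rsplit(":", 1)[0] for entry in entries)
        result[match.group("name")] = active
    return result
```

Run against the lines above, this would list the three cluster2 cable names as missing, each with the cluster3 cable as the only active connection.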

How to reproduce it (as minimally and precisely as possible):

  1. Run the prep_for_subm.sh script to add the gateway node
  2. Deploy Submariner
  3. Discovered an issue on one of the gateway VMs: the machineset and machine were defined, but the VM could not be created because the instance type did not exist in that AZ
  4. Recreated the VM after fixing the issue
  5. The tunnel did not come up

Reinstalling Submariner fixed the issue.
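
Step 3 above could be caught before deploying by checking whether the gateway instance type is offered in the target availability zone. The following is a rough Python sketch, assuming a boto3 EC2 client is passed in; the helper name `instance_type_offered` and the example instance type and zone in the comments are hypothetical and were not part of the original report.

```python
def instance_type_offered(ec2_client, instance_type, availability_zone):
    """Return True if EC2 lists instance_type as offered in availability_zone.

    ec2_client is expected to behave like boto3.client("ec2"); only its
    describe_instance_type_offerings call is used.
    """
    response = ec2_client.describe_instance_type_offerings(
        LocationType="availability-zone",
        Filters=[
            {"Name": "instance-type", "Values": [instance_type]},
            {"Name": "location", "Values": [availability_zone]},
        ],
    )
    return len(response.get("InstanceTypeOfferings", [])) > 0


# Example usage (requires boto3 and AWS credentials; values are placeholders):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-2")
#   instance_type_offered(ec2, "m5n.large", "us-east-2a")
```

The client is passed as a parameter so the check can be exercised with a stub and does not require boto3 at import time.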

Environment:

Showing information for cluster "cluster1":
    Discovered network details:
        Network plugin:  OpenShiftSDN
        Service CIDRs:   [172.30.0.0/16]
        Cluster CIDRs:   [10.128.0.0/14]

CLUSTER ID                    ENDPOINT IP     PUBLIC IP       CABLE DRIVER        TYPE            
cluster1                      10.0.80.104     3.236.171.237   libreswan           local           
cluster3                      10.0.80.189     34.210.38.246   libreswan           remote          
cluster2                      10.0.13.107     3.17.79.238     libreswan           remote          

GATEWAY                         CLUSTER                 REMOTE IP       CABLE DRIVER        SUBNETS                                 STATUS          
ip-10-0-80-189                  cluster3                10.0.80.189     libreswan           172.32.0.0/16, 10.136.0.0/14            error           
ip-10-0-13-107                  cluster2                10.0.13.107     libreswan           172.31.0.0/16, 10.132.0.0/14            connecting      

NODE                            HA STATUS       SUMMARY                         
ip-10-0-80-104                  active          0 connections out of 2 are established

COMPONENT                       REPOSITORY                                            VERSION         
submariner                      quay.io/submariner                                    0.8.0           
submariner-operator             quay.io/submariner                                    0.8.0           
service-discovery               quay.io/submariner                                    0.8.0           

Showing information for cluster "cluster2":
    Discovered network details:
        Network plugin:  OpenShiftSDN
        Service CIDRs:   [172.31.0.0/16]
        Cluster CIDRs:   [10.132.0.0/14]

CLUSTER ID                    ENDPOINT IP     PUBLIC IP       CABLE DRIVER        TYPE            
cluster2                      10.0.75.80      3.19.208.91     libreswan           local           
cluster1                      10.0.80.104     3.236.171.237   libreswan           remote          
cluster3                      10.0.40.14      34.222.118.38   libreswan           remote          

GATEWAY                         CLUSTER                 REMOTE IP       CABLE DRIVER        SUBNETS                                 STATUS          
ip-10-0-80-104                  cluster1                10.0.80.104     libreswan           172.30.0.0/16, 10.128.0.0/14            connecting      
ip-10-0-40-14                   cluster3                10.0.40.14      libreswan           172.32.0.0/16, 10.136.0.0/14            connecting      

NODE                            HA STATUS       SUMMARY                         
ip-10-0-75-80                   active          0 connections out of 2 are established

COMPONENT                       REPOSITORY                                            VERSION         
submariner                      quay.io/submariner                                    0.8.0           
submariner-operator             quay.io/submariner                                    0.8.0           
service-discovery               quay.io/submariner                                    0.8.0           

Showing information for cluster "cluster3":
    Discovered network details:
        Network plugin:  OpenShiftSDN
        Service CIDRs:   [172.32.0.0/16]
        Cluster CIDRs:   [10.136.0.0/14]

CLUSTER ID                    ENDPOINT IP     PUBLIC IP       CABLE DRIVER        TYPE            
cluster3                      10.0.80.189     34.210.38.246   libreswan           local           
cluster1                      10.0.94.244     18.205.60.181   libreswan           remote          
cluster2                      10.0.13.107     3.17.79.238     libreswan           remote          

GATEWAY                         CLUSTER                 REMOTE IP       CABLE DRIVER        SUBNETS                                 STATUS          
ip-10-0-94-244                  cluster1                10.0.94.244     libreswan           172.30.0.0/16, 10.128.0.0/14            connecting      
ip-10-0-13-107                  cluster2                10.0.13.107     libreswan           172.31.0.0/16, 10.132.0.0/14            connecting      

NODE                            HA STATUS       SUMMARY                         
ip-10-0-80-189                  active          0 connections out of 2 are established

COMPONENT                       REPOSITORY                                            VERSION         
submariner                      quay.io/submariner                                    0.8.0           
submariner-operator             quay.io/submariner                                    0.8.0           
service-discovery               quay.io/submariner                                    0.8.0 
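
One thing that stands out in the endpoint tables above is that some clusters list a remote endpoint IP that differs from the IP the owning cluster reports as its local endpoint. The Python sketch below makes that comparison explicit. The IPs are copied from the output above; the data layout and the name `stale_endpoints` are illustrative assumptions, and the thread does not confirm that this mismatch is the root cause.

```python
# Endpoint IPs as reported by each cluster in the output above.
VIEWS = {
    "cluster1": {
        "local": "10.0.80.104",
        "remote": {"cluster3": "10.0.80.189", "cluster2": "10.0.13.107"},
    },
    "cluster2": {
        "local": "10.0.75.80",
        "remote": {"cluster1": "10.0.80.104", "cluster3": "10.0.40.14"},
    },
    "cluster3": {
        "local": "10.0.80.189",
        "remote": {"cluster1": "10.0.94.244", "cluster2": "10.0.13.107"},
    },
}


def stale_endpoints(views):
    """Return sorted (observer, target, seen_ip, local_ip) tuples where the
    observer's record for target differs from target's own local endpoint."""
    mismatches = []
    for observer, view in views.items():
        for target, seen_ip in view["remote"].items():
            local_ip = views[target]["local"]
            if seen_ip != local_ip:
                mismatches.append((observer, target, seen_ip, local_ip))
    return sorted(mismatches)


for observer, target, seen_ip, local_ip in stale_endpoints(VIEWS):
    print(f"{observer} sees {target} at {seen_ip}, but {target} reports {local_ip}")
```

With the values above, four of the six remote records disagree with the owning cluster's local endpoint, including both records for cluster2 (10.0.13.107 versus 10.0.75.80), which is the address that appears in the missing cable names in the gateway log.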

nyechiel commented 3 years ago

@skitt have you had a chance to look into this issue? I am adding it to the 0.9 board for now so that we can review it as part of planning.

skitt commented 3 years ago

> @skitt have you had a chance to look into this issue? I am adding it to the 0.9 board for now so that we can review it as part of planning.

Yes, nothing to report yet.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had activity for 60 days. It will be closed if no further activity occurs. Please make a comment if this issue/pr is still valid. Thank you for your contributions.