submariner-io / submariner

Networking component for interconnecting Pods and Services across Kubernetes clusters.
https://submariner.io
Apache License 2.0

Gateway status error: No IKE SA found for cable submariner #759

Closed manosnoam closed 4 years ago

manosnoam commented 4 years ago

Installing Submariner completed, but on the Cluster B (OSP) gateway I see:

Gateway Status: error
Status Message: No IKE SA found for cable submariner-cable-nmanos-cluster-a-10-166-15-143

Full test report: https://qe-jenkins-csb-skynet.cloud.paas.psi.redhat.com/job/Submariner-OSP-AWS/786/Test-Report/

Errors I see:

$ oc describe Gateway -n submariner-operator

Name:         default-cl1-ff6px-worker-kjg2n
Namespace:    submariner-operator
Labels:       <none>
Annotations:  update-timestamp: 1597728267
API Version:  submariner.io/v1
Kind:         Gateway
Metadata:
  Creation Timestamp:  2020-08-18T05:25:21Z
  Generation:          3
  Resource Version:    5391742
  Self Link:           /apis/submariner.io/v1/namespaces/submariner-operator/gateways/default-cl1-ff6px-worker-kjg2n
  UID:                 4e36357d-f08a-432c-af45-1a35fc2e223a
Status:
  Connections:
    Endpoint:
      Backend:      strongswan
      cable_name:   submariner-cable-nmanos-cluster-a-10-166-15-143
      cluster_id:   nmanos-cluster-a
      Hostname:     ip-10-166-15-143
      nat_enabled:  true
      private_ip:   10.166.15.143
      public_ip:    18.216.200.184
      Subnets:
        169.254.0.0/19
    Status:          error
    Status Message:  No IKE SA found for cable submariner-cable-nmanos-cluster-a-10-166-15-143
  Ha Status:         active
  Local Endpoint:
    Backend:      strongswan
    cable_name:   submariner-cable-nmanos-cluster-b-10-166-0-205
    cluster_id:   nmanos-cluster-b
    Hostname:     default-cl1-ff6px-worker-kjg2n
    nat_enabled:  true
    private_ip:   10.166.0.205
    public_ip:    66.187.233.202
    Subnets:
      169.254.32.0/19
  Status Failure:  
  Version:         v0.5.0-12-gda2b27f

$ subctl show all
GATEWAY             CLUSTER            REMOTE IP       CABLE DRIVER   SUBNETS          STATUS
ip-10-166-15-143    nmanos-cluster-a   10.166.15.143   strongswan     169.254.0.0/19   error

NODE                             HA STATUS   SUMMARY
default-cl1-ff6px-worker-kjg2n   active      0 connections out of 1 are established

# Pod submariner-operator-cfc8f4555-hv4rz in Namespace submariner-operator

{"level":"error","ts":1597728307.4733005,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"submariner-controller","name":"submariner","namespace":"submariner-operator","error":"error creating or updating DaemonSet submariner-operator/submariner-globalnet: DaemonSet.apps \"submariner-globalnet\" not found","errorVerbose":"DaemonSet.apps \"submariner-globalnet\" not found\nerror creating or updating DaemonSet submariner-operator/submariner-globalnet","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tsubmariner-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsubmariner-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:248\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsubmariner-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:222\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tsubmariner-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:201\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\tsubmariner-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tsubmariner-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\tsubmariner-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
{"level":"info","ts":1597728308.475035,"logger":"controller_submariner","msg":"Reconciling Submariner","Request.Namespace":"submariner-operator","Request.Name":"submariner"}
{"level":"info","ts":1597728308.4898276,"logger":"controller_submariner","msg":"Using detected CIDR","type":"Cluster","CIDR":"10.252.0.0/14"}
{"level":"info","ts":1597728308.4898536,"logger":"controller_submariner","msg":"Using detected CIDR","type":"Service","CIDR":"100.96.0.0/16"}
{"level":"info","ts":1597728308.5077403,"logger":"controller_submariner","msg":"Updated existing DaemonSet","Request.Namespace":"submariner-operator","Request.Name":"submariner","DaemonSet.Namespace":"submariner-operator","DaemonSet.Name":"submariner-gateway"}
{"level":"info","ts":1597728308.51728,"logger":"controller_submariner","msg":"Updated existing DaemonSet","Request.Namespace":"submariner-operator","Request.Name":"submariner","DaemonSet.Namespace":"submariner-operator","DaemonSet.Name":"submariner-routeagent"}
{"level":"info","ts":1597728308.525415,"logger":"controller_submariner","msg":"Updated existing DaemonSet","Request.Namespace":"submariner-operator","Request.Name":"submariner","DaemonSet.Namespace":"submariner-operator","DaemonSet.Name":"submariner-globalnet"}
{"level":"info","ts":1597728308.5622764,"logger":"controller_servicediscovery","msg":"Reconciling ServiceDiscovery","Request.Namespace":"submariner-operator","Request.Name":"service-discovery"}
{"level":"error","ts":1597728308.5760353,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"submariner-controller","name":"submariner","namespace":"submariner-operator","error":"error reconciling the Service Discovery CR: resourceVersion should not be set on objects to be created","errorVerbose":"resourceVersion should not be set on objects to be created\nerror reconciling the Service Discovery CR","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tsubmariner-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsubmariner-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:248\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsubmariner-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:222\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tsubmariner-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:201\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\tsubmariner-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tsubmariner-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\tsubmariner-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
{"level":"info","ts":1597728308.6201267,"logger":"controller_servicediscovery","msg":"Created a new Deployment","Request.Namespace":"submariner-operator","Request.Name":"service-discovery","Deployment.Namespace":"submariner-operator","Deployment.Name":"submariner-lighthouse-agent"}
{"level":"error","ts":1597728308.620211,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"servicediscovery-controller","name":"service-discovery","namespace":"submariner-operator","error":"error creating or updating Deployment submariner-operator/submariner-lighthouse-agent: Deployment.apps \"submariner-lighthouse-agent\" not found","errorVerbose":"Deployment.apps \"submariner-lighthouse-agent\" not found\nerror creating or updating Deployment submariner-operator/submariner-lighthouse-agent","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tsubmariner-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsubmariner-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:248\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsubmariner-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:222\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tsubmariner-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:201\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\tsubmariner-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tsubmariner-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\tsubmariner-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
{"level":"info","ts":1597728309.5770357,"logger":"controller_submariner","msg":"Reconciling Submariner","Request.Namespace":"submariner-operator","Request.Name":"submariner"}
{"level":"info","ts":1597728309.5926404,"logger":"controller_submariner","msg":"Using detected CIDR","type":"Cluster","CIDR":"10.252.0.0/14"}
{"level":"info","ts":1597728309.592678,"logger":"controller_submariner","msg":"Using detected CIDR","type":"Service","CIDR":"100.96.0.0/16"}
{"level":"info","ts":1597728309.604667,"logger":"controller_submariner","msg":"Updated existing DaemonSet","Request.Namespace":"submariner-operator","Request.Name":"submariner","DaemonSet.Namespace":"submariner-operator","DaemonSet.Name":"submariner-gateway"}
{"level":"info","ts":1597728309.6143055,"logger":"controller_submariner","msg":"Updated existing DaemonSet","Request.Namespace":"submariner-operator","Request.Name":"submariner","DaemonSet.Namespace":"submariner-operator","DaemonSet.Name":"submariner-routeagent"}
{"level":"info","ts":1597728309.620413,"logger":"controller_servicediscovery","msg":"Reconciling ServiceDiscovery","Request.Namespace":"submariner-operator","Request.Name":"service-discovery"}
{"level":"info","ts":1597728309.6307642,"logger":"controller_servicediscovery","msg":"Updated existing Deployment","Request.Namespace":"submariner-operator","Request.Name":"service-discovery","Deployment.Namespace":"submariner-operator","Deployment.Name":"submariner-lighthouse-agent"}
{"level":"info","ts":1597728309.6312509,"logger":"controller_submariner","msg":"Updated existing DaemonSet","Request.Namespace":"submariner-operator","Request.Name":"submariner","DaemonSet.Namespace":"submariner-operator","DaemonSet.Name":"submariner-globalnet"}
{"level":"info","ts":1597728309.7461588,"logger":"controller_servicediscovery","msg":"Created a new ConfigMap","Request.Namespace":"submariner-operator","Request.Name":"service-discovery","ConfigMap.Namespace":"submariner-operator","ConfigMap.Name":"submariner-lighthouse-coredns"}
{"level":"error","ts":1597728309.746205,"logger":"controller_servicediscovery","msg":"Error creating the lighthouseCoreDNS configMap","error":"error creating or updating ConfigMap submariner-operator/submariner-lighthouse-coredns: ConfigMap \"submariner-lighthouse-coredns\" not found","errorVerbose":"ConfigMap \"submariner-lighthouse-coredns\" not found\nerror creating or updating ConfigMap submariner-operator/submariner-lighthouse-coredns","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tsubmariner-operator/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/submariner-io/submariner-operator/pkg/controller/servicediscovery.(*ReconcileServiceDiscovery).Reconcile\n\tsubmariner-operator/pkg/controller/servicediscovery/servicediscovery_controller.go:142\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsubmariner-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:246\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsubmariner-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:222\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tsubmariner-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:201\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\tsubmariner-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tsubmariner-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\tsubmariner-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
{"level":"error","ts":1597728309.7463112,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"servicediscovery-controller","name":"service-discovery","namespace":"submariner-operator","error":"error creating or updating ConfigMap submariner-operator/submariner-lighthouse-coredns: ConfigMap \"submariner-lighthouse-coredns\" not found","errorVerbose":"ConfigMap \"submariner-lighthouse-coredns\" not found\nerror creating or updating ConfigMap submariner-operator/submariner-lighthouse-coredns","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tsubmariner-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsubmariner-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:248\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsubmariner-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:222\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tsubmariner-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:201\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\tsubmariner-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tsubmariner-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\tsubmariner-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}

@aswinsuryan and @vthapar confirm that these Lighthouse errors are being handled correctly:

"error":"error creating or updating DaemonSet submariner-operator/submariner-globalnet: DaemonSet.apps \"submariner-globalnet\" not found"

"error":"error creating or updating ConfigMap submariner-operator/submariner-lighthouse-coredns: ConfigMap \"submariner-lighthouse-coredns\" not found"

The Lighthouse DNS service is created in both clusters and the OpenShift ConfigMap is updated. Some of these errors happen due to a conflict and resolve on a retry/reconcile.
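To check which operator errors actually occurred and how often, the JSON log lines can be grouped by their `error` field. This is a minimal sketch, assuming the JSON-lines format shown above. The function name `summarize_errors` and the shortened sample entries are illustrative and not part of Submariner.

```python
import json
from collections import Counter

def summarize_errors(lines):
    """Count error messages in JSON-lines operator log output."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON lines such as "# Pod ..." headers
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate truncated or malformed lines
        if entry.get("level") == "error":
            # Prefer the "error" field; fall back to "msg" if it is absent.
            counts[entry.get("error") or entry.get("msg", "unknown")] += 1
    return counts

# Shortened sample entries modeled on the log above (stack traces dropped).
sample_lines = [
    "# Pod submariner-operator-cfc8f4555-hv4rz in Namespace submariner-operator",
    r'{"level":"error","msg":"Reconciler error","error":"DaemonSet.apps \"submariner-globalnet\" not found"}',
    r'{"level":"info","msg":"Updated existing DaemonSet","DaemonSet.Name":"submariner-globalnet"}',
    r'{"level":"error","msg":"Reconciler error","error":"ConfigMap \"submariner-lighthouse-coredns\" not found"}',
]

for message, count in summarize_errors(sample_lines).items():
    print(f"{count}x {message}")
```

A "not found" error followed by an "Updated existing" entry for the same resource, as in the sample, fits the description of a conflict that resolves on the next reconcile.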

However, the Strongswan error is still relevant.

Environment:
subctl version: v0.5.0-28-g0019750
Cluster A (AWS): OCP Version 4.5.6
Cluster B (OSP): OCP Version 4.4.3

sridhargaddam commented 4 years ago

This error occurs because Charon (the strongSwan process) had not started by the time the Gateway object was queried. This is not a problem: it can happen during the installation phase of Submariner, when it takes a few seconds for the tunnels to be established between the joining clusters.
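Because the error can appear while the tunnels are still coming up, a status check may want to tolerate it for a short grace period instead of failing on the first query. Below is a rough sketch of that idea. `wait_for_connected`, the `fetch_status` callback, and the timeout values are assumptions for illustration, not Submariner or subctl APIs.

```python
import time

def wait_for_connected(fetch_status, timeout=120.0, interval=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns 'connected' or the timeout expires.

    fetch_status is a caller-supplied function that returns the connection
    status string (for example 'error', 'connecting', or 'connected').
    Returns the list of statuses observed, ending with 'connected'.
    Raises TimeoutError if the connection never comes up in time.
    """
    deadline = clock() + timeout
    seen = []
    while True:
        status = fetch_status()
        seen.append(status)
        if status == "connected":
            return seen
        if clock() >= deadline:
            raise TimeoutError(f"gateway still {status!r} after {timeout}s")
        sleep(interval)

# Example with a canned sequence standing in for real Gateway queries.
canned = iter(["error", "error", "connected"])
print(wait_for_connected(lambda: next(canned), interval=0))
```

In practice `fetch_status` would wrap whatever query is used to read the Gateway status, for example by parsing the `oc describe Gateway` output shown earlier.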

From the Gateway Engine logs, we can see that the Gateway object initially reports error, and once the Charon process has started, subsequent Gateway objects report connected.

I0818 06:30:07.366842       1 syncer.go:69] Gateway already exists - updating &{TypeMeta:{Kind: APIVersion:} ObjectMeta:{Name:default-cl1-ff6px-worker-kjg2n GenerateName: Namespace: SelfLink: UID: ResourceVersion: Generation:0 CreationTimestamp:0001-01-01 00:00:00 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[update-timestamp:1597732207] OwnerReferences:[] Finalizers:[] ClusterName: ManagedFields:[]} Status:{Version:v0.5.0-12-gda2b27f HAStatus:active LocalEndpoint:{ClusterID:nmanos-cluster-b CableName:submariner-cable-nmanos-cluster-b-10-166-0-205 Hostname:default-cl1-ff6px-worker-kjg2n Subnets:[169.254.32.0/19] PrivateIP:10.166.0.205 PublicIP:66.187.233.202 NATEnabled:true Backend:strongswan BackendConfig:map[]} StatusFailure: Connections:[{Status:error StatusMessage:No IKE SA found for cable submariner-cable-nmanos-cluster-a-10-166-15-143 Endpoint:{ClusterID:nmanos-cluster-a CableName:submariner-cable-nmanos-cluster-a-10-166-15-143 Hostname:ip-10-166-15-143 Subnets:[169.254.0.0/19] PrivateIP:10.166.15.143 PublicIP:18.216.200.184 NATEnabled:true Backend:strongswan BackendConfig:map[]}}]}} 
I0818 06:30:09.643804       1 datastoresyncer.go:135] Enqueueing endpoint submariner-operator/nmanos-cluster-b-submariner-cable-nmanos-cluster-b-10-166-0-205
I0818 06:30:09.643919       1 datastoresyncer.go:135] Enqueueing endpoint submariner-operator/nmanos-cluster-a-submariner-cable-nmanos-cluster-a-10-166-15-143
I0818 06:30:09.644045       1 datastoresyncer.go:123] Enqueueing cluster submariner-operator/nmanos-cluster-b
I0818 06:30:09.644058       1 datastoresyncer.go:123] Enqueueing cluster submariner-operator/nmanos-cluster-a
I0818 06:30:09.643899       1 tunnel.go:128] Tunnel controller enqueueing Endpoint {"metadata":{"name":"nmanos-cluster-b-submariner-cable-nmanos-cluster-b-10-166-0-205","namespace":"submariner-operator","selfLink":"/apis/submariner.io/v1/namespaces/submariner-operator/endpoints/nmanos-cluster-b-submariner-cable-nmanos-cluster-b-10-166-0-205","uid":"625b51a6-1e8a-45df-a395-53a920f1379b","resourceVersion":"5413706","generation":1,"creationTimestamp":"2020-08-18T06:29:02Z"},"spec":{"cluster_id":"nmanos-cluster-b","cable_name":"submariner-cable-nmanos-cluster-b-10-166-0-205","hostname":"default-cl1-ff6px-worker-kjg2n","subnets":["169.254.32.0/19"],"private_ip":"10.166.0.205","public_ip":"66.187.233.202","nat_enabled":true,"backend":"strongswan"}}
I0818 06:30:09.644132       1 tunnel.go:128] Tunnel controller enqueueing Endpoint {"kind":"Endpoint","apiVersion":"submariner.io/v1","metadata":{"name":"nmanos-cluster-a-submariner-cable-nmanos-cluster-a-10-166-15-143","namespace":"submariner-operator","selfLink":"/apis/submariner.io/v1/namespaces/submariner-operator/endpoints/nmanos-cluster-a-submariner-cable-nmanos-cluster-a-10-166-15-143","uid":"b6ae3148-c89d-4825-ae2a-76dfafffc549","resourceVersion":"5391623","generation":1,"creationTimestamp":"2020-08-18T05:25:27Z"},"spec":{"cluster_id":"nmanos-cluster-a","cable_name":"submariner-cable-nmanos-cluster-a-10-166-15-143","hostname":"ip-10-166-15-143","subnets":["169.254.0.0/19"],"private_ip":"10.166.15.143","public_ip":"18.216.200.184","nat_enabled":true,"backend":"strongswan"}}
I0818 06:30:09.649334       1 datastoresyncer.go:219] Processing local submariner Cluster object: &v1.Cluster{TypeMeta:v1.TypeMeta{Kind:"Cluster", APIVersion:"submariner.io/v1"}, ObjectMeta:v1.ObjectMeta{Name:"nmanos-cluster-b", GenerateName:"", Namespace:"submariner-operator", SelfLink:"/apis/submariner.io/v1/namespaces/submariner-operator/clusters/nmanos-cluster-b", UID:"795747e9-144f-4b69-8024-8d15709f375f", ResourceVersion:"5391535", Generation:1, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63733325121, loc:(*time.Location)(0x2154f60)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Spec:v1.ClusterSpec{ClusterID:"nmanos-cluster-b", ColorCodes:[]string{"blue"}, ServiceCIDR:[]string{"100.96.0.0/16"}, ClusterCIDR:[]string{"10.252.0.0/14"}, GlobalCIDR:[]string{"169.254.32.0/19"}}}
I0818 06:30:09.649558       1 kubernetes.go:317] In SetCluster: &types.SubmarinerCluster{ID:"nmanos-cluster-b", Spec:v1.ClusterSpec{ClusterID:"nmanos-cluster-b", ColorCodes:[]string{"blue"}, ServiceCIDR:[]string{"100.96.0.0/16"}, ClusterCIDR:[]string{"10.252.0.0/14"}, GlobalCIDR:[]string{"169.254.32.0/19"}}}

\<SNIP\>

I0818 06:30:09.685609       1 datastoresyncer.go:271] The updated submariner Endpoint "nmanos-cluster-a" is not for this cluster - skipping updating the datastore
00[DMN] Starting IKE charon daemon (strongSwan 5.8.4, Linux 4.18.0-147.8.1.el8_1.x86_64, x86_64)
00[CFG] PKCS11 module '<name>' lacks library path
00[LIB] openssl FIPS mode(2) - enabled 
00[KNL] unable to create IPv4 routing table rule
00[KNL] unable to create IPv6 routing table rule
00[CFG] loading ca certificates from '/etc/strongswan/ipsec.d/cacerts'
00[CFG] loading aa certificates from '/etc/strongswan/ipsec.d/aacerts'
00[CFG] loading ocsp signer certificates from '/etc/strongswan/ipsec.d/ocspcerts'
00[CFG] loading attribute certificates from '/etc/strongswan/ipsec.d/acerts'
00[CFG] loading crls from '/etc/strongswan/ipsec.d/crls'
00[CFG] loading secrets from '/etc/strongswan/ipsec.secrets'
00[CFG] opening triplet file /etc/strongswan/ipsec.d/triplets.dat failed: No such file or directory
00[CFG] loaded 0 RADIUS server configurations
00[CFG] HA config misses local/remote address
00[CFG] no script for ext-auth script defined, disabled
00[LIB] loaded plugins: charon pkcs11 tpm aesni aes des rc2 sha2 sha1 md4 md5 mgf1 random nonce x509 revocation constraints acert pubkey pkcs1 pkcs7 pkcs8 pkcs12 pgp dnskey sshkey pem openssl gcrypt fips-prf gmp curve25519 chapoly xcbc cmac hmac ctr ccm gcm drbg newhope curl attr kernel-netlink resolve socket-default farp stroke vici updown eap-identity eap-sim eap-aka eap-aka-3gpp eap-aka-3gpp2 eap-md5 eap-gtc eap-mschapv2 eap-dynamic eap-radius eap-tls eap-ttls eap-peap xauth-generic xauth-eap xauth-pam xauth-noauth dhcp led duplicheck unity counters
00[JOB] spawning 16 worker threads

\<SNIP\>

I0818 06:30:17.409470       1 syncer.go:171] Generated Gateway object: {TypeMeta:{Kind: APIVersion:} ObjectMeta:{Name:default-cl1-ff6px-worker-kjg2n GenerateName: Namespace: SelfLink: UID: ResourceVersion: Generation:0 CreationTimestamp:0001-01-01 00:00:00 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[update-timestamp:1597732217] OwnerReferences:[] Finalizers:[] ClusterName: ManagedFields:[]} Status:{Version:v0.5.0-12-gda2b27f HAStatus:active LocalEndpoint:{ClusterID:nmanos-cluster-b CableName:submariner-cable-nmanos-cluster-b-10-166-0-205 Hostname:default-cl1-ff6px-worker-kjg2n Subnets:[169.254.32.0/19] PrivateIP:10.166.0.205 PublicIP:66.187.233.202 NATEnabled:true Backend:strongswan BackendConfig:map[]} StatusFailure: Connections:[{Status:connected StatusMessage:Connected to 18.216.200.184:4501 - encryption alg=AES_GCM_16, keysize=128 rekey-time=13409 Endpoint:{ClusterID:nmanos-cluster-a CableName:submariner-cable-nmanos-cluster-a-10-166-15-143 Hostname:ip-10-166-15-143 Subnets:[169.254.0.0/19] PrivateIP:10.166.15.143 PublicIP:18.216.200.184 NATEnabled:true Backend:strongswan BackendConfig:map[]}}]}}
I0818 06:30:18.412254       1 syncer.go:140] Last synced Gateway: &{TypeMeta:{Kind: APIVersion:} ObjectMeta:{Name:default-cl1-ff6px-worker-kjg2n GenerateName: Namespace:submariner-operator SelfLink:/apis/submariner.io/v1/namespaces/submariner-operator/gateways/default-cl1-ff6px-worker-kjg2n UID:9697d09d-60e7-40b8-9d79-4ff629589447 ResourceVersion:5414082 Generation:3 CreationTimestamp:2020-08-18 06:29:02 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[update-timestamp:1597732212] OwnerReferences:[] Finalizers:[] ClusterName: ManagedFields:[]} Status:{Version:v0.5.0-12-gda2b27f HAStatus:active LocalEndpoint:{ClusterID:nmanos-cluster-b CableName:submariner-cable-nmanos-cluster-b-10-166-0-205 Hostname:default-cl1-ff6px-worker-kjg2n Subnets:[169.254.32.0/19] PrivateIP:10.166.0.205 PublicIP:66.187.233.202 NATEnabled:true Backend:strongswan BackendConfig:map[]} StatusFailure: Connections:[{Status:connected StatusMessage:Connected to 18.216.200.184:4501 - encryption alg=AES_GCM_16, keysize=128 rekey-time=13414 Endpoint:{ClusterID:nmanos-cluster-a CableName:submariner-cable-nmanos-cluster-a-10-166-15-143 Hostname:ip-10-166-15-143 Subnets:[169.254.0.0/19] PrivateIP:10.166.15.143 PublicIP:18.216.200.184 NATEnabled:true Backend:strongswan BackendConfig:map[]}}]}}
sridhargaddam commented 4 years ago

This will be mitigated once the following enhancement is implemented - https://github.com/submariner-io/submariner-operator/issues/635


Closing this issue.

manosnoam commented 4 years ago

The error seems to have returned with a recent subctl version: v0.7.0-pre0-7-g29508ae.

Name:         default-cl1-tqcht-worker-ffwhr
Namespace:    submariner-operator
Labels:       <none>
Annotations:  update-timestamp: 1600839864
API Version:  submariner.io/v1
Kind:         Gateway
Metadata:
  Creation Timestamp:  2020-09-23T05:43:13Z
  Generation:          2
  Resource Version:    2552782
  Self Link:           /apis/submariner.io/v1/namespaces/submariner-operator/gateways/default-cl1-tqcht-worker-ffwhr
  UID:                 9d6c300d-3b80-4220-a76f-9cce0aa8acb5
Status:
  Connections:
    Endpoint:
      Backend:      strongswan
      cable_name:   submariner-cable-nmanos-cluster-a-10-0-67-89
      cluster_id:   nmanos-cluster-a
      Hostname:     ip-10-0-67-89
      nat_enabled:  true
      private_ip:   10.0.67.89
      public_ip:    54.67.46.45
      Subnets:
        172.30.0.0/16
        10.128.0.0/14
    Status:          error
    Status Message:  No IKE SA found for cable submariner-cable-nmanos-cluster-a-10-0-67-89
  Ha Status:         active
  Local Endpoint:
    Backend:      strongswan
    cable_name:   submariner-cable-default-cl1-10-166-2-231
    cluster_id:   default-cl1
    Hostname:     default-cl1-tqcht-worker-ffwhr
    nat_enabled:  true
    private_ip:   10.166.2.231
    public_ip:    66.187.232.129
    Subnets:
      100.96.0.0/16
      10.252.0.0/14
  Status Failure:  
  Version:         v0.6.0-30-g507166f
Events:            <none>
Name:         submariner
Namespace:    submariner-operator
Labels:       <none>
Annotations:  <none>
API Version:  submariner.io/v1alpha1
Kind:         Submariner
Metadata:
  Creation Timestamp:  2020-09-23T05:42:56Z
  Generation:          1
  Resource Version:    2552630
  Self Link:           /apis/submariner.io/v1alpha1/namespaces/submariner-operator/submariners/submariner
  UID:                 f56d4b12-b0cd-4745-b20e-bb0829f169f9
Spec:
  Broker:                     k8s
  brokerK8sApiServer:         api.nmanos-cluster-a.devcluster.openshift.com:6443

In the Gateway Pod log I see:

StatusFailure: Connections:[{Status:error StatusMessage:No IKE SA found for cable submariner-cable-nmanos-cluster-a-10-0-67-89 Endpoint:{ClusterID:nmanos-cluster-a CableName:submariner-cable-nmanos-cluster-a-10-0-67-89 Hostname:ip-10-0-67-89 Subnets:[172.30.0.0/16 10.128.0.0/14] PrivateIP:10.0.67.89 PublicIP:54.67.46.45 NATEnabled:true Backend:strongswan BackendConfig:map[]}}]}}
I0923 05:44:39.954407       1 syncer.go:70] Running Gateway status sync
..
09[IKE]   16: 07 DF 4D A7 00 00                                ..M...
09[IKE] natd_hash => 20 bytes @ 0x7f170c006ba0
09[IKE]    0: 76 67 4D D8 FE ED C8 94 5B EB 2E 67 0C 41 17 FA  vgM.....[..g.A..
09[IKE]   16: 2C 7F A4 87                                      ,...
09[ENC] generating IKE_SA_INIT request 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(FRAG_SUP) N(HASH_ALG) N(REDIR_SUP) ]
09[NET] sending packet: from 10.166.2.231[501] to 54.67.46.45[4501] (500 bytes)
09[MGR] checkin IKE_SA submariner-cable-nmanos-cluster-a-10-0-67-89[1]
05[NET] sending packet: from 10.166.2.231[501] to 54.67.46.45[4501]
09[MGR] checkin of IKE_SA successful
15[CFG] vici client 3 disconnected
02[NET] received packet => 40 bytes @ 0x7f17431a94e0
02[NET]    0: 00 00 00 00 9B 1C BD 2A D3 B7 2C A5 61 D5 34 80  .......*..,.a.4.
02[NET]   16: 29 41 8A 2A 29 20 22 20 00 00 00 00 00 00 00 24  )A.*) " .......$
02[NET]   32: 00 00 00 08 00 00 00 0E                          ........
02[NET] received packet: from 54.67.46.45[4501] to 10.166.2.231[501]
02[NET] waiting for data on sockets
10[MGR] checkout IKEv2 SA by message with SPIs 9b1cbd2ad3b72ca5_i 61d5348029418a2a_r
10[MGR] IKE_SA submariner-cable-nmanos-cluster-a-10-0-67-89[1] successfully checked out
10[NET] received packet: from 54.67.46.45[4501] to 10.166.2.231[501] (36 bytes)
10[ENC] parsed IKE_SA_INIT response 0 [ N(NO_PROP) ]
10[IKE] received NO_PROPOSAL_CHOSEN notify error
10[CFG] configured proposals: IKE:AES_GCM_16_128/PRF_HMAC_SHA2_256/MODP_2048, IKE:AES_CBC_128/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_2048
10[MGR] checkin and destroy IKE_SA submariner-cable-nmanos-cluster-a-10-0-67-89[1]
10[IKE] IKE_SA submariner-cable-nmanos-cluster-a-10-0-67-89[1] state change: CONNECTING => DESTROYING
10[MGR] checkin and destroy of IKE_SA successful
12[KNL] interface vethe398604c activated
16[KNL] creating roam job due to address/link change
07[KNL] interface vx-submariner activated
08[KNL] 240.166.2.231 appeared on vx-submariner
07[KNL] creating roam job due to address/link change
08[KNL] fe80::7ca4:48ff:fedf:9a99 appeared on vethe398604c
13[KNL] creating roam job due to address/link change
07[KNL] fe80::8872:86ff:fe8f:345a appeared on vx-submariner
08[KNL] creating roam job due to address/link change
13[CFG] vici client 4 connected
16[CFG] vici client 4 registered for: list-sa
06[CFG] vici client 4 requests: list-sas
11[CFG] vici client 4 unregistered for: list-sa
09[CFG] vici client 4 disconnected
16[MGR] checkout IKEv2 SA with SPIs 9b1cbd2ad3b72ca5_i 0000000000000000_r
16[MGR] IKE_SA checkout not successful
14[CFG] vici client 5 connected
14[CFG] vici client 5 registered for: list-sa
07[CFG] vici client 5 requests: list-sas
08[CFG] vici client 5 unregistered for: list-sa
15[CFG] vici client 5 disconnected
12[CFG] vici client 6 connected
08[CFG] vici client 6 registered for: list-sa
09[CFG] vici client 6 requests: list-sas
06[CFG] vici client 6 unregistered for: list-sa
14[CFG] vici client 6 disconnected
16[CFG] vici client 7 connected
06[CFG] vici client 7 registered for: list-sa
14[CFG] vici client 7 requests: list-sas
15[CFG] vici client 7 unregistered for: list-sa
11[CFG] vici client 7 disconnected
09[CFG] vici client 8 connected
16[CFG] vici client 8 registered for: list-sa
07[CFG] vici client 8 requests: list-sas
04[CFG] vici client 8 unregistered for: list-sa
09[CFG] vici client 8 disconnected
08[CFG] vici client 9 connected
04[CFG] vici client 9 registered for: list-sa
06[CFG] vici client 9 requests: list-sas
07[CFG] vici client 9 unregistered for: list-sa
13[CFG] vici client 9 disconnected
10[CFG] vici client 10 connected
08[CFG] vici client 10 registered for: list-sa
13[CFG] vici client 10 requests: list-sas
06[CFG] vici client 10 unregistered for: list-sa
08[CFG] vici client 10 disconnected
11[CFG] vici client 11 connected
12[CFG] vici client 11 registered for: list-sa
08[CFG] vici client 11 requests: list-sas
09[CFG] vici client 11 unregistered for: list-sa
06[CFG] vici client 11 disconnected
15[CFG] vici client 12 connected
15[CFG] vici client 12 registered for: list-sa
06[CFG] vici client 12 reques
I0923 05:44:39.955590       1 syncer.go:196] Generated Gateway object: {TypeMeta:{Kind: APIVersion:} ObjectMeta:{Name:default-cl1-tqcht-worker-ffwhr GenerateName: Namespace: SelfLink: UID: ResourceVersion: Generation:0 CreationTimestamp:0001-01-01 00:00:00 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[update-timestamp:1600839879] OwnerReferences:[] Finalizers:[] ClusterName: ManagedFields:[]} Status:{Version:v0.6.0-30-g507166f HAStatus:active LocalEndpoint:{ClusterID:default-cl1 CableName:submariner-cable-default-cl1-10-166-2-231 Hostname:default-cl1-tqcht-worker-ffwhr Subnets:[100.96.0.0/16 10.252.0.0/14] PrivateIP:10.166.2.231 PublicIP:66.187.232.129 NATEnabled:true Backend:strongswan BackendConfig:map[]} StatusFailure: Connections:[{Status:error StatusMessage:No IKE SA found for cable submariner-cable-nmanos-cluster-a-10-0-67-89 Endpoint:{ClusterID:nmanos-cluster-a CableName:submariner-cable-nmanos-cluster-a-10-0-67-89 Hostname:ip-10-0-67-89 Subnets:[172.30.0.0/16 10.128.0.0/14] PrivateIP:10.0.67.89 PublicIP:54.67.46.45 NATEnabled:true Backend:strongswan BackendConfig:map[]}}]}}
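In the charon output above, the IKE_SA is destroyed right after a NO_PROPOSAL_CHOSEN notify, which may be worth checking when the status stays at "No IKE SA found". Here is a small sketch for pulling those lines out of a long pod log. The function name, the regular expressions, and the result layout are assumptions based only on the log format shown here.

```python
import re

NOTIFY_RE = re.compile(r"\[IKE\] received (\w+) notify error")
PROPOSALS_RE = re.compile(r"\[CFG\] configured proposals: (.+)")
STATE_RE = re.compile(r"\[IKE\] IKE_SA (\S+)\[\d+\] state change: (\w+) => (\w+)")

def scan_charon_log(lines):
    """Collect notify errors, configured proposals, and IKE_SA state changes."""
    findings = {"notify_errors": [], "proposals": [], "state_changes": []}
    for line in lines:
        match = NOTIFY_RE.search(line)
        if match:
            findings["notify_errors"].append(match.group(1))
            continue
        match = PROPOSALS_RE.search(line)
        if match:
            findings["proposals"].extend(p.strip() for p in match.group(1).split(","))
            continue
        match = STATE_RE.search(line)
        if match:
            findings["state_changes"].append(match.groups())
    return findings

# Lines copied from the gateway pod log above.
sample_log = [
    "10[ENC] parsed IKE_SA_INIT response 0 [ N(NO_PROP) ]",
    "10[IKE] received NO_PROPOSAL_CHOSEN notify error",
    "10[CFG] configured proposals: IKE:AES_GCM_16_128/PRF_HMAC_SHA2_256/MODP_2048, IKE:AES_CBC_128/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_2048",
    "10[IKE] IKE_SA submariner-cable-nmanos-cluster-a-10-0-67-89[1] state change: CONNECTING => DESTROYING",
]

for key, values in scan_charon_log(sample_log).items():
    print(key, values)
```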

See full test report (+ pods log collection at the end) here: https://qe-jenkins-csb-skynet.cloud.paas.psi.redhat.com/job/Submariner-OSP-AWS/823/Test-Report/

manosnoam commented 4 years ago

With the latest subctl version (v0.7.0-pre1-31-g0a3cceb), this error (No IKE SA found for cable) seems to be gone. Instead, the Gateway shows the correct statuses:

17:00:05   Connections:
17:00:05     Endpoint:
17:00:05       Backend:      strongswan
17:00:05       cable_name:   submariner-cable-default-cl2-10-2-3-194
17:00:05       cluster_id:   default-cl2
17:00:05       Hostname:     default-cl2-gnjf4-worker-d8crz
17:00:05       nat_enabled:  true
17:00:05       private_ip:   10.2.3.194
17:00:05       public_ip:    66.187.232.129
17:00:05       Subnets:
17:00:05         172.32.0.0/16
17:00:05         10.136.0.0/14
17:00:05     Status:          connecting
17:00:05     Status Message:  Connecting to 66.187.232.129:4501
17:00:05   Ha Status:         active

...

17:00:50     Connections:
17:00:50       Endpoint:
17:00:50         Backend:      strongswan
17:00:50         cable_name:   submariner-cable-default-cl2-10-2-3-194
17:00:50         cluster_id:   default-cl2
17:00:50         Hostname:     default-cl2-gnjf4-worker-d8crz
17:00:50         nat_enabled:  true
17:00:50         private_ip:   10.2.3.194
17:00:50         public_ip:    66.187.232.129
17:00:50         Subnets:
17:00:50           172.32.0.0/16
17:00:50           10.136.0.0/14
17:00:50       Status:          connected
17:00:50       Status Message:  Connected to 66.187.232.129:4501 - encryption alg=AES_GCM_16, keysize=128 rekey-time=13559
17:00:50     Ha Status:         active
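
Read from their timestamps, the two snapshots above are 45 seconds apart. The following sketch extracts the status transitions from timestamped `describe` output like this. The function names and the assumption that every line starts with an `HH:MM:SS` prefix are illustrative.

```python
import re
from datetime import datetime

# Matches "17:00:05     Status:          connecting" but not
# "Status Message:" or "Ha Status:" lines.
STATUS_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+Status:\s+(\w+)\s*$")

def status_timeline(lines):
    """Return (timestamp, status) pairs, keeping only status changes."""
    timeline = []
    for line in lines:
        match = STATUS_RE.match(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        status = match.group(2)
        if not timeline or timeline[-1][1] != status:
            timeline.append((stamp, status))
    return timeline

def seconds_between(timeline, first, second):
    """Seconds from the first occurrence of one status to another."""
    start = next(stamp for stamp, status in timeline if status == first)
    end = next(stamp for stamp, status in timeline if status == second)
    return (end - start).total_seconds()

# Lines copied from the output above.
sample_output = [
    "17:00:05     Status:          connecting",
    "17:00:05     Status Message:  Connecting to 66.187.232.129:4501",
    "17:00:05   Ha Status:         active",
    "17:00:50       Status:          connected",
    "17:00:50     Ha Status:         active",
]

timeline = status_timeline(sample_output)
print(seconds_between(timeline, "connecting", "connected"))
```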