skupperproject / skupper

Skupper is an implementation of a Virtual Application Network, enabling rich hybrid cloud communication.
http://skupper.io
Apache License 2.0

Possible service sync failure when migrating a service #614

Open ssorj opened 2 years ago

ssorj commented 2 years ago

[Image: Untitled drawing (1)]

I have a scenario where I start running Hello World on two sites, then add a third site in a linear topology, remove the backend from the middle site (OCP), and add the backend to the end site (EKS), as shown in the picture. I do this inside a relatively short timeframe (as fast as I can type out the commands).

When I do this, I see what might be incorrect service sync behavior: the newly relocated backend service is not available at the frontend site (GKE).

GKE:

~/code/skupper-example-hello-world$ kcp
NAME                                          READY   STATUS    RESTARTS   AGE
hello-world-frontend-86bb59f5cd-ghrqk         1/1     Running   0          3h24m
skupper-router-68c47f47f4-sqwx5               1/1     Running   0          85m
skupper-service-controller-68dcb88bdb-6klmh   0/1     Pending   0          67s
skupper-service-controller-7cc47f4f4c-wbj9n   1/1     Running   0          5h50m

~/code/skupper-example-hello-world$ kcs
NAME                   TYPE           CLUSTER-IP   EXTERNAL-IP      PORT(S)                           AGE
hello-world-frontend   LoadBalancer   10.45.2.9    34.74.42.122     8080:31343/TCP                    8h
skupper                LoadBalancer   10.45.2.8    34.75.90.149     8080:32480/TCP,8081:30742/TCP     8h
skupper-router         LoadBalancer   10.45.0.68   35.231.115.195   55671:30388/TCP,45671:31866/TCP   8h
skupper-router-local   ClusterIP      10.45.1.58   <none>           5671/TCP                          8h

~/code/skupper-example-hello-world$ kc get cm/skupper-services -o yaml
apiVersion: v1
kind: ConfigMap
metadata:
  creationTimestamp: "2021-10-26T10:16:40Z"
  name: skupper-services
  namespace: hello-world
  ownerReferences:
  - apiVersion: v1
    kind: ConfigMap
    name: skupper-site
    uid: 49a75f65-25f9-42e9-8633-5dd111602df2
  resourceVersion: "24158411"
  uid: a63773a1-b418-434d-82a3-de4e95cfb80e

OCP:

~/code/skupper-example-hello-world$ kcp
NAME                                          READY   STATUS    RESTARTS   AGE
skupper-router-855b854f4f-l4bzr               1/1     Running   0          17h
skupper-service-controller-7df4f79c74-684p5   1/1     Running   0          17h

~/code/skupper-example-hello-world$ kcs
NAME                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)               AGE
hello-world-backend    ClusterIP   172.21.244.121   <none>        8080/TCP              94m
skupper                ClusterIP   172.21.179.183   <none>        8080/TCP,8081/TCP     95m
skupper-router         ClusterIP   172.21.118.141   <none>        55671/TCP,45671/TCP   95m
skupper-router-local   ClusterIP   172.21.151.249   <none>        5671/TCP              95m

~/code/skupper-example-hello-world$ kc get cm/skupper-services -o yaml
apiVersion: v1
data:
  hello-world-backend: '{"address":"hello-world-backend","protocol":"tcp","port":8080,"targets":[],"origin":"ca9ca3dd-f687-4c03-963a-ae37576af9d2"}'
kind: ConfigMap
metadata:
  creationTimestamp: "2021-10-26T16:50:53Z"
  name: skupper-services
  namespace: hello-world
  ownerReferences:
  - apiVersion: v1
    kind: ConfigMap
    name: skupper-site
    uid: c7bb74fc-f62c-432d-ac1f-f61f5a53e68d
  resourceVersion: "1683450"
  selfLink: /api/v1/namespaces/hello-world/configmaps/skupper-services
  uid: fda272e9-a90b-48ee-8582-929fdf793abd

EKS:

~/code/skupper-example-patient-portal$ kcp
NAME                                         READY   STATUS    RESTARTS   AGE
hello-world-backend-7dfb45b98d-j7cnt         1/1     Running   0          17h
skupper-router-786fd5c85b-p2blr              1/1     Running   0          16h
skupper-service-controller-c8d7dcb4f-t5sb5   1/1     Running   0          17h

~/code/skupper-example-patient-portal$ kcs
NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP                                                               PORT(S)                           AGE
hello-world-backend    ClusterIP      10.100.66.117   <none>                                                                    8080/TCP                          94m
skupper                LoadBalancer   10.100.119.95   a5a1ab4a827f24383bd8fd84c4168eb7-1983076065.eu-west-2.elb.amazonaws.com   8080:31196/TCP,8081:30187/TCP     114m
skupper-router         LoadBalancer   10.100.7.232    af259d90237834342ad0f92ad583f9ff-1302039450.eu-west-2.elb.amazonaws.com   55671:32438/TCP,45671:31757/TCP   114m
skupper-router-local   ClusterIP      10.100.27.94    <none>                                                                    5671/TCP                          114m

~/code/skupper-example-patient-portal$ kc get cm/skupper-services -o yaml
apiVersion: v1
data:
  hello-world-backend: '{"address":"hello-world-backend","protocol":"tcp","port":8080,"targets":[{"name":"hello-world-backend","selector":"app=hello-world-backend"}]}'
kind: ConfigMap
metadata:
  creationTimestamp: "2021-10-26T16:28:32Z"
  name: skupper-services
  namespace: hello-world
  ownerReferences:
  - apiVersion: v1
    kind: ConfigMap
    name: skupper-site
    uid: ca9ca3dd-f687-4c03-963a-ae37576af9d2
  resourceVersion: "283154"
  uid: 458613be-f4f7-4a7f-a431-545bca34e407

~/code/skupper-example-patient-portal$ kc get cm/skupper-site -o yaml
apiVersion: v1
data:
  console: "true"
  console-authentication: internal
  console-password: ""
  console-user: ""
  ingress: loadbalancer
  name: hello-world
  router-console: "false"
  router-logging: ""
  router-mode: interior
  service-controller: "true"
  service-sync: "true"
kind: ConfigMap
metadata:
  creationTimestamp: "2021-10-26T16:28:30Z"
  labels:
    internal.skupper.io/site-controller-ignore: "true"
  name: skupper-site
  namespace: hello-world
  resourceVersion: "280757"
  uid: ca9ca3dd-f687-4c03-963a-ae37576af9d2

fgiorgetti commented 2 years ago

@ssorj I tried this same scenario locally and I could not reproduce this behavior.

I simply removed the deployment from the OCP site, then created the deployment at the EKS site.

After doing that, I noticed that at the OCP site the hello-world-backend service was still bound to the (now removed) hello-world-backend deployment.

Then I bound it to the hello-world-backend that was created in EKS and it worked just fine.

I would say we might have an issue with the "unbind" that should have been triggered at the OCP site after the respective local target was removed. @grs @ajssmith any thoughts?

One question about the service not showing up: did you create any of your sites as an edge site (the OCP site, for example)? If you did, then only one of the interior links will be active, which might explain why services do not show up at GKE.

grs commented 2 years ago

At present you need to explicitly unbind or delete the service. Deleting the deployment will not delete a local binding.
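
When comparing sites, it can help to decode the JSON values stored under `data` in each `skupper-services` ConfigMap and see whether a site holds a local binding. The following is a minimal Python sketch and not part of Skupper: `summarize_service` is a hypothetical helper, the two strings are copied from the OCP and EKS output quoted earlier in this issue, and reading `origin` as the uid of the site that defined the service is an assumption (in the output above it matches the uid of the EKS `skupper-site` ConfigMap).

```python
import json


def summarize_service(raw: str) -> dict:
    """Summarize one value from the skupper-services ConfigMap data (hypothetical helper)."""
    entry = json.loads(raw)
    targets = entry.get("targets") or []
    return {
        "address": entry["address"],
        "port": entry["port"],
        # Names of the targets bound at this site; empty means no local binding.
        "local_targets": [t["name"] for t in targets],
        # Assumption: uid of the site that defined the service; absent in the EKS entry.
        "origin": entry.get("origin"),
    }


# Values copied from the `kc get cm/skupper-services -o yaml` output above.
ocp = '{"address":"hello-world-backend","protocol":"tcp","port":8080,"targets":[],"origin":"ca9ca3dd-f687-4c03-963a-ae37576af9d2"}'
eks = '{"address":"hello-world-backend","protocol":"tcp","port":8080,"targets":[{"name":"hello-world-backend","selector":"app=hello-world-backend"}]}'

for site, raw in (("OCP", ocp), ("EKS", eks)):
    print(site, summarize_service(raw))
```

Running the same check against a site whose `skupper-services` ConfigMap has no `data` section at all (as in the GKE output above) would yield nothing to summarize, which is the symptom reported here.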