skupperproject / skupper

Skupper is an implementation of a Virtual Application Network, enabling rich hybrid cloud communication.
http://skupper.io
Apache License 2.0
592 stars 73 forks source link

[v2] skupper-router pod restart after adding, deleting a link #1628

Closed Karen-Schoener closed 2 months ago

Karen-Schoener commented 2 months ago

Describe the bug After deleting a link in the v2 branch, I noticed that the skupper-router pod restarts.

 $     kubectl get site -A
 NAMESPACE   NAME   STATUS                                          SITES IN NETWORK
 east        east   containers with unready status: [config-sync]   2
 west        west   containers with unready status: [config-sync]   2

How To Reproduce Test steps:

Create sites in east, west.

     kubectl apply -n west -f cmd/controller/example/site1.yaml
     kubectl apply -n east -f cmd/controller/example/site2.yaml

Generate a link. Create the link.

     ./skupper link generate -n west --cost=10 --tls-secret=link-east-to-west > link_east_to_west.yaml
     kubectl -n east apply -f link_east_to_west.yaml

Wait for the sites to show as linked.

$ kubectl get site -A
NAMESPACE   NAME   STATUS   SITES IN NETWORK
east        east   OK       2
west        west   OK       2

Delete the link.

    kubectl -n east delete link link-west
    kubectl -n east delete secret link-east-to-west

Watch for the link to be deleted.

$ kubectl get site -A
NAMESPACE   NAME   STATUS                                          SITES IN NETWORK
east        east   containers with unready status: [config-sync]   2
west        west   containers with unready status: [config-sync]   2

Expected behavior Ability to delete link without skupper-router pod restart.

Environment details

v2 skupper branch commit:

commit 46036fcdb2f8d34a4d45f867a56286d453d1f89a (HEAD -> v2, origin/v2)
Author: Noe Luaces <nluaces@redhat.com>
Date:   Tue Aug 27 18:44:34 2024 +0200

    Reduce interval in retries and update messages (#1618)

    * remove reference to skupper controller logs and instead point to the resource status command, in case of error.

    * reduce interval in case the timeoutInSeconds value is 1 second

    * add configuration for retry functions in the spinner

Platform: minikube

Karen-Schoener commented 2 months ago

Attaching config sync logs.

logs_west_skupper_router_config_sync.txt logs_east_skupper_router_config_sync.txt

Karen-Schoener commented 2 months ago

Attaching logs from east.

In east, a panic occurred in the config-sync container. In east, there was no panic in the router container.

Also, in west, a panic occurred in the config-sync container. In west, there was no panic in the router container.

logs_east_router.txt logs_east_config_sync.txt

grs commented 2 months ago

@c-kruse could you take a look at this when you have a moment? It looks like maybe a handler is not being unregistered in some way or something. Basically when a link between two sites is deleted, the config-sync pod is panicking:

panic: close of closed channel

goroutine 64 [running]:
github.com/skupperproject/skupper/pkg/vanflow/eventsource.(*Client).start.func1()
    /go/src/app/pkg/vanflow/eventsource/client.go:59 +0x17
github.com/skupperproject/skupper/pkg/vanflow/eventsource.(*Client).Close(0xc0005ee2a0)
    /go/src/app/pkg/vanflow/eventsource/client.go:149 +0xa2
github.com/skupperproject/skupper/pkg/kube/flow.(*StatusSync).handleForgotten(0xc0000b15e0, {{0xc0005a2bd0, 0x24}, 0x1, {0xc000590cc0, 0xa}, {0xc0005a2b70, 0x2b}, {0xc0005a2ba0, 0x28}, ...})
    /go/src/app/pkg/kube/flow/status.go:443 +0xbf
github.com/skupperproject/skupper/pkg/vanflow/eventsource.(*Discovery).handleDiscovery(0xc0000976d0, {0x17335d0, 0xc00047f270}, {0xc0000a8e20?, 0xc0000a8e30?})
    /go/src/app/pkg/vanflow/eventsource/discovery.go:87 +0x175
created by github.com/skupperproject/skupper/pkg/vanflow/eventsource.(*Discovery).Run in goroutine 63
    /go/src/app/pkg/vanflow/eventsource/discovery.go:59 +0x1eb
c-kruse commented 2 months ago

Ah, this is familiar! Already have this fixed in a work in progress branch with a lot of other changes, but will get this out by itself in a few

ssorj commented 2 months ago

@c-kruse supposing 1.x is tested with connection dropping enabled, would this problem show up, or is it exclusively a 2.x issue?

c-kruse commented 2 months ago

@ssorj this is a 2.x issue exclusively - part of the new pkg/vanflow package that is not in 1.x.

c-kruse commented 2 months ago

Fixed in https://github.com/skupperproject/skupper/pull/1639