skupperproject / skupper

Skupper is an implementation of a Virtual Application Network, enabling rich hybrid cloud communication.
http://skupper.io
Apache License 2.0

[v2] Instability in the maintenance of router configuration #1727

Closed ted-ross closed 1 month ago

ted-ross commented 1 month ago

Describe the bug: In a network consisting of 4 interior Kubernetes sites and an interior Podman site, I'm seeing service-connectivity problems and instability in the maintenance of the router configuration, specifically the tcpListener and tcpConnector entities.

When a connector target is deployed, the tcpConnector is created and destroyed repeatedly, dozens of times, in the router local to the target workload. A tcpListener used by the target workload thrashes in the same way.

It is interesting to note that the siteId attribute in the router's tcpConnector and tcpListener entities toggles between two different values. One of the values is correct (i.e. the UID of the Kubernetes Site CR), and the other is a different UUID that I cannot associate with any other object.

How To Reproduce: It is unclear how to portably reproduce this issue. I am happy to deploy a network that shows the problem for anyone to look at.

Expected behavior: I expect the router configuration to change in an orderly, once-when-needed fashion.

Environment details

Additional context: None.

ted-ross commented 1 month ago

h/t to @fgiorgetti for helping to identify the detailed symptoms of this issue.

grs commented 1 month ago

Is it possible there is more than one controller running on the cluster affected? Is there only a single Site CR per namespace?

ted-ross commented 1 month ago

The namespace in question has only one Site CR. I am uncertain about the presence of spurious controllers. There was no controller in the site's namespace.

ted-ross commented 1 month ago

Current behavior:

The siteId is toggling on a 5-minute cycle: 4 minutes on one value and 1 minute on the other, repeatedly.

There is only one Site CR on the entire cluster. There is only one instance of the skupper-controller running on the cluster (in the skupper namespace).
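
For anyone trying to confirm the same cycle on their own network, here is a minimal sketch of how the toggle timing could be measured from periodic observations. It assumes the siteId has already been sampled from the router by some means not shown here; the function name `summarize_runs`, the 60-second sampling interval, and the placeholder values `site-cr-uid` and `unknown-uuid` are all hypothetical and not taken from Skupper.

```python
from itertools import groupby


def summarize_runs(samples):
    """Collapse (timestamp_seconds, site_id) samples into runs of equal site_id.

    Returns a list of (site_id, start, duration) tuples. A run's duration is
    measured up to the start of the next run; the final run is measured up to
    its own last sample, since nothing is known beyond that point.
    """
    ordered = sorted(samples, key=lambda s: s[0])
    runs = []
    for site_id, group in groupby(ordered, key=lambda s: s[1]):
        group = list(group)
        runs.append((site_id, group[0][0], group[-1][0]))

    result = []
    for i, (site_id, start, last_seen) in enumerate(runs):
        end = runs[i + 1][1] if i + 1 < len(runs) else last_seen
        result.append((site_id, start, end - start))
    return result


# Hypothetical observations taken every 60 seconds (placeholder IDs).
samples = [
    (0, "site-cr-uid"), (60, "site-cr-uid"), (120, "site-cr-uid"),
    (180, "site-cr-uid"), (240, "unknown-uuid"),
    (300, "site-cr-uid"), (360, "site-cr-uid"), (420, "site-cr-uid"),
    (480, "site-cr-uid"), (540, "unknown-uuid"),
]

for site_id, start, duration in summarize_runs(samples):
    print(f"{site_id}: from t={start}s for {duration}s")
```

With samples like these, a 240-second run followed by a 60-second run would match the 4-minute/1-minute pattern described above. The resolution is limited by the sampling interval, so a shorter interval gives a tighter estimate of when each switch happens.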