ted-ross closed this issue 1 month ago.
h/t to @fgiorgetti for helping to identify the detailed symptoms of this issue.
Is it possible that more than one controller is running on the affected cluster? Is there only a single Site CR per namespace?
The namespace in question has only one Site CR. I am uncertain about the presence of spurious controllers. There was no controller in the site's namespace.
Current behavior:
The siteId toggles on a 5-minute cycle: it holds one value for 4 minutes and the other value for 1 minute, repeatedly.
There is only one Site CR on the entire cluster. There is only one instance of the skupper-controller running on the cluster (in the skupper namespace).
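One way to characterize a toggle cycle like this is to sample the siteId at a fixed interval and collapse the samples into runs of identical values. The sketch below assumes the samples have already been collected as (timestamp in seconds, siteId) pairs sorted by time. How they are gathered is left open, and the function name dwell_times and the placeholder values "A" and "B" are hypothetical. Durations are only as precise as the sampling interval, and the last run is truncated at the final sample.

```python
from itertools import groupby
from typing import List, Tuple


def dwell_times(samples: List[Tuple[float, str]]) -> List[Tuple[str, float]]:
    """Collapse time-sorted (timestamp, siteId) samples into (siteId, seconds) runs.

    A run lasts from its first sample to the first sample of the next run.
    The final run is cut off at the last sample, so its length is a lower bound.
    """
    if not samples:
        return []
    # Record where each run of consecutive identical siteIds starts.
    starts: List[Tuple[str, float]] = []
    for site_id, group in groupby(samples, key=lambda s: s[1]):
        first_timestamp = next(group)[0]
        starts.append((site_id, first_timestamp))
    end_time = samples[-1][0]
    runs: List[Tuple[str, float]] = []
    for i, (site_id, start) in enumerate(starts):
        end = starts[i + 1][1] if i + 1 < len(starts) else end_time
        runs.append((site_id, end - start))
    return runs


# Hypothetical samples taken every 60 s, showing the reported 4-minute/1-minute pattern.
samples = [(0, "A"), (60, "A"), (120, "A"), (180, "A"), (240, "B"),
           (300, "A"), (360, "A"), (420, "A"), (480, "A"), (540, "B")]
print(dwell_times(samples))
```

With these samples, the first three runs come out as A for 240 s, B for 60 s, and A for 240 s. That matches the cycle described above.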
Describe the bug
In a network consisting of four interior Kubernetes sites and one interior Podman site, I'm seeing service-connectivity problems and instability in the maintenance of the router configuration, specifically the tcpListener and tcpConnector entities.
When a connector target is deployed, the tcpConnector in the router local to the target workload is created and destroyed repeatedly, dozens of times. A tcpListener used by the target workload thrashes in the same way.
It is interesting to note that the siteId attribute in the tcpConnector and tcpListener entities in the router toggles between two different values. One of the values is correct (i.e., the UID of the Kubernetes Site CR), and the other is a different UUID that I cannot associate with any other object.

How To Reproduce
It is unclear how to reproduce this issue portably. I am happy to deploy a network that shows the problem for anyone to look at.
Expected behavior
I expect the router configuration to change in an orderly, once-when-needed fashion.
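"Once-when-needed" behavior is typically achieved by comparing the desired configuration with the current one before writing anything. The sketch below is a generic compare-before-write reconcile and not Skupper's controller code. The names reconcile and Entity are hypothetical. It also assumes, to mirror the reported create/destroy churn, that a changed entity is replaced (deleted and recreated) rather than updated in place. It shows why a flapping siteId defeats this approach: every flip makes the desired entity differ from the current one, so each pass replaces it again.

```python
from typing import Dict, List, Tuple

# Hypothetical model: an entity is a flat map of attribute name -> value.
Entity = Dict[str, str]


def reconcile(current: Dict[str, Entity], desired: Dict[str, Entity]) -> List[Tuple[str, str]]:
    """Return the (action, entity name) steps that make current match desired.

    Entities whose attributes already match are left alone, so repeated passes
    over a stable desired state produce no actions at all.
    """
    actions: List[Tuple[str, str]] = []
    # Remove entities that are no longer wanted.
    for name in sorted(current.keys() - desired.keys()):
        actions.append(("delete", name))
    for name in sorted(desired):
        if name not in current:
            actions.append(("create", name))
        elif current[name] != desired[name]:
            # Assumption: a changed entity is replaced rather than updated in place.
            actions.append(("delete", name))
            actions.append(("create", name))
    return actions


current = {"backend": {"siteId": "A", "port": "8080"}}
stable = {"backend": {"siteId": "A", "port": "8080"}}
flipped = {"backend": {"siteId": "B", "port": "8080"}}
print(reconcile(current, stable))   # nothing to do
print(reconcile(current, flipped))  # the entity is replaced
```

If the value supplied for siteId alternates between two UUIDs, the second call's outcome repeats on every flip. That is consistent with the thrashing described in this issue.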
Environment details
Additional context
None.