hashicorp / consul-k8s

First-class support for Consul Service Mesh on Kubernetes
https://www.consul.io/docs/k8s
Mozilla Public License 2.0

Consul-Sync pointing to same Consul deregisters all services on K8S-Sync Node #860

Closed · webmutation closed this issue 2 years ago

webmutation commented 2 years ago

Overview of the Issue

If two or more instances of consul-sync are running and pointing at the same external Consul cluster, all of the services get deregistered and the sync falls into a loop of deregistering and re-registering services.

Reproduction Steps

  1. Create two EKS clusters
  2. Deploy the Consul Helm chart with the k8s-sync service:

     ```yaml
     # Requires setting the external Consul catalog URL parameter manually in the deployment...
     global:
       enabled: false

     client:
       enabled: false

     externalServers:
       enabled: true
       hosts:

     syncCatalog:
       enabled: true
       k8sDenyNamespaces: ["kube-system", "kube-public"]
     ```

  3. kubectl edit deployment consul-consul-sync-catalog
  4. Change the value to point to the external Consul cluster (a possible non-interactive equivalent is sketched below)
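
A hypothetical one-liner for steps 3-4: the Consul API client used by sync-catalog reads the standard `CONSUL_HTTP_ADDR` environment variable, and the address below is a placeholder, not taken from this issue:

```sh
# Point sync-catalog at the external Consul cluster via the standard
# Consul client environment variable instead of hand-editing the spec.
# consul.example.com is a placeholder; substitute the external server.
kubectl set env deployment/consul-consul-sync-catalog \
  CONSUL_HTTP_ADDR=http://consul.example.com:8500
```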

Then the deregistering of the services starts to occur: some services show up, then all services show up, then all services disappear, and this loop goes on forever.

Expected behavior

Services should not disappear; additional clusters connecting to Consul should simply have their services registered. Services should not be deregistered. This is probably because the special k8s-sync node is being deleted and recreated...

webmutation commented 2 years ago

Tried to work around it by changing the nodeName on the second cluster, but the behaviour is still the same. Also, since there is no health check on the node, the nodes become orphaned (k8s-sync-A).
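
For context, the nodeName change can be expressed in the chart values; a minimal sketch, assuming the chart version in use exposes `syncCatalog.consulNodeName` (whose default is `k8s-sync`):

```yaml
syncCatalog:
  enabled: true
  # Register this cluster's services under a distinct catalog node
  # instead of the shared default "k8s-sync".
  consulNodeName: k8s-sync-A
```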


bondido commented 2 years ago

@webmutation - the working way to handle your scenario is to differentiate the services from each Kubernetes cluster by a tag in the Consul catalog. It's described here - https://github.com/hashicorp/consul-k8s/issues/579 - and confirmed there as the expected method.
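
For anyone landing here, a minimal sketch of that tag-based approach using the chart's `syncCatalog.consulK8STag` value; the tag names are examples, and each cluster gets its own unique tag:

```yaml
# Cluster A values (cluster B would use e.g. k8s-cluster-b)
syncCatalog:
  enabled: true
  # Each sync-catalog instance only manages services carrying its own tag,
  # so it no longer deregisters services synced from other clusters.
  consulK8STag: k8s-cluster-a
```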

webmutation commented 2 years ago

Thank you @bondido, tested and indeed it is working! Services are now staying registered.

However, I wonder if there is a setting to remove orphaned nodes after a timeout period; in other words, how can we avoid having to remove the nodes manually? Is this possible? I was not able to find anything in the charts.

thisisnotashwin commented 2 years ago

Hey @webmutation !! Consul does have a default setting for removing orphaned nodes, which is currently in the range of days. We do not expose this via the Helm chart, and I don't think we intend to do so at the moment, unfortunately. We don't see this as a scenario users are expected to run into in a stable deployment.

webmutation commented 2 years ago

Thanks for the message @thisisnotashwin, it is clear now.

In our case, we have on-demand clusters that live only for a few hours or days, for UAT, training events, or integration testing (specific versions of components being deployed)... I think we will have to write a script to remove the orphaned nodes manually once the cluster is destroyed (something like the sketch below). It should not be a huge issue to handle. Thanks.
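
In case it helps someone, a minimal cleanup sketch against Consul's `/v1/catalog/deregister` endpoint; the node name and address are assumptions matching the discussion above:

```sh
#!/usr/bin/env sh
# Deregister an orphaned sync node after its cluster is destroyed.
# Deregistering a node also removes every service registered on it.
CONSUL_HTTP_ADDR="${CONSUL_HTTP_ADDR:-http://consul.example.com:8500}"
NODE_NAME="${1:-k8s-sync-A}"

curl -sS -X PUT "${CONSUL_HTTP_ADDR}/v1/catalog/deregister" \
  -d "{\"Node\": \"${NODE_NAME}\"}"
```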