haproxytech / dataplaneapi

HAProxy Data Plane API
https://www.haproxy.com/documentation/dataplaneapi/
Apache License 2.0

Continuation of Consul Service Discovery After Consul Outage #317

Open adam820 opened 1 year ago

adam820 commented 1 year ago

Hi! I'm evaluating the Consul service discovery for some of our HAProxy instances, and during testing I noticed that if the Consul server becomes unavailable (e.g. I have a single-node DC for testing), after a period of time, DPAPI stops the discovery job, with only a single entry in my DEBUG-level log:

time="2023-11-15T17:42:33Z" level=debug msg="discovery job stopping" ID=e3f2d9b8-865e-41e3-aa09-d923ab912695 ServiceDiscovery=Consul

This doesn't seem to indicate at all what happened here (Consul service became unavailable/unreachable), nor are there any logs indicating failure to do discovery.

Additionally, it does not seem to resume the discovery job after Consul returns (I waited several minutes and started the container again), until after the DPAPI application was completely reloaded. While the job was stopped, the status on the /v2/service_discovery/consul endpoint was still listed as enabled: true.

I can understand not continuing to probe a server that's down every 10 seconds or so, but would it make sense to have something like a backoff probe to check that the service is alive again to automatically resume the discovery jobs? Is this a bug, or do I just not have something configured correctly?
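To make the backoff idea concrete, here is a minimal Go sketch of a capped exponential backoff schedule. Everything in it is an assumption for illustration: `nextDelay` is a hypothetical helper, not part of dataplaneapi, and the 10-second starting interval and 5-minute cap are arbitrary example values.

```go
package main

import (
	"fmt"
	"time"
)

// nextDelay doubles the current delay and caps the result at maxDelay.
// Hypothetical helper for illustration only; not a dataplaneapi function.
func nextDelay(current, maxDelay time.Duration) time.Duration {
	next := current * 2
	if next > maxDelay {
		return maxDelay
	}
	return next
}

func main() {
	delay := 10 * time.Second   // assumed starting probe interval
	maxDelay := 5 * time.Minute // assumed upper bound between probes

	// Print the wait before each of the first few probes of an unreachable server.
	for attempt := 1; attempt <= 7; attempt++ {
		fmt.Printf("attempt %d: wait %s before probing Consul again\n", attempt, delay)
		delay = nextDelay(delay, maxDelay)
	}
}
```

With a schedule like this, probing slows down quickly during a long outage but never stops entirely, so discovery could resume on its own once Consul is reachable again.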

Thanks!


Configuration sent to configure service discovery:

{
  "address": "192.168.12.50",
  "port": 8500,
  "enabled": true,
  "retry_timeout": 10
}

HAP version: HAProxy version 2.8.3-1ppa1~focal 2023/09/08
DPAPI version: HAProxy Data Plane API v2.6.4 faadf1a

dhruvjain99 commented 6 months ago

It is indeed a bug. I ran into the same issue while evaluating it. Digging deeper, I found that the goroutine that watches the Consul server for updates terminates on any kind of error while updating. Ref: https://github.com/haproxytech/dataplaneapi/blob/b1084f8ce2568b1b8a39c8d9c8eee7b1f3d25808/discovery/consul_service_discovery_instance.go#L124
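As a rough illustration of the alternative, the sketch below shows a watcher loop that logs a failed update and retries with a growing delay instead of returning. It is not the dataplaneapi code: `watch`, `errUnreachable`, the `stop` channel, and the millisecond delays are all hypothetical names and values chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnreachable simulates a failed Consul update (hypothetical, for the demo only).
var errUnreachable = errors.New("consul unreachable")

// watch calls fetch in a loop until stop is closed. A failed fetch is logged
// and retried after a delay that doubles up to maxDelay; the loop does not
// exit on error. A successful fetch resets the delay to base.
func watch(stop <-chan struct{}, fetch func() error, base, maxDelay time.Duration) {
	delay := base
	for {
		wait := base
		if err := fetch(); err != nil {
			fmt.Printf("discovery update failed, retrying in %s: %v\n", delay, err)
			wait = delay
			delay *= 2
			if delay > maxDelay {
				delay = maxDelay
			}
		} else {
			delay = base
		}

		select {
		case <-stop:
			return
		case <-time.After(wait):
		}
	}
}

func main() {
	stop := make(chan struct{})
	calls := 0

	// Simulated fetch: fails three times, then succeeds; stops the watcher on call 5.
	fetch := func() error {
		calls++
		if calls <= 3 {
			return errUnreachable
		}
		if calls == 5 {
			close(stop)
		}
		return nil
	}

	watch(stop, fetch, 10*time.Millisecond, 40*time.Millisecond)
	fmt.Println("fetch calls made:", calls)
}
```

The point of the sketch is only that the error path logs and continues, so the outage becomes visible in the logs and the job picks up again once updates succeed.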

For now I have decided to go ahead with consul-template, which seems to be more stable than dataplaneapi, both because of this issue and because of the limitation on custom server definitions (#328).