Open jameshartig opened 8 months ago
Since it is working for me, could you please expand on the error you may be noticing?
With TCP alone:
With the definition below, consul-esm was able to detect the actual service going down.

    {
      "Node": "venus-dc-ext-count-tls-node",
      "Address": "172.31.26.18",
      "Token": "3d5f4ccd-c076-92b1-c88e-2abe70493e2a",
      "NodeMeta": {
        "external-node": "true",
        "external-probe": "true"
      },
      "Service": {
        "ID": "venus-dc-count-tls",
        "Service": "venus-dc-ext-count-tls",
        "Port": 10017
      },
      "Checks": [
        {
          "Name": "venus-dc-ext-count-tls-check",
          "Status": "passing",
          "Definition": {
            "Name": "venus-dc-ext-count-tls TCP check on port 172.31.26.18:10017",
            "TCP": "172.31.26.18:10017",
            "Interval": "10s",
            "Timeout": "1s"
          }
        }
      ]
    }

consul-esm identified the below:

    2024-01-17T06:55:48.247Z [WARN] consul-esm: Check is now critical: check=venus-dc-ext-count-node/venus-dc-ext-count-check
    2024-01-17T06:55:52.583Z [WARN] consul-esm: Check socket connection failed: check=venus-dc-ext-count-tls-node/venus-dc-ext-count-tls-check error="dial tcp 172.31.26.18:10017: connect: connection refused"

With a TLS health check:
With an external service definition as below:

    (venv) root@ip-172-31-18-50:~# cat tgw-app-count-tls.json
    {
      "Node": "venus-dc-ext-count-tls-node",
      "Address": "172.31.26.18",
      "Token": "885cb598-b105-e554-f7f3-ed084d760f32",
      "NodeMeta": {
        "external-node": "true",
        "external-probe": "true"
      },
      "Service": {
        "ID": "venus-dc-count-tls",
        "Service": "venus-dc-ext-count-tls",
        "Port": 10017
      },
      "Checks": [
        {
          "Name": "venus-dc-ext-count-tls-check",
          "Status": "passing",
          "Definition": {
            "Name": "venus-dc-ext-count-tls TCP check on port 172.31.26.18:10017",
            "HTTP": "https://172.31.26.18:10017/health",
            "Interval": "10s",
            "Timeout": "1s"
          }
        }
      ]
    }

And with a consul-esm config file, and consul-esm started, as below:
```shell
(venv) root@ip-172-31-26-18:~# cat $PWD/consul-esm-config.hcl
https_ca_file = "/opt/consul/custom-apps/tgw/certs/venus-srv.com.crt"
(venv) root@ip-172-31-26-18:~# consul-esm -config-file $PWD/consul-esm-config.hcl
```
consul-esm was able to run my health checks. Below is the output as seen at https://<host>:8501/ui/venus-dc/services/venus-dc-ext-count-tls/instances/venus-dc-ext-count-tls-node/venus-dc-count-tls/health-checks:

    HTTP GET https://172.31.26.18:10017/health: 200 OK Output: {"hostname":"ip-172-31-26-18","inside_function":"/opt/consul/custom-apps/tgw/tgw-count-tls.py['health']","response":"healthy"}

@vyanamandra your example used an HTTP health check, not TCP. I'm talking about a TCP health check with `TCPUseTLS` set to true. Please see the linked MR for consul in the issue description.
@jameshartig I had no idea this project existed when I added TCP+TLS to consul itself. Sorry about that. I don't believe it's been plumbed through nomad either.
TCP+TLS health checks were added in https://github.com/hashicorp/consul/pull/18381 but from what I can tell they're not supported in consul-esm.