hashicorp / consul-esm

External service monitoring for Consul
Mozilla Public License 2.0

Add support for TCP+TLS health checks #247

Open jameshartig opened 8 months ago

jameshartig commented 8 months ago

TCP+TLS health checks were added to Consul in https://github.com/hashicorp/consul/pull/18381, but from what I can tell they're not supported in consul-esm.
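For context, a TCP+TLS check differs from a plain TCP check in one step: after the TCP connection is established, the prober also completes a TLS handshake, so a port that accepts connections but does not speak TLS fails. The Python sketch below illustrates that logic under stated assumptions. `tcp_tls_check` and its parameters are hypothetical names, not consul-esm or Consul code (both are written in Go), and the output strings are illustrative only. In this sketch, certificate verification is on unless `skip_verify` is set.

```python
import socket
import ssl
from typing import Optional, Tuple


def tcp_tls_check(host: str, port: int, timeout: float = 1.0,
                  server_name: Optional[str] = None,
                  skip_verify: bool = False) -> Tuple[bool, str]:
    """Hypothetical TCP+TLS probe: TCP connect, then a TLS handshake.

    Returns (passing, output). A sketch only, not consul-esm code.
    """
    # Default context: verifies the server certificate against system CAs.
    ctx = ssl.create_default_context()
    if skip_verify:
        # Accept any certificate; only checks that the port speaks TLS.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    try:
        # Step 1: plain TCP connect (all that a TCP-only check does).
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # Step 2: TLS handshake over the same socket; the socket
            # timeout also bounds the handshake.
            with ctx.wrap_socket(sock, server_hostname=server_name or host) as tls:
                return True, f"TCP+TLS connect {host}:{port}: Success ({tls.version()})"
    except OSError as exc:  # includes ssl.SSLError and timeouts
        return False, f"TCP+TLS connect {host}:{port}: {exc}"
```

Against a listener that accepts TCP connections but never answers the TLS ClientHello, `tcp_tls_check` reports failing even though a plain TCP connect to the same port succeeds; that gap is what this issue asks consul-esm to cover.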

vyanamandra commented 8 months ago

It is working for me. Could you please expand on the error you are seeing?

With TCP alone:

With the definition below, consul-esm detected when the actual service went down.

```json
{
  "Node": "venus-dc-ext-count-tls-node",
  "Address": "172.31.26.18",
  "Token": "3d5f4ccd-c076-92b1-c88e-2abe70493e2a",
  "NodeMeta": {
    "external-node": "true",
    "external-probe": "true"
  },
  "Service": {
    "ID": "venus-dc-count-tls",
    "Service": "venus-dc-ext-count-tls",
    "Port": 10017
  },
  "Checks": [
    {
      "Name": "venus-dc-ext-count-tls-check",
      "Status": "passing",
      "Definition": {
        "Name": "venus-dc-ext-count-tls TCP check on port 172.31.26.18:10017",
        "TCP": "172.31.26.18:10017",
        "Interval": "10s",
        "Timeout": "1s"
      }
    }
  ]
}
```

consul-esm logged the following:

```
2024-01-17T06:55:48.247Z [WARN]  consul-esm: Check is now critical: check=venus-dc-ext-count-node/venus-dc-ext-count-check
2024-01-17T06:55:52.583Z [WARN]  consul-esm: Check socket connection failed: check=venus-dc-ext-count-tls-node/venus-dc-ext-count-tls-check error="dial tcp 172.31.26.18:10017: connect: connection refused"
```
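The `connection refused` line above is what a plain TCP check produces: the prober only tries to open a TCP connection and marks the check critical if that fails. The error text is the format Go's `net` package uses for a failed dial, consistent with consul-esm being written in Go. As a minimal sketch of the same pass/fail rule, here is a hypothetical Python helper `tcp_check` (not consul-esm code); Python words the error differently, for example `[Errno 111] Connection refused` on Linux.

```python
import socket
from typing import Tuple


def tcp_check(host: str, port: int, timeout: float = 1.0) -> Tuple[bool, str]:
    """Hypothetical plain TCP probe: passing iff a TCP connect succeeds."""
    try:
        # No data is exchanged; a completed TCP handshake is enough to pass.
        with socket.create_connection((host, port), timeout=timeout):
            return True, f"TCP connect {host}:{port}: Success"
    except OSError as exc:  # refused, unreachable, or timed out
        return False, f"TCP connect {host}:{port}: {exc}"
```

Note that this rule says nothing about what the service speaks on that port, which is why a TCP check alone cannot confirm that TLS is working.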

With a TLS health check:

With the following external service definition:
```shell
(venv) root@ip-172-31-18-50:~# cat tgw-app-count-tls.json
{
  "Node": "venus-dc-ext-count-tls-node",
  "Address": "172.31.26.18",
  "Token": "885cb598-b105-e554-f7f3-ed084d760f32",
  "NodeMeta": {
    "external-node": "true",
    "external-probe": "true"
  },
  "Service": {
    "ID": "venus-dc-count-tls",
    "Service": "venus-dc-ext-count-tls",
    "Port": 10017
  },
  "Checks": [
    {
      "Name": "venus-dc-ext-count-tls-check",
      "Status": "passing",
      "Definition": {
        "Name": "venus-dc-ext-count-tls TCP check on port 172.31.26.18:10017",
        "HTTP": "https://172.31.26.18:10017/health",
        "Interval": "10s",
        "Timeout": "1s"
      }
    }
  ]
}
(venv) root@ip-172-31-18-50:~#
```
And with a consul-esm config file, started as below:
```shell
(venv) root@ip-172-31-26-18:~# cat $PWD/consul-esm-config.hcl
https_ca_file = "/opt/consul/custom-apps/tgw/certs/venus-srv.com.crt"
(venv) root@ip-172-31-26-18:~# consul-esm -config-file $PWD/consul-esm-config.hcl
```

consul-esm ran the health check successfully. The output below is as shown in the Consul UI at https://<host>:8501/ui/venus-dc/services/venus-dc-ext-count-tls/instances/venus-dc-ext-count-tls-node/venus-dc-count-tls/health-checks:

```
HTTP GET https://172.31.26.18:10017/health: 200 OK Output: {"hostname":"ip-172-31-26-18","inside_function":"/opt/consul/custom-apps/tgw/tgw-count-tls.py['health']","response":"healthy"}
```
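The working example is an HTTP check against an `https://` URL, with `https_ca_file` telling consul-esm which CA to trust for the server's certificate. A rough Python equivalent of that kind of check is sketched below. `http_check` and `format_output` are hypothetical helpers, not consul-esm code; the output line mirrors the `HTTP GET <url>: <status> Output: <body>` shape shown above, and treating only 2xx responses as passing is a simplification of this sketch.

```python
import ssl
import urllib.error
import urllib.request
from typing import Optional, Tuple


def format_output(url: str, status: int, reason: str, body: str) -> str:
    """Build a line shaped like the UI output above."""
    return f"HTTP GET {url}: {status} {reason} Output: {body}"


def http_check(url: str, ca_file: Optional[str] = None,
               timeout: float = 1.0) -> Tuple[bool, str]:
    """Hypothetical HTTP(S) probe; ca_file plays the role of https_ca_file."""
    # If ca_file is given, trust only that CA bundle; otherwise the system CAs.
    ctx = ssl.create_default_context(cafile=ca_file)
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            passing = 200 <= resp.status < 300
            return passing, format_output(url, resp.status, resp.reason, body)
    except urllib.error.HTTPError as exc:  # non-2xx responses raise here
        body = exc.read().decode("utf-8", errors="replace")
        return False, format_output(url, exc.code, exc.reason, body)
    except OSError as exc:  # DNS, refused, TLS verification failure, timeout
        return False, f"HTTP GET {url}: {exc}"
```

This exercises TLS, but only as part of an HTTP request to a specific path; it cannot stand in for a TCP+TLS check on a service that does not speak HTTP.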
jameshartig commented 8 months ago

@vyanamandra your example uses an HTTP health check, not a TCP one. I'm talking about a TCP health check with TCPUseTLS set to true. Please see the Consul PR linked in the issue description.
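To make the distinction concrete, the request is for the TCP definition from the first example with TCP+TLS enabled, rather than an HTTP check. The sketch below builds such a registration payload in Python by taking that earlier `Checks` entry and adding `"TCPUseTLS": true`. `tcp_tls_registration` is a hypothetical helper; the `TCPUseTLS` field name comes from the comment above, while whether the catalog's check `Definition` accepts it, and whether consul-esm acts on it, are assumptions (the latter is exactly what this issue asks for). The ACL token is omitted.

```python
import json


def tcp_tls_registration(node: str, address: str, service_id: str,
                         service: str, port: int) -> str:
    """Hypothetical helper: catalog registration JSON with a TCP+TLS check."""
    target = f"{address}:{port}"
    payload = {
        "Node": node,
        "Address": address,
        # Node metadata copied from the examples earlier in this thread.
        "NodeMeta": {"external-node": "true", "external-probe": "true"},
        "Service": {"ID": service_id, "Service": service, "Port": port},
        "Checks": [{
            "Name": f"{service}-check",
            "Status": "passing",
            "Definition": {
                "Name": f"{service} TCP+TLS check on {target}",
                "TCP": target,
                # Flag named in the comment above (hashicorp/consul#18381);
                # consul-esm support for it is the subject of this issue.
                "TCPUseTLS": True,
                "Interval": "10s",
                "Timeout": "1s",
            },
        }],
    }
    return json.dumps(payload, indent=2)
```

Called with the values from the first example, this yields the same shape of JSON as the definitions above, with a `TCP` target plus `TCPUseTLS` instead of an `HTTP` URL.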

pgporada commented 1 week ago

@jameshartig I had no idea this project existed when I added TCP+TLS to Consul itself. Sorry about that. I don't believe it's been plumbed through Nomad either.