Closed locinus closed 10 months ago
Hi @locinus! This is a check done by Nomad to make sure the Consul client agent is configured correctly. When using connect, it is required to activate the grpc port on Consul client agents. The error being reported indicates there are no Consul client agents available with the grpc port activated (which is consistent with your /etc/consul.d/consul.hcl file).
I realize the Consul docs[1] don't really make that clear, but the learn guide[2] does:
Client agents only need to configure the gRPC port.
[1] https://www.consul.io/docs/connect/configuration#agent-configuration [2] https://learn.hashicorp.com/tutorials/consul/service-mesh-with-envoy-proxy?in=consul/developer-mesh#enable-connect-and-grpc
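For reference, a minimal sketch of what that looks like in a Consul client agent configuration (the file path and port shown are the conventional defaults; adjust to your environment):

```hcl
# /etc/consul.d/consul.hcl (client agent) — minimal sketch.
# The grpc port is what Nomad fingerprints for the
# ${attr.consul.grpc} > 0 constraint.
ports {
  grpc = 8502
}

# Some setups have also needed connect enabled on the client
# agents, not just the servers.
connect {
  enabled = true
}
```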
Nomad could at least document this check and point folks in the right direction
I think I had a similar issue, where enabling just the port on the Consul client was not enough; connect enabled had to be set to true on the clients as well for the Nomad constraint to work. The Consul docs state that connect enabled is only needed on server-mode agents.
This still happens to me in Nomad 1.3.1 when I run it in WSL2. Any progress on this?
@thangchung did you activate the grpc port on the Consul clients as described here: https://github.com/hashicorp/nomad/issues/12111#issuecomment-1049079451 ?
Just encountered this: we have both connect and the grpc port enabled, and we're still getting this constraint failure.
Nomad version: 1.4.4
Consul version: 1.15.0
This started happening after upgrading to the above versions.
Digging into this a bit more, our consul client configuration looks like this:
{
  "addresses": {
    "http": "0.0.0.0"
  },
  "bind_addr": "10.51.1.13",
  "connect": {
    "enabled": true
  },
  "data_dir": "/var/lib/consul",
  "enable_local_script_checks": true,
  "enable_script_checks": true,
  "encrypt": ...,
  "ports": {
    "grpc": 8502,
    "http": 8500,
    "https": 8501
  },
  "retry_interval": "15s",
  "retry_join": [
    "10.51.1.10",
    "10.51.1.11",
    "10.51.1.12"
  ],
  "retry_max": 3,
  "server": false,
  "tls": {
    "https": {
      "ca_file": "/etc/ssl/certs/ca.pem",
      "cert_file": "/etc/ssl/certs/cert.pem",
      "key_file": "/etc/ssl/certs/cert-key.pem"
    },
    "internal_rpc": {
      "ca_file": "/etc/ssl/certs/ca.pem",
      "cert_file": "/etc/ssl/certs/cert.pem",
      "key_file": "/etc/ssl/certs/cert-key.pem"
    }
  },
  "ui_config": {
    "enabled": true
  }
}
However, nomad node status -verbose ... returns:
...
Attributes
consul.connect = true
consul.datacenter = dc1
consul.ft.namespaces = false
consul.grpc = -1
consul.server = false
consul.sku = oss
consul.version = 1.15.0
...
(yes, tried bouncing all services)
We did have some issues with grpc+TLS and Envoy when upgrading, and we had to change the Consul client configuration; perhaps it's related?
OK, fixed. For anyone encountering this issue, this is the configuration that worked for us:
...
"ports": {
  "grpc": 8502,
  "grpc_tls": 8503,
  "http": 8500,
  "https": 8501
},
...
"tls": {
  "grpc": {
    "ca_file": "/etc/ssl/certs/ca.pem",
    "cert_file": "/etc/ssl/certs/cert.pem",
    "key_file": "/etc/ssl/certs/cert-key.pem"
  },
  "https": {
    "ca_file": "/etc/ssl/certs/ca.pem",
    "cert_file": "/etc/ssl/certs/cert.pem",
    "key_file": "/etc/ssl/certs/cert-key.pem"
  },
  "internal_rpc": {
    "ca_file": "/etc/ssl/certs/ca.pem",
    "cert_file": "/etc/ssl/certs/cert.pem",
    "key_file": "/etc/ssl/certs/cert-key.pem"
  }
},
...
In other words, we had to enable both non-TLS grpc (Envoy throws errors otherwise) AND TLS grpc (Nomad reports consul.grpc = -1 otherwise).
Correction: with the above configuration Envoy starts throwing errors again:
[2023-03-17 10:18:34.733][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 1622s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection termination
so we're back at square one
Figured out a working configuration. It seems Nomad marks Consul with consul.grpc = -1 unless TLS is configured for grpc. If we configure Consul grpc with TLS, then Envoy by default will try to connect to it without TLS, printing that transport error. To make Envoy use TLS correctly, we had to set the following environment variables for the Nomad client:
CONSUL_HTTP_SSL="true"
CONSUL_CACERT="/etc/ssl/certs/ca.pem"
CONSUL_GRPC_CACERT="/etc/ssl/certs/ca.pem"
CONSUL_GRPC_ADDRESS="127.0.0.1:8503"
CONSUL_CLIENT_CERT="/etc/ssl/certs/cert.pem"
CONSUL_CLIENT_KEY="/etc/ssl/certs/cert-key.pem"
Maybe not all of these are required, but this worked for us.
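If you prefer to keep this in the Nomad agent configuration rather than in environment variables, roughly equivalent settings can, as far as I can tell, be expressed in the client's consul block. The certificate paths and the TLS gRPC port (8503) below are taken from the setup above and should be treated as placeholders:

```hcl
# Sketch of a Nomad client consul block — not a verified drop-in,
# just the config-file counterparts of the env vars above.
consul {
  address      = "127.0.0.1:8500"
  grpc_address = "127.0.0.1:8503"   # TLS gRPC port
  ssl          = true
  ca_file      = "/etc/ssl/certs/ca.pem"
  grpc_ca_file = "/etc/ssl/certs/ca.pem"
  cert_file    = "/etc/ssl/certs/cert.pem"
  key_file     = "/etc/ssl/certs/cert-key.pem"
}
```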
I was experiencing the same problem. In my case we managed to solve it by specifying the ports block in /etc/consul.d/server.hcl as follows:
ports {
  grpc_tls = 8503
  grpc     = 8502
  http     = 8500
  https    = 8501
}
We also had to set the following environment variable to true:
CONSUL_HTTP_SSL=true
I hope this helps you and works for you as well.
Doing a little issue cleanup. This is currently documented in the Connect Prerequisites documentation and the consul.grpc_address config documentation.
Nomad version
1.2.3
Operating system and Environment details
Debian 10
Issue
While trying to set up a sidecar service using the Docker driver, such as the countdash from the official example, we run into this failure on deployment:
* Constraint "${attr.consul.grpc} > 0": 2 nodes excluded by filter
We are, however, able to deploy a job with no sidecar successfully.
We looked for configuration errors but are currently clueless as to the origin of this constraint failure. Google has no reference to such an error. Any help greatly appreciated!
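For context, this constraint is not written in the job file; Nomad injects it automatically for any task group that requests a Connect sidecar. A sketch along the lines of the countdash example (service name and port are illustrative):

```hcl
group "dashboard" {
  network {
    mode = "bridge"
  }

  service {
    name = "count-dashboard"
    port = "9002"

    connect {
      # Requesting a sidecar is what makes Nomad add the implicit
      # ${attr.consul.grpc} > 0 constraint to this group.
      sidecar_service {}
    }
  }
}
```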
Config files
/etc/consul.d/consul.hcl
/etc/consul.d/server.hcl
/etc/systemd/system/consul.service
/etc/nomad.d/nomad.hcl
/etc/nomad.d/server.hcl
/etc/systemd/system/nomad.service
/etc/nomad.d/client.hcl
/etc/nomad.d/docker.hcl