Closed ferhatvurucu closed 1 year ago
Hi @ferhatvurucu,
I have a few follow-up questions that may help reveal what's happening here:
Hi @jkirschner-hashicorp,
Thanks for the quick reply. It's just for service discovery at the moment. There is no auto-encrypt or auto-config. You may find the server agent configuration below.
```json
{
  "advertise_addr": "x.x.x.x",
  "bind_addr": "x.x.x.x",
  "bootstrap_expect": 3,
  "client_addr": "0.0.0.0",
  "datacenter": "dc1",
  "node_name": "xxxx",
  "retry_join": [
    "provider=aws region=eu-west-1 tag_key=ServiceType tag_value=consul-server"
  ],
  "server": true,
  "encrypt": "xxxx",
  "autopilot": {
    "cleanup_dead_servers": true,
    "last_contact_threshold": "200ms",
    "max_trailing_logs": 250,
    "server_stabilization_time": "10s",
    "redundancy_zone_tag": "az",
    "disable_upgrade_migration": false,
    "upgrade_version_tag": ""
  },
  "ports": {
    "grpc": 8502,
    "grpc_tls": -1
  },
  "connect": {
    "enabled": true
  },
  "ui": true
}
```
My understanding is that you'd only need the `grpc` ports for:
If you're only using Consul for service discovery, (1) shouldn't apply to you. Do you have a multi-datacenter Consul deployment? If so, do you know if it's using WAN federation or cluster peering to connect the multiple datacenters?
It's also possible that the grpc port isn't needed at all. Was there a set of docs / tutorials you followed that suggested you might need that port? I'm wondering if there's a small docs improvement to be made here.
Was your `ports` config as of 1.13.3 set like the below?

```json
"ports": {
  "grpc": 8502
},
```
Leaving some breadcrumbs for the future based on some initial digging into the code:
When tracking down what generates the SPIFFE ID related error message, I found that it attempts to match SPIFFE IDs against these regexes: https://github.com/hashicorp/consul/blob/c046d1a4d870639227baff629ff304a1b72deede/agent/connect/uri.go#L23-L30
Per your error message, the SPIFFE ID being matched against is: `xxx.consul/agent/server/dc/dc1`.

That SPIFFE ID is closest to the format of `spiffeIDServerRegexp`, but fails to match because it doesn't start exactly with `/agent` ... it instead has `xxxx.consul` in front.
I have no particular experience with this area of the codebase, so I'm not sure what would cause a SPIFFE ID of `xxx.consul/agent/server/dc/dc1` to be generated (and whether that's expected behavior). The above is just what I found digging through the code where that error message seems to be generated.
I've since seen indication that a SPIFFE ID in the form `xxx.consul/agent/server/dc/dc1` is normal, so it's probable that my comments above are based on a misreading of the relevant code. I'll still leave the comments there in case they are relevant for future readers / investigation.
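To illustrate the matching behavior discussed above (a sketch in Python, not Consul's actual Go code; the pattern below is a simplified stand-in for `spiffeIDServerRegexp` in `agent/connect/uri.go`): a SPIFFE ID is a URI whose host is the trust domain (`xxx.consul`) and whose path is what the regex matches, which is why a string with the trust domain in front fails a path-only pattern while the path alone matches.

```python
import re
from urllib.parse import urlparse

# Simplified stand-in for Consul's spiffeIDServerRegexp; the real pattern
# lives in agent/connect/uri.go and may differ in detail.
SERVER_PATH_RE = re.compile(r"^/agent/server/dc/([^/]+)$")

spiffe_id = "spiffe://xxx.consul/agent/server/dc/dc1"
parsed = urlparse(spiffe_id)

# Matching "host + path" against the path-only pattern fails,
# because the string starts with the trust domain, not "/agent".
print(SERVER_PATH_RE.match(parsed.netloc + parsed.path))  # None

# Matching only the URI path succeeds, capturing the datacenter.
m = SERVER_PATH_RE.match(parsed.path)
print(m.group(1))  # dc1
```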
We are not actively using Consul Connect yet; however, it's planned for the near future. Even with gRPC TLS disabled, we still see the error message above. So with these settings, how can I enable Consul Connect while keeping TLS disabled for now?
It seems this is related to https://github.com/hashicorp/nomad/issues/15360
Which Nomad version are you using? Per the Consul 1.14.x upgrade docs:
> The changes to Consul service mesh in version 1.14 are incompatible with Nomad 1.4.2 and earlier. If you operate Consul service mesh using Nomad 1.4.2 or earlier, do not upgrade to Consul 1.14 until hashicorp/nomad#15266 is fixed.
We upgraded to Nomad 1.4.3 and Consul 1.14.2 respectively.
Were you on Nomad 1.4.3 at the time you reported this issue? Or just upgraded now?
It sounds like the former, but wanted to double-check.
We were already on Nomad 1.4.3.
I had the same error after upgrading; adding this to the agent config seems to fix it:

```hcl
peering {
  enabled = false
}
```
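Since the server configuration earlier in this thread is JSON rather than HCL, the equivalent stanza there would presumably be (untested fragment, merged into the existing config file):

```json
{
  "peering": {
    "enabled": false
  }
}
```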
Prior to Consul 1.14, cluster peering and Consul Connect were disabled by default. A breaking change was made in Consul 1.14:

> Cluster Peering is enabled by default. Cluster peering and WAN federation can coexist, so there is no need to disable cluster peering to upgrade existing WAN federated datacenters. To disable cluster peering nonetheless, set peering.enabled to false.
Hi,
I am upgrading my Consul servers from 1.13.3 to 1.14.2 and facing an issue with the `grpc_tls` configuration. The port configuration has been changed as below; however, I still see error logs about `agent.cache` and `agent.server.cert-manager`. We were already running with gRPC TLS disabled, and we added the `"grpc_tls": -1` configuration with the new version. I don't see any errors when I disable the Connect feature in the configuration file.
Configuration
Journalctl logs
Consul info for Server
Operating system and Environment details
Ubuntu 22.04