Open celesteking opened 4 years ago
@celesteking The error message is referring to this setting which needs to be enabled on servers: https://www.consul.io/docs/agent/options.html#connect_enabled for Consul Connect (our service mesh feature) to work.
Can I ask which tutorial you were following?
I think the issue might be that the "getting started" guide is purposefully simplified and so uses an agent in -dev mode, which preconfigures several things including enabling Connect.
If you can tell us a bit more about your process - which guides did you follow, what did you try next, we can maybe debug and make sure that path is clearer for others in the future.
More details about the Connect feature can be found here: https://www.consul.io/docs/connect/configuration.html
Oh there is also this guide: https://learn.hashicorp.com/consul/developer-mesh/connect-production which walks through the steps/prerequisites that get you from a kick-the-tyres demo mode to a real production setup.
I have followed the dev tutorial, now I'm onto prod tutorial. https://learn.hashicorp.com/consul/getting-started/join
I'm able to reproduce the issue by starting up client with the following leftover config from dev tutorial:
cat /consul/config/web.json
{
  "service": {
    "name": "socat",
    "port": 8181,
    "connect": { "sidecar_service": {} }
  }
}
After I remove the service with consul services deregister -id=socat, it continues spitting those messages.
What is Connect and what is a mesh? The only thing that's used above is sidecar proxy feature. Just don't tell me you're using 3 different terms to describe 1 thing.
We don't need sidecar proxy feature, it's irrelevant for our setup.
Another thing I've noticed is that when you're doing things that are supposed to fail, they don't fail, like deregistering a nonexistent service or consul kv del doesntexist.
You should do what redis does -- return an error (0) or ENOENT, but don't return SUCCESS.
Also, the timing is wrong in all these docs pages. I can't be THAT stupid, but it takes me 10 minutes only to read the text and understand the pics on https://learn.hashicorp.com/consul/getting-started/services , not mentioning the time needed to actually mess around the commands and their output.
Take an average Polish or whatever Finland person you might find nearby, make sure he knows no redis or whatever, and ask him to follow the tutorial. Measure how long it really takes.
Another thing, the text "Because there is no web service running, you will pretend to be the web service by talking to its proxy on the port that we specified (9191)." is complete nonsense. For me to act as web-service, I have to manually provide a listening service via socat or nc -vlp 9191. But I'm not doing that. Instead, I'm connecting to the service, which means I'm acting as a client, not a server. Also all this sidecar chitchat could've been explained much better in modern terms, which is a VPN, a TUNNEL. Every kid round the block knows what a VPN or secure tunnel is. You're essentially establishing a secure tunnel in order to tunnel the data through. What's up with that "sidecar" terminology?...
Getting back to the 9191 service above, why would you even propose user using it? Can't that tunnel be made unidirectional, not bidirectional, so that the client log wouldn't spit it can't dial 9191 or 12001 (I don't remember clearly what was the message, but there WAS such a repeating message and it was confusing me a lot). Just don't tell me all this sidecar thing expects services to be running on both ends and can't be unidirectional...
Anyway, just my thoughts and I've only started... Still, all this is definitely better than shitshow most opensource (AKA student dorm) projects provide in regards to documentation. I'd say you've gone far and beyond.
I'm still getting the original error even after following the docs, when running:
consul connect ca get-config
And in the client log:
2019/12/06 10:45:14 [ERR] consul: "ConnectCA.ConfigurationGet" RPC failed to server 192.168.112.4:8300: rpc error making call: Connect must be enabled in order to use this endpoint
Servers were fed:
$ cat /consul/config/server.hcl
connect {
  enabled = true
}
$ consul reload
Still, no dice.
Also, it seems like there's no way to view live server config, there's no such option. I'm not talking about cat-ing the config files, but about how server views the config, internally, live, with defaults, etc.
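For what it's worth, the agent HTTP API has a /v1/agent/self endpoint that returns the configuration as the running agent sees it, which may be close to what is being asked for here. The sketch below is only an illustration: the helper name connect_enabled is made up, and it assumes the JSON response carries a ConnectEnabled field (expected under DebugConfig), which should be checked against the Consul version in use.

```shell
#!/bin/sh
# Hypothetical helper: reads the JSON from /v1/agent/self on stdin and
# reports whether a "ConnectEnabled": true field is present.
# Assumption: the agent exposes ConnectEnabled in its DebugConfig section.
connect_enabled() {
    if grep -Eq '"ConnectEnabled"[[:space:]]*:[[:space:]]*true'; then
        echo "connect: enabled"
    else
        echo "connect: not enabled (or field not present)"
    fi
}

# Usage against a live agent (address is an assumption, adjust as needed):
#   curl -s http://127.0.0.1:8500/v1/agent/self | connect_enabled
```

Running this against each server separately would show whether the setting from server.hcl was actually picked up by that agent.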
Hi there! I have just tried enabling connect on the leader node as well, so that consul connect ca get-config works, and it succeeded. My guess is that you have to enable connect on every node.
Sorry, but this is only my second evening with Consul :)
If you are running server agents in non -dev mode using Docker (https://github.com/docker-library/docs/tree/master/consul) then you can start the agents with an initial server configuration by using the CONSUL_LOCAL_CONFIG environment variable.
For example:
$ docker run \
    --name my-server1 \
    -e CONSUL_BIND_INTERFACE=eth0 \
    -e 'CONSUL_LOCAL_CONFIG={"connect": {"enabled": true}}' \
    consul agent \
    -server \
    -node my-server1 \
    -data-dir /tmp/consul \
    -join 172.17.0.7 \
    -config-dir /consul/config
The above command would start a Consul server agent in a Docker container and join it to an existing cluster by specifying the IP (172.17.0.7) of a node in that cluster.
The contents of the CONSUL_LOCAL_CONFIG environment variable get written to /consul/config/local.json.
All server agents can be restarted likewise one by one. Any server that boots up with this configuration and becomes a cluster leader will enable connect and will bootstrap the built-in Certificate Authority (CA).
You can verify that connect is enabled by running the following command:
$ consul connect ca get-config
It should return the CA config JSON.
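Since the CA only becomes available once a restarted server is leader, the verification step can be wrapped in a small retry loop. This is a rough sketch, not an official procedure: the function name wait_for_connect_ca and its default attempt count and delay are made up for illustration, and it relies only on consul connect ca get-config exiting non-zero while Connect is unavailable.

```shell
#!/bin/sh
# Hypothetical helper: poll `consul connect ca get-config` until it succeeds.
# $1 = max attempts (default 10), $2 = delay in seconds between attempts (default 3).
wait_for_connect_ca() {
    attempts="${1:-10}"
    delay="${2:-3}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # The command exits non-zero while Connect is not enabled on the servers.
        if consul connect ca get-config >/dev/null 2>&1; then
            echo "Connect CA is available (attempt $i)"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "Connect CA still unavailable after $attempts attempts" >&2
    return 1
}

# Usage after restarting the servers:
#   wait_for_connect_ca 20 5
```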
After upgrading to v1.11.3, I hit this problem too: the list of hosts for a service in the Consul UI is not shown correctly. Is there any workaround other than enabling 'connect'?
Getting this continuously on client after following the tutorial and trying to switch into production mode:
This endlessly appearing, entirely cryptic message doesn't help at all. I'm trying to start fresh with the client config; it's been connected to servers and servers are in sync. I've deleted /consul/data to get rid of stale crap. consul leave on client should really delete all stale data, if any was left from previous operation. This is client, not server, it should self-destruct and let me start anew.
Client info
```
agent:
    check_monitors = 0
    check_ttls = 0
    checks = 0
    services = 1
build:
    prerelease =
    revision = 1200f25e
    version = 1.6.2
consul:
    acl = disabled
    known_servers = 2
    server = false
runtime:
    arch = amd64
    cpu_count = 8
    goroutines = 95
    max_procs = 8
    os = linux
    version = go1.12.13
serf_lan:
    coordinate_resets = 0
    encrypted = false
    event_queue = 0
    event_time = 2
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 48
    members = 3
    query_queue = 0
    query_time = 1

$ consul members
Node   Address             Status  Type    Build  Protocol  DC    Segment
srv-1  192.168.112.3:8301  alive   server  1.6.2  2         chi2
```
Server info
```
agent:
    check_monitors = 0
    check_ttls = 0
    checks = 0
    services = 0
build:
    prerelease =
    revision = 1200f25e
    version = 1.6.2
consul:
    acl = disabled
    bootstrap = false
    known_datacenters = 1
    leader = true
    leader_addr = 192.168.112.3:8300
    server = true
raft:
    applied_index = 201
    commit_index = 201
    fsm_pending = 0
    last_contact = 0
    last_log_index = 201
    last_log_term = 2
    last_snapshot_index = 0
    last_snapshot_term = 0
    latest_configuration = [{Suffrage:Voter ID:5b9b3670-3d34-b352-f3bc-91dc904ac694 Address:192.168.112.3:8300} {Suffrage:Voter ID:cd2b23a8-db46-1a0d-07b9-9361475f8030 Address:192.168.112.4:8300}]
    latest_configuration_index = 1
    num_peers = 1
    protocol_version = 3
    protocol_version_max = 3
    protocol_version_min = 0
    snapshot_version_max = 1
    snapshot_version_min = 0
    state = Leader
    term = 2
runtime:
    arch = amd64
    cpu_count = 8
    goroutines = 97
    max_procs = 8
    os = linux
    version = go1.12.13
serf_lan:
    coordinate_resets = 0
    encrypted = false
    event_queue = 0
    event_time = 2
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 48
    members = 3
    query_queue = 0
    query_time = 1
serf_wan:
    coordinate_resets = 0
    encrypted = false
    event_queue = 0
    event_time = 1
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 18
    members = 2
    query_queue = 0
    query_time = 1
```