Closed: treydock closed this issue 4 years ago.
After the cluster has booted, new members need to be added with `initial-cluster-state: existing`. You must set the configuration printed by the `sensuctl cluster member-add` tool before launching the new member. This is crucial, because the member ID is computed by hashing the member configuration.
We haven't released documentation on this yet, but it is similar to adding a new member to an etcd cluster.
There are some etcd docs you might find useful here: https://coreos.com/etcd/docs/latest/etcd-live-cluster-reconfiguration.html
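The flow described above can be sketched as a short shell sequence. The member name and peer URL are example values taken from later in this thread, not canonical ones; the script only prints the command it would run:

```shell
#!/bin/sh
# Sketch of the documented add-member flow; the member name and peer
# URL below are example values from this thread, not canonical ones.

NEW_NAME="backend2"
NEW_PEER_URL="http://172.17.0.3:2380"

# 1. On an existing backend, register the new member. member-add prints
#    the ETCD_NAME / ETCD_INITIAL_CLUSTER / ETCD_INITIAL_CLUSTER_STATE
#    values that the new member must be started with.
ADD_CMD="sensuctl cluster member-add $NEW_NAME $NEW_PEER_URL"
echo "$ADD_CMD"

# 2. On the new node, launch sensu-backend with exactly that
#    configuration (initial-cluster-state: existing) before its first
#    start, since the member ID is a hash of the member configuration.
```

The ordering matters: register first, then boot the new member with the printed values.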
@echlebek Maybe I'm doing something wrong, but I'm now attempting to add a backend to a single-node cluster using the Docker images that are used to test Puppet code.
I configure backend1:
```yaml
---
listen-client-urls: http://0.0.0.0:2379
listen-peer-urls: http://0.0.0.0:2380
initial-cluster: backend1=http://172.17.0.2:2380
initial-advertise-peer-urls: http://172.17.0.2:2380
initial-cluster-state: new
name: backend1
```
```
[root@sensu_backend1 /]# sensuctl cluster member-list
         ID           Name       Peer URLs              Client URLs
 ────────────────── ────────── ──────────────────────── ─────────────────────
  8e7d2048a91042b2   backend1   http://172.17.0.2:2380   http://0.0.0.0:2379
```
I then add a new member before starting the new member's sensu-backend (using Puppet; the command used is shown in the output):
```
Debug: Executing: '/usr/bin/sensuctl cluster member-add backend2 http://172.17.0.3:2380'
Info: Cluster member-add backend2: added member 1708047987ff93 to cluster
Info: Cluster member-add backend2:
Info: Cluster member-add backend2: ETCD_NAME="backend2"
Info: Cluster member-add backend2: ETCD_INITIAL_CLUSTER="backend2=http://172.17.0.3:2380,backend1=http://172.17.0.2:2380"
Info: Cluster member-add backend2: ETCD_INITIAL_CLUSTER_STATE="existing"
Notice: /Stage[main]/Main/Sensu_cluster_member[backend2]/ensure: created
```
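For reference, the ETCD_* values printed by member-add translate into a backend configuration for the new member along these lines. This is a sketch: the `listen-*` URLs simply mirror backend1's config earlier in this thread and are assumptions, not output from the tool:

```yaml
---
# backend2 configuration derived from the member-add output above.
# The listen-* URLs mirror backend1's config and are assumptions.
listen-client-urls: http://0.0.0.0:2379
listen-peer-urls: http://0.0.0.0:2380
initial-cluster: backend2=http://172.17.0.3:2380,backend1=http://172.17.0.2:2380
initial-advertise-peer-urls: http://172.17.0.3:2380
initial-cluster-state: existing
name: backend2
```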
Now sensuctl commands hang and have to be timed out:
```
[root@sensu_backend1 /]# timeout 30 sensuctl cluster member-list
[root@sensu_backend1 /]# echo $?
124
```
It's not until I configure and start sensu-backend on the new member that sensuctl commands stop hanging:
```
[root@sensu_backend1 /]# timeout 30 sensuctl cluster member-list
         ID           Name       Peer URLs              Client URLs
 ────────────────── ────────── ──────────────────────── ─────────────────────
  1708047987ff93     backend2   http://172.17.0.3:2380   http://0.0.0.0:2379
  8e7d2048a91042b2   backend1   http://172.17.0.2:2380   http://0.0.0.0:2379
```
Interesting, I haven't observed that behaviour with member-list before. However, my tests have been with a three-node cluster to start with, and then adding and removing members.
My previous tests with member-add have been adding a member to a cluster with two members, bringing it to three. In those cases, you could execute member-list before starting the new member without issue. The ID would simply not show for the member that wasn't started yet.
I'll try to reproduce your bug when I get back to work on Tuesday. Thanks for reporting this!
If etcd bases cluster availability and health on quorum, then my guess is that adding a new member to a single-node cluster puts the system in a state where quorum is lost. I'm unable to reproduce the problem using a setup similar to the one you described, going from a 2-node to a 3-node cluster.
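That guess is consistent with etcd's majority-quorum rule: a cluster of n members needs floor(n/2) + 1 of them up to serve requests, and member-add grows n immediately, before the new member has ever started. A quick sketch of the arithmetic:

```shell
#!/bin/sh
# Majority quorum for an n-member etcd cluster is floor(n/2) + 1.
quorum() { echo $(( $1 / 2 + 1 )); }

# 1 -> 2 members: quorum jumps from 1 to 2, but only one member is
# running until the new backend boots, so the cluster cannot commit
# writes -- which would explain sensuctl hanging above.
echo "n=1 quorum=$(quorum 1)"
echo "n=2 quorum=$(quorum 2)"

# 2 -> 3 members: quorum stays at 2 and the two running members still
# satisfy it, so member-list keeps working during the add.
echo "n=3 quorum=$(quorum 3)"
```

This would make the hang specific to starting from a single-node cluster, matching the two reports in this thread.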
I have noticed that establishing a 2-node cluster one node at a time causes some problems. I do backend1, then backend2. The issues occur while only backend1 is running. First I notice that `curl http://127.0.0.1:8080/info` fails, which I use in Puppet to verify sensu-backend is fully booted:
```
sensu_backend1 12:43:43$ curl http://127.0.0.1:8080/info
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
```
The sensu-backend service is running when the curl is attempted. Once sensu-backend has been started on the backend2 node, the curl works:
```
sensu_backend1 12:44:24$ curl http://127.0.0.1:8080/info
{"agentd":true,"apid":true,"dashboardd":true,"eventd":true,"keepalived":true,"message_bus":true,"pipelined":true,"schedulerd":true,"store":true}
```
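One way to make a provisioning check tolerant of this window is to poll the endpoint with retries instead of failing on the first refused connection. A hedged sketch, not the Puppet module's actual check; the URL, retry budget, and sleep interval are illustrative assumptions:

```shell
#!/bin/sh
# Hypothetical readiness probe: retry the /info endpoint instead of
# failing on the first "Connection refused". The retry count and
# sleep interval below are illustrative assumptions.
wait_for_backend() {
  url="$1"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f: treat HTTP errors as failures; -sS: quiet, but show errors
    if curl -fsS -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example usage: wait_for_backend http://127.0.0.1:8080/info 30
```

This only papers over the symptom, though: with a 2-node cluster that has quorum 2, /info stays down until the second backend is up no matter how long the probe waits.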
It also seems that sensu-agent on backend1 crashes after being started while only backend1 is running. The service starts and stays running just fine once backend2 is brought online. I presume this is because the etcd cluster is not in a healthy state.
```
sensu_backend1 12:44:24$ systemctl status sensu-agent -l
● sensu-agent.service - The Sensu Agent process.
   Loaded: loaded (/usr/lib/systemd/system/sensu-agent.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Sun 2018-08-05 16:43:43 UTC; 41s ago
  Process: 2546 ExecStart=/usr/bin/sensu-agent start (code=exited, status=1/FAILURE)
 Main PID: 2546 (code=exited, status=1/FAILURE)

Aug 05 16:43:42 sensu_backend1 systemd[1]: Unit sensu-agent.service entered failed state.
Aug 05 16:43:42 sensu_backend1 systemd[1]: sensu-agent.service failed.
Aug 05 16:43:43 sensu_backend1 systemd[1]: sensu-agent.service holdoff time over, scheduling restart.
Aug 05 16:43:43 sensu_backend1 systemd[1]: start request repeated too quickly for sensu-agent.service
Aug 05 16:43:43 sensu_backend1 systemd[1]: Failed to start The Sensu Agent process..
Aug 05 16:43:43 sensu_backend1 systemd[1]: Unit sensu-agent.service entered failed state.
Aug 05 16:43:43 sensu_backend1 systemd[1]: sensu-agent.service failed.
```
The environment producing the above output is Docker with no syslog, so if this behavior isn't expected I can try to reproduce it in a more complete environment where syslog logging is installed and functioning.
I've been able to add new members to the cluster. It's a little tricky, and I have been doing it manually instead of with Puppet, but I was finally able to get it working. @treydock I'd be willing to do a Zoom call with you to see your process sometime next week during the workday. Feel free to DM me - it's the least I can do since you've jumped on my sensu-puppet repo issues so quickly :)
I'm going to go ahead and close this one as resolved via the Sensu Clustering guide: https://docs.sensu.io/sensu-go/latest/guides/clustering/
The original issue here was opened pre-GA, before we had documented the steps to cluster Sensu.
@csoleimani thanks for chiming in to help with this one! Let us know if you need any assistance with automated clustering and we'll be happy to help!
This may be premature given how new the changes to sensuctl that support adding and removing members are. I created a two-member cluster using backend.yml, starting everything from scratch.

backend1 bits:

backend2 bits:

Member list looks good:
Where I'm running into trouble is adding a 3rd member to the cluster. I tried two approaches in backend.yml.

Approach one: match the existing cluster:

Approach two: no initial settings:

In both cases I added the member:

What happens is that the logs on backend3 have this:
I also notice that the `member-list` output seems incomplete, lacking the name and client URLs for the new member. I'm unable to identify with `sensuctl` where those mismatched values come from, likely because the ID of the cluster is not correct with `--format json`, which is covered in #1887.

I'm working with @ghoneycutt to automate this with Puppet and wanted to evaluate supporting member add and remove with Puppet code.
Your Environment

```
sensuctl version 2.0.0-nightly#cf3305e, build cf3305eecc6cbd62ef9641ca6a5923d18615100c, built 2018-07-29T09:21:53+0000
```