Closed: zuhataslan closed this issue 10 months ago
Hi @zuhataslan,
please share more data from the etcd log:
sudo journalctl -u etcd -n 100 --output=short-precise
I think I found the problem, although I don't yet understand the cause. Every time I run the playbook, the hostname of node3 is set to the same hostname as node2.
May 30 00:54:29.892361 node02 bash[359834]: {"level":"fatal","ts":"2023-05-30T00:54:29.892+0200","caller":"etcdmain/etcd.go:204","msg":"discovery failed","error":"--initial-cluster has node02=http://192.168.2.35:2380 but missing from --initial-advertise-peer-urls=http://192.168.2.36:2380 (len([\"http://192.168.2.36:2380\"]) != len([\"http://192.168.2.35:2380\" \"http://192.168.2.36:2380\"]))","stacktrace":"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:204\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/v3/main.go:32\nruntime.main\n\truntime/proc.go:255"}
Please check the `hostname` variables in the inventory file; they must be unique for each host.
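The fatal message above can be read as a failed consistency check: etcd requires its own advertised peer URLs to appear in `--initial-cluster` under its node name, and a duplicated hostname collapses two cluster entries into one. A simplified sketch of that check (hypothetical node data, not etcd's actual code):

```python
# Simplified model of etcd's startup check (not the real implementation).
# node03 booted with node02's hostname, so the initial-cluster map only
# contains one "node02" entry, pointing at node02's IP.
initial_cluster = {"node02": ["http://192.168.2.35:2380"]}

name = "node02"                                  # the (duplicated) node name
advertise_peers = ["http://192.168.2.36:2380"]   # node03's own peer URL

# etcd compares the URLs registered under its name with what it advertises.
ok = sorted(initial_cluster.get(name, [])) == sorted(advertise_peers)
print(ok)  # False -> etcd exits with "discovery failed"
```

With unique hostnames, each node's entry survives in the map and the comparison succeeds.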
@zuhataslan did you manage to deploy an etcd cluster?
I have the same problem:
Jun 05 16:21:24.347675 Project-data-s1-v1 systemd[1]: Stopped Etcd Server.
Jun 05 16:21:24.349386 Project-data-s1-v1 systemd[1]: Starting Etcd Server...
Jun 05 16:21:24.372370 Project-data-s1-v1 bash[3723593]: {"level":"info","ts":"2023-06-05T16:21:24.372+0300","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_ADVERTISE_CLIENT_URLS","variable-value":"http://Project-data-s1-v1:2379"}
Jun 05 16:21:24.372370 Project-data-s1-v1 bash[3723593]: {"level":"info","ts":"2023-06-05T16:21:24.372+0300","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_DATA_DIR","variable-value":"/var/lib/etcd"}
Jun 05 16:21:24.372370 Project-data-s1-v1 bash[3723593]: {"level":"info","ts":"2023-06-05T16:21:24.372+0300","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_ELECTION_TIMEOUT","variable-value":"5000"}
Jun 05 16:21:24.372370 Project-data-s1-v1 bash[3723593]: {"level":"info","ts":"2023-06-05T16:21:24.372+0300","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_HEARTBEAT_INTERVAL","variable-value":"1000"}
Jun 05 16:21:24.372960 Project-data-s1-v1 bash[3723593]: {"level":"info","ts":"2023-06-05T16:21:24.372+0300","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_INITIAL_ADVERTISE_PEER_URLS","variable-value":"http://Project-data-s1-v1:2380"}
Jun 05 16:21:24.372960 Project-data-s1-v1 bash[3723593]: {"level":"info","ts":"2023-06-05T16:21:24.372+0300","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_INITIAL_CLUSTER","variable-value":"Project-data-s1-v1=http://Project-data-s1-v1:2380,Project-data-s2-v1=http://Project-data-s2-v1:2380,Project-data-s3-v1=http://Project-data-s3-v1:2380"}
Jun 05 16:21:24.372960 Project-data-s1-v1 bash[3723593]: {"level":"info","ts":"2023-06-05T16:21:24.372+0300","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_INITIAL_CLUSTER_STATE","variable-value":"new"}
Jun 05 16:21:24.372960 Project-data-s1-v1 bash[3723593]: {"level":"info","ts":"2023-06-05T16:21:24.372+0300","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_INITIAL_CLUSTER_TOKEN","variable-value":"etcd-postgres-cluster"}
Jun 05 16:21:24.372960 Project-data-s1-v1 bash[3723593]: {"level":"info","ts":"2023-06-05T16:21:24.372+0300","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_INITIAL_ELECTION_TICK_ADVANCE","variable-value":"false"}
Jun 05 16:21:24.372960 Project-data-s1-v1 bash[3723593]: {"level":"info","ts":"2023-06-05T16:21:24.372+0300","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_LISTEN_CLIENT_URLS","variable-value":"http://Project-data-s1-v1:2379,http://127.0.0.1:2379"}
Jun 05 16:21:24.372960 Project-data-s1-v1 bash[3723593]: {"level":"info","ts":"2023-06-05T16:21:24.372+0300","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_LISTEN_PEER_URLS","variable-value":"http://Project-data-s1-v1:2380"}
Jun 05 16:21:24.372960 Project-data-s1-v1 bash[3723593]: {"level":"info","ts":"2023-06-05T16:21:24.372+0300","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_NAME","variable-value":"Project-data-s1-v1"}
Jun 05 16:21:24.372960 Project-data-s1-v1 bash[3723593]: {"level":"info","ts":"2023-06-05T16:21:24.372+0300","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["/usr/local/bin/etcd"]}
Jun 05 16:21:24.373301 Project-data-s1-v1 bash[3723593]: {"level":"warn","ts":"2023-06-05T16:21:24.372+0300","caller":"etcdmain/etcd.go:75","msg":"failed to verify flags","error":"expected IP in URL for binding (http://Project-data-s1-v1:2380)"}
Jun 05 16:21:24.374122 Project-data-s1-v1 systemd[1]: etcd.service: Main process exited, code=exited, status=1/FAILURE
Jun 05 16:21:24.374550 Project-data-s1-v1 systemd[1]: etcd.service: Failed with result 'exit-code'.
Jun 05 16:21:24.375251 Project-data-s1-v1 systemd[1]: Failed to start Etcd Server.
Jun 05 16:21:24.597020 Project-data-s1-v1 systemd[1]: etcd.service: Scheduled restart job, restart counter is at 5.
Jun 05 16:21:24.597520 Project-data-s1-v1 systemd[1]: Stopped Etcd Server.
Jun 05 16:21:24.597877 Project-data-s1-v1 systemd[1]: etcd.service: Start request repeated too quickly.
Jun 05 16:21:24.598096 Project-data-s1-v1 systemd[1]: etcd.service: Failed with result 'exit-code'.
Jun 05 16:21:24.598478 Project-data-s1-v1 systemd[1]: Failed to start Etcd Server.
What is "Project-data-s1-v1"?
Please make sure that you have specified IP addresses in the inventory file.
The specified IP addresses will be used for listening by the cluster components.
Example:
[etcd_cluster]
10.128.64.140
10.128.64.142
10.128.64.143
Damn, I have it like this:
Project-data-s1-v1 ansible_ssh_host=123.123.123.121
Project-data-s2-v1 ansible_ssh_host=123.123.123.122
Project-data-s3-v1 ansible_ssh_host=123.123.123.123
An IP address or domain name must be specified.
Okay, but how do I set it up so that Ansible connects via the external IP but configures everything with the internal IP?
Good question, I'll think about how to implement it. In the meantime, specify only the internal IPs and run Ansible from a server in the same private network.
The address specified in the inventory is used in the configuration files, but this is inconvenient in cases where the configured hosts are reachable only via external IP addresses while the cluster hosts communicate over an internal network.
Thank you, you're cool
You can create a PR to improve this part.
better solution below
A possible solution, for people who need to use external IPs in the inventory, could look like this:
1. Use server names in your inventory.
2. On your Ansible server, add the external IPs for the inventory server names to your local /etc/hosts.
3. Define the local IPs in the `etc_hosts` variable in system.yaml.
4. Use this `etcd.conf.j2` content:
ETCD_NAME="{{ ansible_hostname }}"
ETCD_LISTEN_CLIENT_URLS="http://{{ ansible_default_ipv4.address }}:2379,http://127.0.0.1:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://{{ ansible_default_ipv4.address }}:2379"
ETCD_LISTEN_PEER_URLS="http://{{ ansible_default_ipv4.address }}:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://{{ ansible_default_ipv4.address }}:2380"
ETCD_INITIAL_CLUSTER_TOKEN="{{ etcd_cluster_name }}"
ETCD_INITIAL_CLUSTER="{% for host in groups['etcd_cluster'] %}{{ hostvars[host]['ansible_hostname'] }}=http://{{ hostvars[host]['ansible_default_ipv4']['address'] }}:2380{% if not loop.last %},{% endif %}{% endfor %}"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_DATA_DIR="{{ etcd_data_dir }}"
ETCD_ELECTION_TIMEOUT="5000"
ETCD_HEARTBEAT_INTERVAL="1000"
ETCD_INITIAL_ELECTION_TICK_ADVANCE="false"
ETCD_ENABLE_V2="true"
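For reference, the `ETCD_INITIAL_CLUSTER` loop in the template above joins every host of the `etcd_cluster` group into a comma-separated `name=peer-url` list. A small Python sketch of the same expansion, using plain string formatting instead of Jinja2 (hypothetical host names and IPs):

```python
# Hypothetical facts: hostname -> default IPv4 address for each cluster member,
# standing in for hostvars[host]['ansible_hostname'] and
# hostvars[host]['ansible_default_ipv4']['address'].
hosts = {
    "node01": "10.128.64.140",
    "node02": "10.128.64.142",
    "node03": "10.128.64.143",
}

# Equivalent of the Jinja2 loop:
# {% for host in groups['etcd_cluster'] %}...{% if not loop.last %},{% endif %}{% endfor %}
initial_cluster = ",".join(f"{name}=http://{ip}:2380" for name, ip in hosts.items())
print(initial_cluster)
# node01=http://10.128.64.140:2380,node02=http://10.128.64.142:2380,node03=http://10.128.64.143:2380
```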
It has been identified that there may be some confusion when it comes to using both internal and external IP addresses within the Ansible inventory. Here is some clarification:
In Ansible, `inventory_hostname` represents the hostname within your configuration. This value can be referenced within your Ansible playbooks and roles. On the other hand, `ansible_host` is used to specify the IP address or domain name where Ansible should establish a connection to the remote host.
When setting these values in the format `private_ip_address ansible_host=public_ip_address`, Ansible will use the private_ip_address internally within its playbooks and roles (the IP addresses specified as inventory_hostname will be used by the cluster components for listening), and connect to the host via the public_ip_address.
Example:
[etcd_cluster]
10.128.64.140 ansible_host=34.72.80.145
10.128.64.142 ansible_host=35.123.45.67
10.128.64.143 ansible_host=36.192.89.10
This configuration is useful when the cluster components need to communicate over internal IP addresses, but Ansible commands need to be run over the public IP address.
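The split between the two names can be sketched by parsing one line of the INI inventory above the way Ansible does, in simplified form:

```python
# One line from the INI inventory above.
line = "10.128.64.140 ansible_host=34.72.80.145"

# Simplified version of Ansible's INI inventory parsing: the first token is
# the inventory_hostname, the rest are key=value host variables.
parts = line.split()
inventory_hostname = parts[0]
host_vars = dict(p.split("=", 1) for p in parts[1:])

print(inventory_hostname)          # 10.128.64.140 (internal, used for listening)
print(host_vars["ansible_host"])   # 34.72.80.145 (public, used for the SSH connection)
```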
Using `inventory_hostname` was a simple and fast way to implement listening settings for the cluster components on the specified network.
It may be worth abandoning this method in favor of a `bind_address` variable (similar to `consul_bind_address`) for the interface designated in an `interface` variable (similar to `vip_interface` or `consul_iface`).
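A `bind_address` derived from an interface variable could be resolved from Ansible facts roughly like this (a sketch only; the fact layout matches gathered Ansible facts, but `bind_interface` is a hypothetical variable name):

```python
# Minimal slice of gathered Ansible facts for one host (hypothetical values).
facts = {
    "ansible_eth1": {"ipv4": {"address": "10.128.64.140"}},
}

bind_interface = "eth1"  # hypothetical variable, akin to consul_iface / vip_interface

# Equivalent of hostvars[host]['ansible_' + bind_interface]['ipv4']['address']
bind_address = facts[f"ansible_{bind_interface}"]["ipv4"]["address"]
print(bind_address)  # 10.128.64.140
```

This would let the inventory keep connection addresses while cluster components bind to whichever network the named interface sits on.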
thx, I just tested this proposal, and it did work fine without any further configuration needed !
[etcd_cluster]
10.0.4.28 ansible_host=18.205.150.32
10.0.4.180 ansible_host=18.208.144.178
10.0.5.139 ansible_host=18.207.156.126

[master]
10.0.4.28 ansible_host=18.205.150.32

[replica]
10.0.4.180 ansible_host=18.208.144.178
10.0.5.139 ansible_host=18.207.156.126
But an error still appears:

TASK [sysctl : Setting kernel parameters] ***
fatal: [10.0.4.28]: FAILED! => {"msg": "Failed to connect to the host via ssh: "}
...ignoring
fatal: [10.0.4.180]: FAILED! => {"msg": "Failed to connect to the host via ssh: "}
...ignoring
fatal: [10.0.5.139]: FAILED! => {"msg": "Failed to connect to the host via ssh: "}
...ignoring

TASK [etcd : Make sure the unzip/tar packages are present] **
fatal: [10.0.4.28]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ", "unreachable": true}
fatal: [10.0.4.180]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ", "unreachable": true}
fatal: [10.0.5.139]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ", "unreachable": true}
NO MORE HOSTS LEFT **
PLAY RECAP **
10.0.4.180 : ok=6 changed=2 unreachable=1 failed=0 skipped=31 rescued=0 ignored=1
10.0.4.28 : ok=6 changed=2 unreachable=1 failed=0 skipped=31 rescued=0 ignored=1
10.0.5.139 : ok=6 changed=2 unreachable=1 failed=0 skipped=31 rescued=0 ignored=1
I use AWS EC2, Ubuntu 22.04.
Hi, when running playbook I get the following error for each host:
FAILED! => {"changed": false, "msg": "Unable to start service etcd: Job for etcd.service failed because the control process exited with error code.\nSee \"systemctl status etcd.service\" and \"journalctl -xe\" for details.\n"}
Output of journalctl:

● etcd.service - Etcd Server
   Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2023-05-29 16:01:57 CEST; 17min ago
  Process: 340151 ExecStart=/bin/bash -c GOMAXPROCS=$(nproc) /usr/local/bin/etcd (code=exited, status=1/FAILURE)
 Main PID: 340151 (code=exited, status=1/FAILURE)

etcd.service: Service RestartSec=100ms expired, scheduling restart.
systemd[1]: etcd.service: Scheduled restart job, restart counter is at 5.
systemd[1]: Stopped Etcd Server.
systemd[1]: etcd.service: Start request repeated too quickly.
systemd[1]: etcd.service: Failed with result 'exit-code'
I tried manually restarting the service, but got the same error. However, if I manually run /bin/bash -c "GOMAXPROCS=$(nproc) /usr/local/bin/etcd", I don't get any errors and it 'seems' to work:

{"level":"info","ts":"2023-05-29T21:45:14.443+0200","caller":"etcdserver/server.go:2062","msg":"published local member to cluster through raft","local-member-id":"8e9e05c52164694d","local-member-attributes":"{Name:default ClientURLs:[http://localhost:2379]}","request-path":"/0/members/8e9e05c52164694d/attributes","cluster-id":"cdf818194e3a8c32","publish-timeout":"7s"}
{"level":"info","ts":"2023-05-29T21:45:14.444+0200","caller":"embed/serve.go:100","msg":"ready to serve client requests"}
{"level":"info","ts":"2023-05-29T21:45:14.444+0200","caller":"etcdmain/main.go:44","msg":"notifying init daemon"}
{"level":"info","ts":"2023-05-29T21:45:14.444+0200","caller":"etcdmain/main.go:50","msg":"successfully notified init daemon"}
Any ideas?
Remote host: