vitabaks / postgresql_cluster

PostgreSQL High-Availability Cluster (based on "Patroni" and DCS "etcd" or "consul"). Automating with Ansible.
MIT License

Failure at TASK [sysctl : Setting kernel parameters] on Ubuntu 22.04 #388

Closed fatmaAliGamal closed 10 months ago

fatmaAliGamal commented 10 months ago

TASK [sysctl : Setting kernel parameters] **
changed: [54.163.22.109] => (item={'name': 'net.ipv4.ip_nonlocal_bind', 'value': '1'})
changed: [3.95.217.76] => (item={'name': 'net.ipv4.ip_nonlocal_bind', 'value': '1'})
changed: [18.234.60.224] => (item={'name': 'net.ipv4.ip_nonlocal_bind', 'value': '1'})
changed: [54.163.22.109] => (item={'name': 'net.ipv4.ip_forward', 'value': '1'})
changed: [3.95.217.76] => (item={'name': 'net.ipv4.ip_forward', 'value': '1'})
changed: [18.234.60.224] => (item={'name': 'net.ipv4.ip_forward', 'value': '1'})
changed: [3.95.217.76] => (item={'name': 'net.ipv4.ip_local_port_range', 'value': '10000 65535'})
changed: [54.163.22.109] => (item={'name': 'net.ipv4.ip_local_port_range', 'value': '10000 65535'})
changed: [18.234.60.224] => (item={'name': 'net.ipv4.ip_local_port_range', 'value': '10000 65535'})
changed: [3.95.217.76] => (item={'name': 'net.core.netdev_max_backlog', 'value': '10000'})
changed: [54.163.22.109] => (item={'name': 'net.core.netdev_max_backlog', 'value': '10000'})
changed: [18.234.60.224] => (item={'name': 'net.core.netdev_max_backlog', 'value': '10000'})
changed: [3.95.217.76] => (item={'name': 'net.ipv4.tcp_max_syn_backlog', 'value': '8192'})
changed: [54.163.22.109] => (item={'name': 'net.ipv4.tcp_max_syn_backlog', 'value': '8192'})
changed: [18.234.60.224] => (item={'name': 'net.ipv4.tcp_max_syn_backlog', 'value': '8192'})
changed: [3.95.217.76] => (item={'name': 'net.core.somaxconn', 'value': '65535'})
changed: [54.163.22.109] => (item={'name': 'net.core.somaxconn', 'value': '65535'})
changed: [18.234.60.224] => (item={'name': 'net.core.somaxconn', 'value': '65535'})
changed: [3.95.217.76] => (item={'name': 'net.ipv4.tcp_tw_reuse', 'value': '1'})
changed: [54.163.22.109] => (item={'name': 'net.ipv4.tcp_tw_reuse', 'value': '1'})
changed: [18.234.60.224] => (item={'name': 'net.ipv4.tcp_tw_reuse', 'value': '1'})
fatal: [3.95.217.76]: FAILED! => {"msg": "Failed to connect to the host via ssh: "}
...ignoring
fatal: [54.163.22.109]: FAILED! => {"msg": "Failed to connect to the host via ssh: "}
...ignoring
fatal: [18.234.60.224]: FAILED! => {"msg": "Failed to connect to the host via ssh: "}
...ignoring

TASK [etcd : Make sure the unzip/tar packages are present] *****
fatal: [3.95.217.76]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ", "unreachable": true}
fatal: [54.163.22.109]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ", "unreachable": true}
fatal: [18.234.60.224]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ", "unreachable": true}

NO MORE HOSTS LEFT *****

PLAY RECAP *****
18.234.60.224 : ok=8 changed=4 unreachable=1 failed=0 skipped=29 rescued=0 ignored=1
3.95.217.76 : ok=8 changed=4 unreachable=1 failed=0 skipped=29 rescued=0 ignored=1
54.163.22.109 : ok=8 changed=4 unreachable=1 failed=0 skipped=29 rescued=0 ignored=1

vitabaks commented 10 months ago

Make sure that you have specified only private addresses in the inventory, not public.

vitabaks commented 10 months ago

Use of Internal and External IP Addresses in Ansible Inventory: https://github.com/vitabaks/postgresql_cluster/issues/358#issuecomment-1580650911
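The advice above can be checked mechanically before a run. The following is only a minimal sketch, assuming every inventory host line starts with an IP address (not a DNS name); `check_inventory_line` is a hypothetical helper and not part of postgresql_cluster. It reports whether the inventory hostname (the first token of the line) is a private address, while `ansible_host` may still point at a public one.

```python
import ipaddress


def check_inventory_line(line):
    """Return (inventory_hostname, is_private) for one inventory host line.

    Assumption: the first token is an IP address; ipaddress.ip_address()
    raises ValueError for anything else, such as a DNS name.
    """
    name = line.split()[0]
    return name, ipaddress.ip_address(name).is_private


# Host lines modelled on the addresses that appear in this thread.
lines = [
    "10.0.4.27 ansible_host=3.87.225.7",  # private name, public ansible_host
    "3.95.217.76",                        # public address used as the name
]

results = [check_inventory_line(line) for line in lines]
for name, is_private in results:
    status = "private" if is_private else "PUBLIC, use the private address instead"
    print(f"{name}: {status}")
```

Only the inventory hostname is checked here, because that is the value the roles use as the node address; the SSH target in `ansible_host` is left alone.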

fatmaAliGamal commented 10 months ago

I use EC2 on AWS, and my inventory uses private_ip_address ansible_host=public_ip_address, but the error still appears:

TASK [sysctl : Build a sysctl_conf dynamic variable] ****
ok: [10.0.4.214] => (item=etcd_cluster)
ok: [10.0.4.214] => (item=master)
ok: [10.0.4.27] => (item=etcd_cluster)
ok: [10.0.4.214] => (item=postgres_cluster)
ok: [10.0.4.27] => (item=postgres_cluster)
ok: [10.0.5.212] => (item=etcd_cluster)
ok: [10.0.4.27] => (item=replica)
ok: [10.0.5.212] => (item=postgres_cluster)
ok: [10.0.5.212] => (item=replica)

TASK [sysctl : Setting kernel parameters] ***
fatal: [10.0.4.27]: FAILED! => {"msg": "Failed to connect to the host via ssh: "}
...ignoring
fatal: [10.0.4.214]: FAILED! => {"msg": "Failed to connect to the host via ssh: "}
...ignoring
fatal: [10.0.5.212]: FAILED! => {"msg": "Failed to connect to the host via ssh: "}
...ignoring

TASK [etcd : Make sure the unzip/tar packages are present] **
fatal: [10.0.4.214]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ", "unreachable": true}
fatal: [10.0.4.27]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ", "unreachable": true}
fatal: [10.0.5.212]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ", "unreachable": true}

vitabaks commented 10 months ago

This is a problem with SSH access to the servers.

Try running ansible all -m ping

fatmaAliGamal commented 10 months ago

Before running ansible-playbook deploy_pgcluster.yml, the ping succeeds:

10.0.4.97 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
10.0.5.175 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
10.0.4.224 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}

After running the playbook, this appears:

TASK [sysctl : Build a sysctl_conf dynamic variable] *****
ok: [10.0.4.97] => (item=etcd_cluster)
ok: [10.0.4.97] => (item=master)
ok: [10.0.4.224] => (item=etcd_cluster)
ok: [10.0.4.97] => (item=postgres_cluster)
ok: [10.0.4.224] => (item=postgres_cluster)
ok: [10.0.5.175] => (item=etcd_cluster)
ok: [10.0.4.224] => (item=replica)
ok: [10.0.5.175] => (item=postgres_cluster)
ok: [10.0.5.175] => (item=replica)

TASK [sysctl : Setting kernel parameters] ****
fatal: [10.0.5.175]: FAILED! => {"msg": "Failed to connect to the host via ssh: "}
...ignoring
fatal: [10.0.4.224]: FAILED! => {"msg": "Failed to connect to the host via ssh: "}
...ignoring
fatal: [10.0.4.97]: FAILED! => {"msg": "Failed to connect to the host via ssh: "}
...ignoring

When I run ansible all -m ping again, it fails:

10.0.4.97 | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: Connection closed by 184.73.144.179 port 22",
    "unreachable": true
}
10.0.4.224 | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: Connection closed by 3.89.251.124 port 22",
    "unreachable": true
}
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: MemoryError
10.0.5.175 | FAILED! => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "module_stderr": "Shared connection to 34.228.22.118 closed.\r\n",
    "module_stdout": "Traceback (most recent call last):\r\n File \"/home/ubuntu/.ansible/tmp/ansible-tmp-1688324331.0731466-102147-27517743771944/AnsiballZ_ping.py\", line 107, in \r\n File \"/home/ubuntu/.ansible/tmp/ansible-tmp-1688324331.0731466-102147-27517743771944/AnsiballZ_ping.py\", line 28, in _ansiballz_main\r\n import zipfile\r\n File \"/usr/lib/python3.10/zipfile.py\", line 19, in \r\n import pathlib\r\n File \"\", line 1027, in _find_and_load\r\n File \"\", line 1006, in _find_and_load_unlocked\r\n File \"\", line 688, in _load_unlocked\r\n File \"\", line 879, in exec_module\r\n File \"\", line 975, in get_code\r\n File \"\", line 1074, in get_data\r\nMemoryError\r\n",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": 1
}

note i use ec2 aws ubuntu 22.04 instance_type = "t2.micro"

vitabaks commented 10 months ago

note i use ec2 aws ubuntu 22.04 instance_type = "t2.micro"

This is the problem: the server has too little memory to run Ansible modules.

Try a server with at least 2 or 4 GiB of memory
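A quick pre-flight check for this can be sketched as follows. This is an illustration only, assuming a Linux host that exposes `/proc/meminfo`; the helper names and the 1.8 GiB default are assumptions (the kernel reports slightly less than the nominal RAM, so the threshold sits a little under the 2 GiB suggested above).

```python
import re

# Assumed threshold: a little below 2 GiB, because MemTotal as reported by
# the kernel is somewhat lower than the nominal instance memory.
MIN_MEM_GIB = 1.8


def mem_total_gib(meminfo_text):
    """Extract MemTotal (reported in kB) from /proc/meminfo-style text, in GiB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found")
    return int(match.group(1)) / (1024 * 1024)


def has_enough_memory(meminfo_text, minimum_gib=MIN_MEM_GIB):
    """Return True if the reported memory meets the minimum."""
    return mem_total_gib(meminfo_text) >= minimum_gib


# Illustrative sample resembling a machine with about 1 GiB of RAM.
sample = "MemTotal:         987654 kB\nMemFree:          123456 kB\n"
print(has_enough_memory(sample))
```

On a target host, the same function could be fed the contents of `/proc/meminfo`, for example `has_enough_memory(open("/proc/meminfo").read())`.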

fatmaAliGamal commented 10 months ago

I now use 8 GB of memory, but another error appears:

fatal: [10.0.4.132]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'postgres_cluster_nodes' is undefined\n\nThe error appears to be in '/mnt/396A486035A35D5E/soa/task-cluster-db/soa-db/postgresql_cluster/roles/deploy-finish/tasks/main.yml': line 162, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n - name: PostgreSQL Cluster connection info\n ^ here\n"}
...ignoring

vitabaks commented 10 months ago

'postgres_cluster_nodes' is undefined

The likely reason is that set_fact was not executed for the variable "postgres_cluster_nodes", which is built from the list of hosts (inventory_hostname) defined in the postgres_cluster group.

Please show the result of the "Create list of nodes" task

And please show your inventory file.

fatmaAliGamal commented 10 months ago

TASK [deploy-finish : Virtual IP Address (VIP) info] *****
skipping: [10.0.4.27]
skipping: [10.0.4.246]
skipping: [10.0.5.227]

TASK [deploy-finish : Create list of nodes] **
fatal: [10.0.4.27]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'balancers'\n\nThe error appears to be in '/mnt/396A486035A35D5E/soa/task-cluster-db/soa-db/postgresql_cluster/roles/deploy-finish/tasks/main.yml': line 128, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n- block: # if cluster_vip is not defined\n - name: Create list of nodes\n ^ here\n"}
...ignoring

TASK [deploy-finish : PostgreSQL Cluster connection info] ****
skipping: [10.0.4.27]

TASK [deploy-finish : PostgreSQL Cluster connection info] ****
skipping: [10.0.4.27]

TASK [deploy-finish : PostgreSQL Cluster connection info] ****
fatal: [10.0.4.27]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'postgres_cluster_nodes' is undefined\n\nThe error appears to be in '/mnt/396A486035A35D5E/soa/task-cluster-db/soa-db/postgresql_cluster/roles/deploy-finish/tasks/main.yml': line 162, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n - name: PostgreSQL Cluster connection info\n ^ here\n"}
...ignoring

TASK [deploy-finish : PostgreSQL Cluster connection info] ****
skipping: [10.0.4.27]

TASK [deploy-finish : PostgreSQL Cluster connection info] ****
skipping: [10.0.4.27]

PLAY RECAP ***
10.0.4.246 : ok=94 changed=61 unreachable=0 failed=0 skipped=308 rescued=0 ignored=0
10.0.4.27 : ok=106 changed=62 unreachable=0 failed=0 skipped=326 rescued=0 ignored=2
10.0.5.227 : ok=94 changed=61 unreachable=0 failed=0 skipped=308 rescued=0 ignored=0
localhost : ok=0 changed=0 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0

hosts.ini

[etcd_cluster]
10.0.4.27 ansible_host=3.87.225.7
10.0.4.246 ansible_host=44.211.199.162
10.0.5.227 ansible_host=54.157.19.230

[master]
10.0.4.27 ansible_host=3.87.225.7

[replica]
10.0.4.246 ansible_host=44.211.199.162
10.0.5.227 ansible_host=54.157.19.230

[postgres_cluster:children]
master
replica

[all:vars]
ansible_connection='ssh'
ansible_ssh_port='22'
ansible_ssh_user=ubuntu
ansible_ssh_private_key_file=./postgres-db-key

Which server IP should I put in cluster_vip= ""?

proxy_env: {}

vitabaks commented 10 months ago

@fatmaAliGamal Why did you decide to remove the required group "balancers" from the inventory file?

Please use the suggested version of the inventory file.

P.S. I will make the "balancers" group optional.
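For versions before that change, a missing group can be spotted before running the playbook. The sketch below is an assumption-laden illustration: `EXPECTED_GROUPS` is taken only from the group names mentioned in this thread, not from the project's documentation, and `missing_groups` is a hypothetical helper. It scans an INI inventory for group headers and reports which expected groups are absent.

```python
import re

# Assumption: group names as they appear in the inventories quoted in this thread.
EXPECTED_GROUPS = {"etcd_cluster", "balancers", "master", "replica"}


def inventory_groups(text):
    """Collect group names from [group], [group:children] and [group:vars] headers."""
    groups = set()
    for line in text.splitlines():
        match = re.match(r"^\s*\[([^\]:]+)(?::\w+)?\]\s*$", line)
        if match:
            groups.add(match.group(1))
    return groups


def missing_groups(text, expected=EXPECTED_GROUPS):
    """Return the expected groups that have no header in the inventory text."""
    return set(expected) - inventory_groups(text)


# Shortened version of the hosts.ini posted above.
inventory = """\
[etcd_cluster]
10.0.4.27 ansible_host=3.87.225.7

[master]
10.0.4.27 ansible_host=3.87.225.7

[replica]
10.0.4.246 ansible_host=44.211.199.162

[postgres_cluster:children]
master
replica

[all:vars]
ansible_connection='ssh'
"""

print(sorted(missing_groups(inventory)))  # prints ['balancers']
```

An empty `[balancers]` header is enough for the group to be counted as present by this check, which matches how the failing task looked the group up by name.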

fatmaAliGamal commented 10 months ago

Because I generate it dynamically when I create the 3 EC2 instances. What is wrong with the inventory? This is the only difference between them:

PostgreSQL nodes

[master]
10.128.64.140 hostname=pgnode01 postgresql_exists=false

[replica]
10.128.64.142 hostname=pgnode02 postgresql_exists=false
10.128.64.143 hostname=pgnode03 postgresql_exists=false

Must hostname and postgresql_exists=false be added after 10.0.5.227 ansible_host=54.157.19.230? If they must be added, does any code in the playbook change the hostname?

P.S. I will make the "balancers" group optional ==> I use Type B, so I can ignore this, right?

vitabaks commented 10 months ago

@fatmaAliGamal Fixed https://github.com/vitabaks/postgresql_cluster/commit/16c63c5fda68edf40c158b88d89b6ad0f5c712ee

fatmaAliGamal commented 10 months ago

I will try it now, but could you please answer this question? I can't tell whether this is optional or mandatory: must hostname and postgresql_exists=false come after 10.0.5.227 ansible_host=54.157.19.230? If they must be added, does any code in the playbook change the hostname?

vitabaks commented 10 months ago

This is optional, you don't have to define these variables in the inventory

# "postgresql_exists='true'" if PostgreSQL is already exists and running
# "hostname=" variable is optional (used to change the server name)

fatmaAliGamal commented 10 months ago

Thanks for your support. However, when I test [Type B] PostgreSQL High-Availability only, by running sudo systemctl stop postgresql.service on the primary server, no replica changes from secondary to primary to cover the failover.

vitabaks commented 10 months ago

no replica change from secondary to primary to cover failover

Please create a separate issue and describe the details, and please attach Patroni logs.

fatmaAliGamal commented 10 months ago

Okay, thanks again for your fast response.