vitabaks / postgresql_cluster

PostgreSQL High-Availability Cluster (based on "Patroni" and DCS "etcd" or "consul"). Automating with Ansible.
MIT License

Problem with starting patroni service in closed infrastructure #88

Closed · pavelkogen closed this issue 3 years ago

pavelkogen commented 3 years ago

Hi everyone!

I'm very grateful to vitabaks for building and maintaining such a large project to automate the creation of a Postgres cluster! 🥇 I use the postgresql_cluster role to deploy a production Postgres cluster in a closed (offline) infrastructure. Unfortunately, I ran into a problem starting the patroni service.

I use the default pip packages from the RedHat.yml file:

pip_package_file: "pip-19.2.3.tar.gz"  # https://pypi.org/project/pip/#files
patroni_pip_requirements_file:
  - "setuptools-41.2.0.zip"  # https://pypi.org/project/setuptools/#files
  - "setuptools_scm-3.3.3.tar.gz"  # https://pypi.org/project/setuptools-scm/#files
  - "urllib3-1.24.3.tar.gz"  # https://pypi.org/project/urllib3/1.24.3/#files
  - "boto-2.49.0.tar.gz"  # https://pypi.org/project/boto/#files # (interfaces to Amazon Web Services)
  - "PyYAML-5.1.2.tar.gz"  # https://pypi.org/project/PyYAML/#files
  - "chardet-3.0.4.tar.gz"  # https://pypi.org/project/chardet/#files # (required for "requests")
  - "idna-2.8.tar.gz"  # https://pypi.org/project/idna/#files    # (required for "requests")
  - "certifi-2019.9.11.tar.gz"  # https://pypi.org/project/certifi/#files # (required for "requests")
  - "requests-2.22.0.tar.gz"  # https://pypi.org/project/requests/#files
  - "six-1.12.0.tar.gz"  # https://pypi.org/project/six/#files
  - "kazoo-2.6.1.tar.gz"  # https://pypi.org/project/kazoo/#files
  - "dnspython-1.16.0.zip"  # https://pypi.org/project/dnspython/#files # (required for "python-etcd")
  - "python-etcd-0.4.5.tar.gz"  # https://pypi.org/project/python-etcd/#files
  - "Click-7.0.tar.gz"  # https://pypi.org/project/click/#files
  - "prettytable-0.7.2.tar.gz"  # https://pypi.org/project/PrettyTable/#files
  - "pytz-2019.2.tar.gz"  # https://pypi.org/project/pytz/#files # (required for "tzlocal")
  - "tzlocal-2.0.0.tar.gz"  # https://pypi.org/project/tzlocal/#files
  - "wheel-0.33.6.tar.gz"  # https://pypi.org/project/wheel/#files
  - "python-dateutil-2.8.0.tar.gz"  # https://pypi.org/project/python-dateutil/#files
  - "psutil-5.6.3.tar.gz"  # https://pypi.org/project/psutil/#files
  - "cdiff-1.0.tar.gz"  # https://pypi.org/project/cdiff/#files
  - "prettytable-0.7.2.tar.gz"
patroni_pip_package_file:
  - "patroni-1.6.0.tar.gz"  # https://pypi.org/project/patroni/#files

# ( if patroni_installation_type: "rpm" and installation_method: "file" )
patroni_rpm_package_file: "patroni-1.6.5-1.rhel7.x86_64.rpm"  # (package for RHEL/CentOS 7) https://github.com/cybertec-postgresql/patroni-packaging/releases/

The role deployment stops at this point:

TASK [Start patroni service on the Master server]
changed: [XX.XX.XX.XX]

TASK [patroni : Wait for port 8008 to become open on the host]
fatal: [XX.XX.XX.XX]: FAILED! => {"changed": false, "elapsed": 120, "msg": "Timeout when waiting for 193.48.3.65:8008"}

NO MORE HOSTS LEFT 
PLAY RECAP
XX.XX.XX.XX                : ok=73   changed=7    unreachable=0    failed=1    skipped=269  rescued=0    ignored=0
YY.YY.YY.YY (2th node)                : ok=70   changed=6    unreachable=0    failed=0    skipped=266  rescued=0    ignored=0   
NN.NN.NN.NN               : ok=70   changed=6    unreachable=0    failed=0    skipped=266  rescued=0    ignored=0   
localhost                  : ok=0    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   

Going to the second node (2th_node), I see:

{RHEL7.9}{N/A}{2th_node}[root@~]$ journalctl -u patroni
-- Logs begin at Fri 2021-01-15 18:46:50 MSK, end at Mon 2021-01-18 12:50:50 MSK. --
Jan 18 12:47:26 2th_node systemd[1]: Started Runners to orchestrate a high-availability PostgreSQL - patroni.
Jan 18 12:47:26 2th_node patroni[21439]: Traceback (most recent call last):
Jan 18 12:47:26 2th_node patroni[21439]: File "/usr/local/bin/patroni", line 10, in <module>
Jan 18 12:47:26 2th_node patroni[21439]: sys.exit(main())
Jan 18 12:47:26 2th_node patroni[21439]: File "/usr/local/lib/python3.6/site-packages/patroni/__init__.py", line 196, in main
Jan 18 12:47:26 2th_node patroni[21439]: return patroni_main()
Jan 18 12:47:26 2th_node patroni[21439]: File "/usr/local/lib/python3.6/site-packages/patroni/__init__.py", line 158, in patroni_main
Jan 18 12:47:26 2th_node patroni[21439]: patroni = Patroni()
Jan 18 12:47:26 2th_node patroni[21439]: File "/usr/local/lib/python3.6/site-packages/patroni/__init__.py", line 26, in __init__
Jan 18 12:47:26 2th_node patroni[21439]: self.config = Config()
Jan 18 12:47:26 2th_node patroni[21439]: File "/usr/local/lib/python3.6/site-packages/patroni/config.py", line 78, in __init__
Jan 18 12:47:26 2th_node patroni[21439]: self._local_configuration = self._load_config_file()
Jan 18 12:47:26 2th_node patroni[21439]: File "/usr/local/lib/python3.6/site-packages/patroni/config.py", line 108, in _load_config_file
Jan 18 12:47:26 2th_node patroni[21439]: config = yaml.safe_load(f)
Jan 18 12:47:26 2th_node patroni[21439]: File "/usr/local/lib64/python3.6/site-packages/yaml/__init__.py", line 162, in safe_load
Jan 18 12:47:26 2th_node patroni[21439]: return load(stream, SafeLoader)
Jan 18 12:47:26 2th_node patroni[21439]: File "/usr/local/lib64/python3.6/site-packages/yaml/__init__.py", line 114, in load
Jan 18 12:47:26 2th_node patroni[21439]: return loader.get_single_data()
Jan 18 12:47:26 2th_node patroni[21439]: File "/usr/local/lib64/python3.6/site-packages/yaml/constructor.py", line 41, in get_single_data
Jan 18 12:47:26 2th_node patroni[21439]: node = self.get_single_node()
Jan 18 12:47:26 2th_node patroni[21439]: File "/usr/local/lib64/python3.6/site-packages/yaml/composer.py", line 36, in get_single_node
Jan 18 12:47:26 2th_node patroni[21439]: document = self.compose_document()
Jan 18 12:47:26 2th_node patroni[21439]: File "/usr/local/lib64/python3.6/site-packages/yaml/composer.py", line 58, in compose_document
Jan 18 12:47:26 2th_node patroni[21439]: self.get_event()
Jan 18 12:47:26 2th_node patroni[21439]: File "/usr/local/lib64/python3.6/site-packages/yaml/parser.py", line 118, in get_event
Jan 18 12:47:26 2th_node patroni[21439]: self.current_event = self.state()
Jan 18 12:47:26 2th_node patroni[21439]: File "/usr/local/lib64/python3.6/site-packages/yaml/parser.py", line 193, in parse_document_end
Jan 18 12:47:26 2th_node patroni[21439]: token = self.peek_token()
Jan 18 12:47:26 2th_node patroni[21439]: File "/usr/local/lib64/python3.6/site-packages/yaml/scanner.py", line 129, in peek_token
Jan 18 12:47:26 2th_node patroni[21439]: self.fetch_more_tokens()
Jan 18 12:47:26 2th_node patroni[21439]: File "/usr/local/lib64/python3.6/site-packages/yaml/scanner.py", line 223, in fetch_more_tokens
Jan 18 12:47:26 2th_node patroni[21439]: return self.fetch_value()
Jan 18 12:47:26 2th_node patroni[21439]: File "/usr/local/lib64/python3.6/site-packages/yaml/scanner.py", line 579, in fetch_value
Jan 18 12:47:26 2th_node patroni[21439]: self.get_mark())
Jan 18 12:47:26 2th_node patroni[21439]: yaml.scanner.ScannerError: mapping values are not allowed here
Jan 18 12:47:26 2th_node patroni[21439]: in "/etc/patroni/patroni.yml", line 3, column 4
Jan 18 12:47:26 2th_node systemd[1]: patroni.service: main process exited, code=exited, status=1/FAILURE
Jan 18 12:47:26 2th_node systemd[1]: Unit patroni.service entered failed state.
Jan 18 12:47:26 2th_node systemd[1]: patroni.service failed.

What could be the reason for this?

Red Hat 7.9 with all the latest updates. All role settings are at their defaults.

vitabaks commented 3 years ago

I use the default pip packages from the RedHat.yml file

Are you using installation_method: "file" or "repo" ? patroni_installation_type: "pip" or "rpm"?

Jan 18 12:47:26 2th_node patroni[21439]: yaml.scanner.ScannerError: mapping values are not allowed here
Jan 18 12:47:26 2th_node patroni[21439]: in "/etc/patroni/patroni.yml", line 3, column 4

Please share your patroni.yml
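
You can also check the generated file directly on the node; a quick validation sketch (python3 and PyYAML are already present with the pip installation type):

python3 -c 'import yaml; yaml.safe_load(open("/etc/patroni/patroni.yml"))'   # silent on valid YAML; prints the same ScannerError traceback otherwise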

pavelkogen commented 3 years ago

Are you using installation_method: "file" or "repo" ? patroni_installation_type: "pip" or "rpm"?

Yes, of course. I'm using the installation method from file.

Solution: since I was using Postgres version 12, the postgresql12-devel package was not installed because of the missing llvm-toolset-7-clang dependency (this package is missing from our repository). I chose to use Postgres version 10 instead, and it worked.

However, I ran into another problem. The cluster installation stops while restarting the vip-manager service and waiting for a response from the VIP address:

RUNNING HANDLER [vip-manager : Restart vip-manager service] 
changed: [XX.XX.XX.XX]
changed: [YY.YY.YY.YY]
changed: [NN.NN.NN.NN]

RUNNING HANDLER [vip-manager : Wait for the cluster ip address (VIP) "VV.VV.VV.VV" is running] 
fatal: [XX.XX.XX.XX]: FAILED! => {"changed": false, "elapsed": 60, "msg": "Timeout when waiting for VV.VV.VV.VV:22"}
fatal: [YY.YY.YY.YY]: FAILED! => {"changed": false, "elapsed": 60, "msg": "Timeout when waiting for VV.VV.VV.VV:22"}
fatal: [NN.NN.NN.NN]: FAILED! => {"changed": false, "elapsed": 60, "msg": "Timeout when waiting for VV.VV.VV.VV:22"}

I am using a VIP address that has firewall restrictions. On the network equipment, I only allow requests to the VIP address on port 5432 from machines that need access to the database.

The question is: do I need to additionally allow requests to the VIP address, and if so, on which ports and for which addresses?

The vip-manager service is present on the servers, but its status looks like this:

{RHEL7.9}{N/A}{1th_node}[root@~]$ systemctl status vip-manager.service 
● vip-manager.service - Manages Virtual IP for Patroni
   Loaded: loaded (/etc/systemd/system/vip-manager.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Wed 2021-01-20 11:36:27 MSK; 3min 2s ago
  Process: 18386 ExecStopPost=/sbin/ip addr del VV.VV.VV.VV/23 dev ens192 (code=exited, status=2)
  Process: 18381 ExecStart=/usr/bin/vip-manager --config=/etc/patroni/vip-manager.yml (code=exited, status=1/FAILURE)
 Main PID: 18381 (code=exited, status=1/FAILURE)

Jan 20 11:36:26 1th_node systemd[1]: vip-manager.service: control process exited, code=exited status=2
Jan 20 11:36:26 1th_node systemd[1]: Unit vip-manager.service entered failed state.
Jan 20 11:36:26 1th_node systemd[1]: vip-manager.service failed.
Jan 20 11:36:27 1th_node systemd[1]: vip-manager.service holdoff time over, scheduling restart.
Jan 20 11:36:27 1th_node systemd[1]: Stopped Manages Virtual IP for Patroni.
Jan 20 11:36:27 1th_node systemd[1]: start request repeated too quickly for vip-manager.service
Jan 20 11:36:27 1th_node systemd[1]: Failed to start Manages Virtual IP for Patroni.
Jan 20 11:36:27 1th_node systemd[1]: Unit vip-manager.service entered failed state.
Jan 20 11:36:27 1th_node systemd[1]: vip-manager.service failed.

{RHEL7.9}{N/A}{2th_node}[root@~]$ systemctl status vip-manager.service
● vip-manager.service - Manages Virtual IP for Patroni
   Loaded: loaded (/etc/systemd/system/vip-manager.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Wed 2021-01-20 11:36:26 MSK; 15min ago
  Process: 15816 ExecStopPost=/sbin/ip addr del VV.VV.VV.VV/23 dev ens192 (code=exited, status=2)
  Process: 15810 ExecStart=/usr/bin/vip-manager --config=/etc/patroni/vip-manager.yml (code=exited, status=1/FAILURE)
 Main PID: 15810 (code=exited, status=1/FAILURE)

Jan 20 11:36:26 2th_node systemd[1]: vip-manager.service: control process exited, code=exited status=2
Jan 20 11:36:26 2th_node systemd[1]: Unit vip-manager.service entered failed state.
Jan 20 11:36:26 2th_node systemd[1]: vip-manager.service failed.
Jan 20 11:36:26 2th_node systemd[1]: vip-manager.service holdoff time over, scheduling restart.
Jan 20 11:36:26 2th_node systemd[1]: Stopped Manages Virtual IP for Patroni.
Jan 20 11:36:26 2th_node systemd[1]: start request repeated too quickly for vip-manager.service
Jan 20 11:36:26 2th_node systemd[1]: Failed to start Manages Virtual IP for Patroni.
Jan 20 11:36:26 2th_node systemd[1]: Unit vip-manager.service entered failed state.
Jan 20 11:36:26 2th_node systemd[1]: vip-manager.service failed.

{RHEL7.9}{N/A}{3th_node}[root@~]$ systemctl status vip-manager.service
● vip-manager.service - Manages Virtual IP for Patroni
   Loaded: loaded (/etc/systemd/system/vip-manager.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Wed 2021-01-20 11:36:27 MSK; 15min ago
  Process: 2148 ExecStopPost=/sbin/ip addr del VV.VV.VV.VV/23 dev ens192 (code=exited, status=2)
  Process: 2143 ExecStart=/usr/bin/vip-manager --config=/etc/patroni/vip-manager.yml (code=exited, status=1/FAILURE)
 Main PID: 2143 (code=exited, status=1/FAILURE)

Jan 20 11:36:27 3th_node systemd[1]: vip-manager.service: control process exited, code=exited status=2
Jan 20 11:36:27 3th_node systemd[1]: Unit vip-manager.service entered failed state.
Jan 20 11:36:27 3th_node systemd[1]: vip-manager.service failed.
Jan 20 11:36:27 3th_node systemd[1]: vip-manager.service holdoff time over, scheduling restart.
Jan 20 11:36:27 3th_node systemd[1]: Stopped Manages Virtual IP for Patroni.
Jan 20 11:36:27 3th_node systemd[1]: start request repeated too quickly for vip-manager.service
Jan 20 11:36:27 3th_node systemd[1]: Failed to start Manages Virtual IP for Patroni.
Jan 20 11:36:27 3th_node systemd[1]: Unit vip-manager.service entered failed state.
Jan 20 11:36:27 3th_node systemd[1]: vip-manager.service failed.

vitabaks commented 3 years ago

Solution: since I was using Postgres version 12, the postgresql12-devel package was not installed because of the missing llvm-toolset-7-clang dependency (this package is missing from our repository). I chose to use Postgres version 10 instead, and it worked.

The llvm-toolset-7-clang package can be found in the Software Collections (SCL) repository. See this commit: https://github.com/vitabaks/postgresql_cluster/commit/cc24028962b30ba7cc4bd59c6defdb17af2545a5 If for some reason you cannot upload the package to your repository, you can download it and specify the package file (as well as all dependent packages) in the packages_from_file variable.
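
A minimal sketch of that variable (the file names below are only illustrative; use the exact files you download and check the role defaults for the expected format):

packages_from_file:
  - "llvm-toolset-7-clang-<version>.el7.x86_64.rpm"   # illustrative file name
  - "postgresql12-devel-<version>.rhel7.x86_64.rpm"   # plus any other dependent packages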

The question is, do I need to additionally allow requests to the VIP address, if so, on which ports and to which addresses?

vip-manager must have access to the DCS; if you use etcd, this is port 2379. To access the VIP address from the application side, you need to open the "pgbouncer_listen_port", or, if you do not use pgbouncer, access via the "postgresql_port" is required.
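
If firewalld is also used on the nodes themselves, opening these ports could look like this (a sketch; adjust ports and zones to your environment):

sudo firewall-cmd --permanent --add-port=2379/tcp   # etcd client port (DCS access for vip-manager)
sudo firewall-cmd --permanent --add-port=6432/tcp   # pgbouncer_listen_port (default 6432)
sudo firewall-cmd --permanent --add-port=5432/tcp   # postgresql_port, if pgbouncer is not used
sudo firewall-cmd --reload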

Jan 20 11:36:27 3th_node systemd[1]: vip-manager.service failed.

It's not entirely clear yet. Are there any other errors in the vip-manager log? sudo journalctl -u vip-manager

pavelkogen commented 3 years ago

The llvm-toolset-7-clang package can be found in the Software Collections (SCL) repository. See this commit: cc24028 If for some reason you cannot upload the package to your repository, you can download it and specify the package file (as well as all dependent packages) in the packages_from_file variable.

Ok, thanks! I will try.

vip-manager must have access to the DCS; if you use etcd, this is port 2379.

I use etcd by default. All cluster nodes use port 2379 on a regular address (not virtual).
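
A quick way to confirm DCS reachability from each node (assuming the etcd client API is served over plain HTTP on the default port, as in my configuration):

curl -s http://XX.XX.XX.XX:2379/health   # expected output: {"health":"true"}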

To access the VIP address from the application side, you need to open the "pgbouncer_listen_port", or, if you do not use pgbouncer, access via the "postgresql_port" is required.

Yes, exactly. I missed the pgbouncer part: port 6432 is not open in the firewall for either the regular or the virtual address.

Are there any other errors in the vip-manager log?

{RHEL7.9}{N/A}{2th_node}[root@~]$ journalctl -u vip-manager
-- Logs begin at Tue 2021-01-19 21:18:20 MSK, end at Wed 2021-01-20 14:04:34 MSK. --
Jan 20 11:36:25 2th_node systemd[1]: Started Manages Virtual IP for Patroni.
Jan 20 11:36:25 2th_node vip-manager[15626]: 2021/01/20 11:36:25 reading config from /etc/patroni/vip-manager.yml
Jan 20 11:36:25 2th_node vip-manager[15626]: 2021/01/20 11:36:25 Setting network interface is mandatory
Jan 20 11:36:25 2th_node systemd[1]: vip-manager.service: main process exited, code=exited, status=1/FAILURE
Jan 20 11:36:25 2th_node ip[15631]: RTNETLINK answers: Cannot assign requested address
Jan 20 11:36:25 2th_node systemd[1]: vip-manager.service: control process exited, code=exited status=2
Jan 20 11:36:25 2th_node systemd[1]: Unit vip-manager.service entered failed state.
Jan 20 11:36:25 2th_node systemd[1]: vip-manager.service failed.
Jan 20 11:36:26 2th_node systemd[1]: vip-manager.service holdoff time over, scheduling restart.
Jan 20 11:36:26 2th_node systemd[1]: Stopped Manages Virtual IP for Patroni.
Jan 20 11:36:26 2th_node systemd[1]: Started Manages Virtual IP for Patroni.
Jan 20 11:36:26 2th_node vip-manager[15662]: 2021/01/20 11:36:26 reading config from /etc/patroni/vip-manager.yml
Jan 20 11:36:26 2th_node vip-manager[15662]: 2021/01/20 11:36:26 Setting network interface is mandatory
Jan 20 11:36:26 2th_node systemd[1]: vip-manager.service: main process exited, code=exited, status=1/FAILURE
Jan 20 11:36:26 2th_node ip[15667]: RTNETLINK answers: Cannot assign requested address
Jan 20 11:36:26 2th_node systemd[1]: vip-manager.service: control process exited, code=exited status=2
Jan 20 11:36:26 2th_node systemd[1]: Unit vip-manager.service entered failed state.
Jan 20 11:36:26 2th_node systemd[1]: vip-manager.service failed.
Jan 20 11:36:26 2th_node systemd[1]: vip-manager.service holdoff time over, scheduling restart.
Jan 20 11:36:26 2th_node systemd[1]: Stopped Manages Virtual IP for Patroni.
Jan 20 11:36:26 2th_node systemd[1]: Started Manages Virtual IP for Patroni.
Jan 20 11:36:26 2th_node vip-manager[15726]: 2021/01/20 11:36:26 reading config from /etc/patroni/vip-manager.yml
Jan 20 11:36:26 2th_node vip-manager[15726]: 2021/01/20 11:36:26 Setting network interface is mandatory
Jan 20 11:36:26 2th_node systemd[1]: vip-manager.service: main process exited, code=exited, status=1/FAILURE
Jan 20 11:36:26 2th_node ip[15732]: RTNETLINK answers: Cannot assign requested address
Jan 20 11:36:26 2th_node systemd[1]: vip-manager.service: control process exited, code=exited status=2
Jan 20 11:36:26 2th_node systemd[1]: Unit vip-manager.service entered failed state.
Jan 20 11:36:26 2th_node systemd[1]: vip-manager.service failed.
Jan 20 11:36:26 2th_node systemd[1]: vip-manager.service holdoff time over, scheduling restart.
Jan 20 11:36:26 2th_node systemd[1]: Stopped Manages Virtual IP for Patroni.
Jan 20 11:36:26 2th_node systemd[1]: Started Manages Virtual IP for Patroni.
Jan 20 11:36:26 2th_node systemd[1]: vip-manager.service: main process exited, code=exited, status=1/FAILURE
Jan 20 11:36:26 2th_node vip-manager[15796]: 2021/01/20 11:36:26 reading config from /etc/patroni/vip-manager.yml
Jan 20 11:36:26 2th_node vip-manager[15796]: 2021/01/20 11:36:26 Setting network interface is mandatory
Jan 20 11:36:26 2th_node ip[15803]: RTNETLINK answers: Cannot assign requested address
Jan 20 11:36:26 2th_node systemd[1]: vip-manager.service: control process exited, code=exited status=2
Jan 20 11:36:26 2th_node systemd[1]: Unit vip-manager.service entered failed state.
Jan 20 11:36:26 2th_node systemd[1]: vip-manager.service failed.
Jan 20 11:36:26 2th_node systemd[1]: vip-manager.service holdoff time over, scheduling restart.
Jan 20 11:36:26 2th_node systemd[1]: Stopped Manages Virtual IP for Patroni.
Jan 20 11:36:26 2th_node systemd[1]: Started Manages Virtual IP for Patroni.
Jan 20 11:36:26 2th_node vip-manager[15810]: 2021/01/20 11:36:26 reading config from /etc/patroni/vip-manager.yml
Jan 20 11:36:26 2th_node vip-manager[15810]: 2021/01/20 11:36:26 Setting network interface is mandatory
Jan 20 11:36:26 2th_node systemd[1]: vip-manager.service: main process exited, code=exited, status=1/FAILURE
Jan 20 11:36:26 2th_node ip[15816]: RTNETLINK answers: Cannot assign requested address
Jan 20 11:36:26 2th_node systemd[1]: vip-manager.service: control process exited, code=exited status=2
Jan 20 11:36:26 2th_node systemd[1]: Unit vip-manager.service entered failed state.
Jan 20 11:36:26 2th_node systemd[1]: vip-manager.service failed.
Jan 20 11:36:26 2th_node systemd[1]: vip-manager.service holdoff time over, scheduling restart.
Jan 20 11:36:26 2th_node systemd[1]: Stopped Manages Virtual IP for Patroni.
Jan 20 11:36:26 2th_node systemd[1]: start request repeated too quickly for vip-manager.service
Jan 20 11:36:26 2th_node systemd[1]: Failed to start Manages Virtual IP for Patroni.
Jan 20 11:36:26 2th_node systemd[1]: Unit vip-manager.service entered failed state.
Jan 20 11:36:26 2th_node systemd[1]: vip-manager.service failed.

vitabaks commented 3 years ago

@pavelkogen What package version do you have specified in the vip_manager_package_file variable? It must be at least version 1.0.

try vip_manager_package_file: "vip-manager_1.0.1-1_amd64.rpm"

download file here: https://github.com/cybertec-postgresql/vip-manager/releases/download/v1.0.1/vip-manager_1.0.1-1_amd64.rpm

pavelkogen commented 3 years ago

@pavelkogen What package version do you have specified in the vip_manager_package_file variable? It must be at least version 1.0.

try vip_manager_package_file: "vip-manager_1.0.1-1_amd64.rpm"

download file here: https://github.com/cybertec-postgresql/vip-manager/releases/download/v1.0.1/vip-manager_1.0.1-1_amd64.rpm

Yeah, I saw your issue in the vip-manager repository and used the most recent version, 1.0.1, just in case. In my comments above, I am already using this version.

vitabaks commented 3 years ago

From the log I see that the ens192 interface is specified

If your server has several network interfaces, make sure that you have specified the correct interface name in the vip_interface variable (or vip_manager_iface).
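
For example (a sketch of the relevant variable; by default the role derives it from ansible_default_ipv4.interface):

vip_interface: "ens192"   # or "{{ ansible_default_ipv4.interface }}" to let Ansible detect it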

pavelkogen commented 3 years ago

From the log I see that the ens192 interface is specified

If your server has several network interfaces, make sure that you have specified the correct interface name in the vip_interface variable (or vip_manager_iface).

Yes, this interface name was created automatically during server installation. Ansible assigned the vip_manager_iface variable from the ansible_default_ipv4.interface variable. I am using one interface on these servers.

{RHEL7.9}{N/A}{1th_node}[root@~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:bf:a1:72 brd ff:ff:ff:ff:ff:ff
    inet XX.XX.XX.XX/23 brd NETMASK scope global ens192
       valid_lft forever preferred_lft forever
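
As a cross-check of what Ansible detects as the default interface (a quick check; the inventory path is just an example):

ansible all -i inventory -m setup -a 'filter=ansible_default_ipv4'   # shows ansible_default_ipv4.interface per host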

I suspect that vip-manager cannot configure the network to add the VIP address. However, I cannot check this, because I use all the default parameters and I do not see anything unusual.

Perhaps I should manually make changes to the server's network configuration?

My vip-manager config:

$ cat /etc/patroni/vip-manager.yml 
interval: 1000

trigger-key: "/service/postgres-cluster/leader"
trigger-value: "1th_node"

ip: VV.VV.VV.VV # the virtual ip address to manage
netmask: 23 # netmask for the virtual ip
interface: ens192 # interface to which the virtual ip will be added

hosting-type: basic # possible values: basic, or hetzner.

dcs-type: etcd # etcd or consul
dcs-endpoints:
  - http://XX.XX.XX.XX:2379
  - http://YY.YY.YY.YY:2379
  - http://NN.NN.NN.NN:2379

retry-num: 2
retry-after: 250  #in milliseconds

verbose: false

When I try to run it manually:

{RHEL7.9}{N/A}{1th_node}[root@~]$ vip-manager --config=/etc/patroni/vip-manager.yml
2021/01/21 16:44:05 reading config from /etc/patroni/vip-manager.yml
2021/01/21 16:44:05 Setting network interface is mandatory

vitabaks commented 3 years ago

Jan 20 11:36:26 2th_node ip[15816]: RTNETLINK answers: Cannot assign requested address

Perhaps the problem is not related to vip-manager; there seems to be a problem with adding a second IP address to the network card.

Try adding the VIP manually; is there an error?

Example: ip addr add VV.VV.VV.VV/23 dev ens192

where VV.VV.VV.VV is your VIP address
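
A fuller manual check could look like this (substitute your VIP and interface):

ip addr add VV.VV.VV.VV/23 dev ens192   # add the VIP by hand
ip addr show dev ens192                 # verify that the address appears on the interface
ip addr del VV.VV.VV.VV/23 dev ens192   # remove it again afterwards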

pavelkogen commented 3 years ago

Try adding the VIP manually; is there an error?

There are no problems with adding a VIP address manually. I can add and remove it; everything works. I have not been able to determine the reason why vip-manager does not work.

But I tried the Type A scheme. The cluster seems to be ready to go, but services like keepalived and confd were not started. I started them manually and found that keepalived does not move the VIP address between servers (the VIP address is enabled on all servers at once). Perhaps the problem is that I have blocked multicast on NSX (I use VMware to virtualize my servers).

I also wanted to clarify why the keepalived configuration is the same on all three nodes: they all have the BACKUP state and the same weight. Is this normal?

keepalived.conf.j2:

vrrp_instance VI_1 {
   priority  100
   state  BACKUP

vitabaks commented 3 years ago

I also wanted to clarify why the keepalived configuration is the same on all three nodes: they all have the BACKUP state and the same weight. Is this normal?

Yes.
In the Type A scheme, the VIP address is not tied to the master role. In our configuration, keepalived checks the status of the HAProxy service and, in case of a failure, delegates the VIP to another balancer server.

If necessary, you can manually increase the weight on one of the servers to move the VIP address.
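
For example, in keepalived.conf on the node that should preferentially hold the VIP (a sketch; 100 is the default priority from the template above):

vrrp_instance VI_1 {
   state  BACKUP
   priority  150   # higher than the default 100 on the other nodes; keep the rest of the instance unchanged
}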

vitabaks commented 3 years ago

I have not been able to determine the reason why vip-manager does not work.

For now, this remains a mystery.
Let me know if you manage to fix this problem. I haven't been able to reproduce it yet; everything works fine for me.