vitabaks / postgresql_cluster

PostgreSQL High-Availability Cluster (based on "Patroni" and DCS "etcd" or "consul"). Automating with Ansible.
MIT License
1.29k stars 352 forks source link

Error etcd and Patroni (ReadTimeoutError("HTTPConnectionPool(host='xxx', port=2379): Read timed out.) #135

Closed DenisK32 closed 2 years ago

DenisK32 commented 2 years ago

hi,getting persistent error:

urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='192.168.1.3', port=2379): Read timed out. (read timeout=3.3321133383336323)
Dec 09 09:55:56 pgsql1 patroni[15197]: During handling of the above exception, another exception occurred:
Dec 09 09:55:56 pgsql1 patroni[15197]: Traceback (most recent call last):
Dec 09 09:55:56 pgsql1 patroni[15197]: File "/usr/local/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 211, in _do_http_request
Dec 09 09:55:56 pgsql1 patroni[15197]: response = request_executor(method, base_uri + path, **kwargs)
Dec 09 09:55:56 pgsql1 patroni[15197]: File "/usr/local/lib/python3.6/site-packages/urllib3/request.py", line 79, in request
Dec 09 09:55:56 pgsql1 patroni[15197]: method, url, fields=fields, headers=headers, **urlopen_kw
Dec 09 09:55:56 pgsql1 patroni[15197]: File "/usr/local/lib/python3.6/site-packages/urllib3/request.py", line 170, in request_encode_body
Dec 09 09:55:56 pgsql1 patroni[15197]: return self.urlopen(method, url, **extra_kw)
Dec 09 09:55:56 pgsql1 patroni[15197]: File "/usr/local/lib/python3.6/site-packages/urllib3/poolmanager.py", line 375, in urlopen
Dec 09 09:55:56 pgsql1 patroni[15197]: response = conn.urlopen(method, u.request_uri, **kw)
Dec 09 09:55:56 pgsql1 patroni[15197]: File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 756, in urlopen
Dec 09 09:55:56 pgsql1 patroni[15197]: method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
Dec 09 09:55:56 pgsql1 patroni[15197]: File "/usr/local/lib/python3.6/site-packages/urllib3/util/retry.py", line 574, in increment
Dec 09 09:55:56 pgsql1 patroni[15197]: raise MaxRetryError(_pool, url, error or ResponseError(cause))
Dec 09 09:55:56 pgsql1 patroni[15197]: urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='192.168.1.3', port=2379): Max retries exceeded with url: /v2/keys/service/postgres-cluster/status (Caused by Read
Dec 09 09:55:56 pgsql1 patroni[15197]: During handling of the above exception, another exception occurred:
Dec 09 09:55:56 pgsql1 patroni[15197]: Traceback (most recent call last):
Dec 09 09:55:56 pgsql1 patroni[15197]: File "/usr/local/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 564, in wrapper
Dec 09 09:55:56 pgsql1 patroni[15197]: retval = func(self, *args, **kwargs) is not None
Dec 09 09:55:56 pgsql1 patroni[15197]: File "/usr/local/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 690, in _write_status
Dec 09 09:55:56 pgsql1 patroni[15197]: return self._client.set(self.status_path, value)
Dec 09 09:55:56 pgsql1 patroni[15197]: File "/usr/local/lib/python3.6/site-packages/etcd/client.py", line 721, in set
Dec 09 09:55:56 pgsql1 patroni[15197]: return self.write(key, value, ttl=ttl)
Dec 09 09:55:56 pgsql1 patroni[15197]: File "/usr/local/lib/python3.6/site-packages/etcd/client.py", line 500, in write
Dec 09 09:55:56 pgsql1 patroni[15197]: response = self.api_execute(path, method, params=params)
Dec 09 09:55:56 pgsql1 patroni[15197]: File "/usr/local/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 256, in api_execute
Dec 09 09:55:56 pgsql1 patroni[15197]: response = self._do_http_request(retry, machines_cache, request_executor, method, path, **kwargs)
Dec 09 09:55:56 pgsql1 patroni[15197]: File "/usr/local/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 230, in _do_http_request
Dec 09 09:55:56 pgsql1 patroni[15197]: raise etcd.EtcdException('{0} {1} request failed'.format(method, path))
Dec 09 09:55:56 pgsql1 patroni[15197]: etcd.EtcdException: PUT /v2/keys/service/postgres-cluster/status request failed
Dec 09 09:55:57 pgsql1 patroni[15197]: 2021-12-09 09:55:57,258 INFO: no action. I am (pgsql1), the leader with the lock
Dec 09 09:55:57 pgsql1 patroni[15197]: 2021-12-09 09:55:57,260 WARNING: Loop time exceeded, rescheduling immediately.
Dec 09 09:56:01 pgsql1 patroni[15197]: 2021-12-09 09:56:01,249 INFO: no action. I am (pgsql1), the leader with the lock
Dec 09 09:56:10 pgsql1 patroni[15197]: 2021-12-09 09:56:07,264 INFO: Lock owner: pgsql1; I am pgsql1
Request to server http://192.168.1.2:2379 failed: MaxRetryError('HTTPConnectionPool(host=\'192.168.1.2\', port=2379): Max retries exceeded wi
Dec 09 09:56:10 pgsql1 patroni[15197]: 2021-12-09 09:56:10,604 INFO: Reconnection allowed, looking for another server.
Dec 09 09:56:10 pgsql1 patroni[15197]: 2021-12-09 09:56:10,604 INFO: Retrying on http://192.168.1.4:2379
Dec 09 09:56:11 pgsql1 patroni[15197]: 2021-12-09 09:56:11,377 INFO: Selected new etcd server http://192.168.1.4:2379
Dec 09 09:56:14 pgsql1 patroni[15197]: 2021-12-09 09:56:14,791 INFO: no action. I am (pgsql1), the leader with the lock
Dec 09 09:56:20 pgsql1 patroni[15197]: 2021-12-09 09:56:20,951 INFO: no action. I am (pgsql1), the leader with the lock
Dec 09 09:56:30 pgsql1 patroni[15197]: 2021-12-09 09:56:27,265 INFO: Lock owner: pgsql1; I am pgsql1
lines 544-586/1001 59%
Dec 09 09:55:56 pgsql1 patroni[15197]: self, url, "Read timed out. (read timeout=%s)" % timeout_value

I tried to move etcd to ssd disks, the problem remained. Patroni periodically restarts postgresql. And in the etcd logs it writes: Etcd: Node healthcheck failed.

DenisK32 commented 2 years ago

CentOS7 Postgresql 13

DenisK32 commented 2 years ago
Dec 09 11:22:46 pgsql1 patroni[22108]: 2021-12-09 11:22:42,881 INFO: Lock owner: pgsql1; I am pgsql1
Dec 09 11:22:46 pgsql1 patroni[22108]: 2021-12-09 11:22:46,222 ERROR: Request to server http://192.168.1.2:2379 failed: MaxRetryError('HTTPConnectionPool(host=\'192.168.1.2\', port=2379): Max retries exceeded with url: /v2/keys/service/postgres-cluster/leader (Caused by ReadTimeoutError("HTTPConnectionPool(host=\'192.168.1.2\', port=2379): Read timed out. (read timeout=3.3329069173390358)",))',)
Dec 09 11:22:46 pgsql1 patroni[22108]: 2021-12-09 11:22:46,222 INFO: Reconnection allowed, looking for another server.
Dec 09 11:22:46 pgsql1 patroni[22108]: 2021-12-09 11:22:46,222 INFO: Retrying on http://192.168.1.4:2379
Dec 09 11:22:49 pgsql1 patroni[22108]: 2021-12-09 11:22:49,562 ERROR: Request to server http://192.168.1.4:2379 failed: MaxRetryError('HTTPConnectionPool(host=\'192.168.1.4\', port=2379): Max retries exceeded with url: /v2/keys/service/postgres-cluster/leader (Caused by ReadTimeoutError("HTTPConnectionPool(host=\'192.168.1.4\', port=2379): Read timed out. (read timeout=3.3317284073321694)",))',)
Dec 09 11:22:49 pgsql1 patroni[22108]: 2021-12-09 11:22:49,562 INFO: Reconnection allowed, looking for another server.
vitabaks commented 2 years ago

Hi @DenisK32

Please attach the etcd logs from all the etcd cluster servers.

sudo journalctl -u etcd -n 50 --output=short-precise

DenisK32 commented 2 years ago

pgsql1:

-- Logs begin at Wed 2021-12-08 10:26:04 MSK, end at Thu 2021-12-09 15:34:31 MSK. --
Dec 09 15:33:47.538655 pgsql1 etcd[10788]: member e8662d903067ee5 has a higher version 3.5.1
Dec 09 15:33:47.538688 pgsql1 etcd[10788]: the local etcd version 3.3.25 is not up-to-date
Dec 09 15:33:47.538711 pgsql1 etcd[10788]: member af39202c7b034c4b has a higher version 3.5.1
Dec 09 15:33:51.448568 pgsql1 etcd[10788]: sync duration of 1.019741391s, expected less than 1s
Dec 09 15:33:51.541659 pgsql1 etcd[10788]: the local etcd version 3.3.25 is not up-to-date
Dec 09 15:33:51.541706 pgsql1 etcd[10788]: member e8662d903067ee5 has a higher version 3.5.1
Dec 09 15:33:51.541743 pgsql1 etcd[10788]: the local etcd version 3.3.25 is not up-to-date
Dec 09 15:33:51.541767 pgsql1 etcd[10788]: member af39202c7b034c4b has a higher version 3.5.1
Dec 09 15:33:55.544169 pgsql1 etcd[10788]: the local etcd version 3.3.25 is not up-to-date
Dec 09 15:33:55.544214 pgsql1 etcd[10788]: member af39202c7b034c4b has a higher version 3.5.1
Dec 09 15:33:55.544265 pgsql1 etcd[10788]: the local etcd version 3.3.25 is not up-to-date
Dec 09 15:33:55.544303 pgsql1 etcd[10788]: member e8662d903067ee5 has a higher version 3.5.1
Dec 09 15:33:59.547306 pgsql1 etcd[10788]: the local etcd version 3.3.25 is not up-to-date
Dec 09 15:33:59.547345 pgsql1 etcd[10788]: member e8662d903067ee5 has a higher version 3.5.1
Dec 09 15:33:59.547380 pgsql1 etcd[10788]: the local etcd version 3.3.25 is not up-to-date
Dec 09 15:33:59.547403 pgsql1 etcd[10788]: member af39202c7b034c4b has a higher version 3.5.1
Dec 09 15:34:01.459442 pgsql1 etcd[10788]: sync duration of 1.009447709s, expected less than 1s
Dec 09 15:34:03.550302 pgsql1 etcd[10788]: the local etcd version 3.3.25 is not up-to-date
Dec 09 15:34:03.550351 pgsql1 etcd[10788]: member e8662d903067ee5 has a higher version 3.5.1
Dec 09 15:34:03.550387 pgsql1 etcd[10788]: the local etcd version 3.3.25 is not up-to-date
Dec 09 15:34:03.550448 pgsql1 etcd[10788]: member af39202c7b034c4b has a higher version 3.5.1
Dec 09 15:34:07.553159 pgsql1 etcd[10788]: the local etcd version 3.3.25 is not up-to-date
Dec 09 15:34:07.553218 pgsql1 etcd[10788]: member e8662d903067ee5 has a higher version 3.5.1
Dec 09 15:34:07.553252 pgsql1 etcd[10788]: the local etcd version 3.3.25 is not up-to-date
Dec 09 15:34:07.553300 pgsql1 etcd[10788]: member af39202c7b034c4b has a higher version 3.5.1
Dec 09 15:34:11.555964 pgsql1 etcd[10788]: the local etcd version 3.3.25 is not up-to-date
Dec 09 15:34:11.556009 pgsql1 etcd[10788]: member af39202c7b034c4b has a higher version 3.5.1
Dec 09 15:34:11.556038 pgsql1 etcd[10788]: the local etcd version 3.3.25 is not up-to-date
Dec 09 15:34:11.556061 pgsql1 etcd[10788]: member e8662d903067ee5 has a higher version 3.5.1
Dec 09 15:34:15.558654 pgsql1 etcd[10788]: the local etcd version 3.3.25 is not up-to-date
Dec 09 15:34:15.558739 pgsql1 etcd[10788]: member e8662d903067ee5 has a higher version 3.5.1
Dec 09 15:34:15.558781 pgsql1 etcd[10788]: the local etcd version 3.3.25 is not up-to-date
Dec 09 15:34:15.558804 pgsql1 etcd[10788]: member af39202c7b034c4b has a higher version 3.5.1
Dec 09 15:34:19.561277 pgsql1 etcd[10788]: the local etcd version 3.3.25 is not up-to-date
Dec 09 15:34:19.561327 pgsql1 etcd[10788]: member e8662d903067ee5 has a higher version 3.5.1
Dec 09 15:34:19.561380 pgsql1 etcd[10788]: the local etcd version 3.3.25 is not up-to-date
Dec 09 15:34:19.561438 pgsql1 etcd[10788]: member af39202c7b034c4b has a higher version 3.5.1
Dec 09 15:34:23.460220 pgsql1 etcd[10788]: sync duration of 1.032357394s, expected less than 1s
Dec 09 15:34:23.564244 pgsql1 etcd[10788]: the local etcd version 3.3.25 is not up-to-date
Dec 09 15:34:23.564293 pgsql1 etcd[10788]: member e8662d903067ee5 has a higher version 3.5.1
Dec 09 15:34:23.564326 pgsql1 etcd[10788]: the local etcd version 3.3.25 is not up-to-date
Dec 09 15:34:23.564349 pgsql1 etcd[10788]: member af39202c7b034c4b has a higher version 3.5.1
Dec 09 15:34:27.566991 pgsql1 etcd[10788]: the local etcd version 3.3.25 is not up-to-date
Dec 09 15:34:27.567044 pgsql1 etcd[10788]: member e8662d903067ee5 has a higher version 3.5.1
Dec 09 15:34:27.567083 pgsql1 etcd[10788]: the local etcd version 3.3.25 is not up-to-date
Dec 09 15:34:27.567106 pgsql1 etcd[10788]: member af39202c7b034c4b has a higher version 3.5.1
Dec 09 15:34:31.569703 pgsql1 etcd[10788]: the local etcd version 3.3.25 is not up-to-date
Dec 09 15:34:31.569754 pgsql1 etcd[10788]: member e8662d903067ee5 has a higher version 3.5.1
Dec 09 15:34:31.569787 pgsql1 etcd[10788]: the local etcd version 3.3.25 is not up-to-date
Dec 09 15:34:31.569810 pgsql1 etcd[10788]: member af39202c7b034c4b has a higher version 3.5.1

pgsql2:

-- Logs begin at Wed 2021-12-08 10:26:39 MSK, end at Thu 2021-12-09 15:35:33 MSK. --
Dec 09 15:29:07.426692 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:29:07.425+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.004838619s","expected-duration":"1s"}
Dec 09 15:29:09.437348 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:29:09.436+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.015552542s","expected-duration":"1s"}
Dec 09 15:29:13.439185 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:29:13.438+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.007347781s","expected-duration":"1s"}
Dec 09 15:29:19.439364 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:29:19.438+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.017880082s","expected-duration":"1s"}
Dec 09 15:29:23.425986 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:29:23.424+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.004407372s","expected-duration":"1s"}
Dec 09 15:29:27.425864 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:29:27.424+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.00360158s","expected-duration":"1s"}
Dec 09 15:29:29.425736 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:29:29.424+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.003373494s","expected-duration":"1s"}
Dec 09 15:29:37.425332 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:29:37.424+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.00579768s","expected-duration":"1s"}
Dec 09 15:29:39.425220 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:29:39.424+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.005147205s","expected-duration":"1s"}
Dec 09 15:29:41.427535 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:29:41.426+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.007331116s","expected-duration":"1s"}
Dec 09 15:29:45.436132 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:29:45.435+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.005909804s","expected-duration":"1s"}
Dec 09 15:30:17.435029 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:30:17.434+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.00802207s","expected-duration":"1s"}
Dec 09 15:30:23.463419 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:30:23.462+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.036244229s","expected-duration":"1s"}
Dec 09 15:30:31.446060 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:30:31.444+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.019959861s","expected-duration":"1s"}
Dec 09 15:31:01.444981 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:31:01.443+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.004183726s","expected-duration":"1s"}
Dec 09 15:31:13.611315 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:31:13.610+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.123185232s","expected-duration":"1s"}
Dec 09 15:31:31.444080 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:31:31.443+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.009715822s","expected-duration":"1s"}
Dec 09 15:32:01.457000 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:32:01.455+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.023094635s","expected-duration":"1s"}
Dec 09 15:32:05.466060 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:32:05.464+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.030377296s","expected-duration":"1s"}
Dec 09 15:32:09.434385 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:32:09.433+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.007599219s","expected-duration":"1s"}
Dec 09 15:32:17.431172 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:32:17.430+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.004646688s","expected-duration":"1s"}
Dec 09 15:32:19.458051 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:32:19.457+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.031947206s","expected-duration":"1s"}
Dec 09 15:32:23.442916 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:32:23.441+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.007516973s","expected-duration":"1s"}
Dec 09 15:32:29.430942 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:32:29.429+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.00492888s","expected-duration":"1s"}
Dec 09 15:32:31.475132 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:32:31.474+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.049468266s","expected-duration":"1s"}
Dec 09 15:32:35.430657 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:32:35.429+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.005587844s","expected-duration":"1s"}
Dec 09 15:32:37.430674 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:32:37.429+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.005313862s","expected-duration":"1s"}
Dec 09 15:32:39.444953 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:32:39.443+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.019095339s","expected-duration":"1s"}
Dec 09 15:32:41.444974 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:32:41.443+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.019805921s","expected-duration":"1s"}
Dec 09 15:32:45.474710 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:32:45.473+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.048197262s","expected-duration":"1s"}
Dec 09 15:32:57.457748 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:32:57.456+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.0327852s","expected-duration":"1s"}
Dec 09 15:32:59.429789 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:32:59.428+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.004704697s","expected-duration":"1s"}
Dec 09 15:33:01.429420 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:33:01.428+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.004303074s","expected-duration":"1s"}
Dec 09 15:33:03.464604 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:33:03.463+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.036916449s","expected-duration":"1s"}
Dec 09 15:33:09.432778 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:33:09.431+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.008418037s","expected-duration":"1s"}
Dec 09 15:33:23.466069 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:33:23.464+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.032642481s","expected-duration":"1s"}
Dec 09 15:33:27.451936 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:33:27.450+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.014969036s","expected-duration":"1s"}
Dec 09 15:33:35.475265 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:33:35.473+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.040779084s","expected-duration":"1s"}
Dec 09 15:33:45.462029 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:33:45.460+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.007290508s","expected-duration":"1s"}
Dec 09 15:33:51.439591 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:33:51.438+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.007387593s","expected-duration":"1s"}
Dec 09 15:34:11.459558 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:34:11.458+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.028196003s","expected-duration":"1s"}
Dec 09 15:34:23.456606 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:34:23.455+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.025496829s","expected-duration":"1s"}
Dec 09 15:34:31.459444 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:34:31.458+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.017993117s","expected-duration":"1s"}
Dec 09 15:34:53.438275 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:34:53.437+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.006081858s","expected-duration":"1s"}
Dec 09 15:34:57.458557 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:34:57.457+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.027212599s","expected-duration":"1s"}
Dec 09 15:35:07.541368 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:35:07.540+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.103429147s","expected-duration":"1s"}
Dec 09 15:35:15.499955 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:35:15.498+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.071593276s","expected-duration":"1s"}
Dec 09 15:35:27.436127 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:35:27.435+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.008071973s","expected-duration":"1s"}
Dec 09 15:35:29.480269 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:35:29.479+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.052402754s","expected-duration":"1s"}
Dec 09 15:35:33.489052 pgsql2 bash[12849]: {"level":"warn","ts":"2021-12-09T15:35:33.487+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.028160483s","expected-duration":"1s"}

pgsql3:

-- Logs begin at Wed 2021-12-08 10:26:35 MSK, end at Thu 2021-12-09 15:36:11 MSK. --
Dec 09 15:28:07.422505 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:28:07.421+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.001233844s","expected-duration":"1s"}
Dec 09 15:28:17.434954 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:28:17.433+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.013151282s","expected-duration":"1s"}
Dec 09 15:28:35.436974 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:28:35.436+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.015512813s","expected-duration":"1s"}
Dec 09 15:28:51.456418 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:28:51.455+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.035130458s","expected-duration":"1s"}
Dec 09 15:29:01.441194 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:29:01.440+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.008840311s","expected-duration":"1s"}
Dec 09 15:29:09.425630 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:29:09.424+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.005080617s","expected-duration":"1s"}
Dec 09 15:29:19.424188 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:29:19.423+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.004146579s","expected-duration":"1s"}
Dec 09 15:29:59.445953 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:29:59.445+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.019302869s","expected-duration":"1s"}
Dec 09 15:30:15.430628 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:30:15.429+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.004688585s","expected-duration":"1s"}
Dec 09 15:30:17.430065 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:30:17.429+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.004221306s","expected-duration":"1s"}
Dec 09 15:30:23.496298 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:30:23.495+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.070450737s","expected-duration":"1s"}
Dec 09 15:30:27.429705 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:30:27.428+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.004223392s","expected-duration":"1s"}
Dec 09 15:30:29.429786 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:30:29.428+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.004417133s","expected-duration":"1s"}
Dec 09 15:30:31.450711 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:30:31.449+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.026114426s","expected-duration":"1s"}
Dec 09 15:31:01.451317 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:31:01.450+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.011638144s","expected-duration":"1s"}
Dec 09 15:31:13.491492 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:31:13.490+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.004195498s","expected-duration":"1s"}
Dec 09 15:31:31.443551 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:31:31.442+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.010023881s","expected-duration":"1s"}
Dec 09 15:31:39.439356 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:31:39.438+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.005919471s","expected-duration":"1s"}
Dec 09 15:32:01.470327 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:32:01.469+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.037659907s","expected-duration":"1s"}
Dec 09 15:32:05.472394 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:32:05.471+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.038750282s","expected-duration":"1s"}
Dec 09 15:32:13.461030 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:32:13.460+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.006413235s","expected-duration":"1s"}
Dec 09 15:32:19.429637 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:32:19.428+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.004617368s","expected-duration":"1s"}
Dec 09 15:32:31.482512 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:32:31.481+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.057865623s","expected-duration":"1s"}
Dec 09 15:32:41.429024 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:32:41.428+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.005463899s","expected-duration":"1s"}
Dec 09 15:32:45.462159 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:32:45.461+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.037168879s","expected-duration":"1s"}
Dec 09 15:33:03.463560 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:33:03.462+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.036888596s","expected-duration":"1s"}
Dec 09 15:33:27.450760 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:33:27.449+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.0150698s","expected-duration":"1s"}
Dec 09 15:33:35.473890 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:33:35.472+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.040858278s","expected-duration":"1s"}
Dec 09 15:33:45.460401 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:33:45.459+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.007245151s","expected-duration":"1s"}
Dec 09 15:33:51.479689 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:33:51.478+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.048771214s","expected-duration":"1s"}
Dec 09 15:34:01.501537 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:34:01.500+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.049516145s","expected-duration":"1s"}
Dec 09 15:34:05.444863 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:34:05.444+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.014437291s","expected-duration":"1s"}
Dec 09 15:34:11.434372 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:34:11.433+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.004095491s","expected-duration":"1s"}
Dec 09 15:34:15.434508 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:34:15.433+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.005184912s","expected-duration":"1s"}
Dec 09 15:34:21.434623 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:34:21.433+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.004769263s","expected-duration":"1s"}
Dec 09 15:34:23.467669 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:34:23.466+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.037735251s","expected-duration":"1s"}
Dec 09 15:34:39.434021 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:34:39.432+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.004521611s","expected-duration":"1s"}
Dec 09 15:34:41.433625 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:34:41.432+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.004243481s","expected-duration":"1s"}
Dec 09 15:34:43.444697 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:34:43.443+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.015375619s","expected-duration":"1s"}
Dec 09 15:34:53.477997 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:34:53.477+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.047194445s","expected-duration":"1s"}
Dec 09 15:34:57.433363 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:34:57.432+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.003325327s","expected-duration":"1s"}
Dec 09 15:35:03.466216 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:35:03.465+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.019169991s","expected-duration":"1s"}
Dec 09 15:35:07.539491 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:35:07.538+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.103187474s","expected-duration":"1s"}
Dec 09 15:35:13.433448 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:35:13.432+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.003165728s","expected-duration":"1s"}
Dec 09 15:35:15.510750 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:35:15.509+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.083481054s","expected-duration":"1s"}
Dec 09 15:35:29.467768 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:35:29.466+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.04105349s","expected-duration":"1s"}
Dec 09 15:35:33.487980 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:35:33.486+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.028250073s","expected-duration":"1s"}
Dec 09 15:35:45.484699 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:35:45.483+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.013852853s","expected-duration":"1s"}
Dec 09 15:35:55.433763 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:35:55.432+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.007708998s","expected-duration":"1s"}
Dec 09 15:36:11.444491 pgsql3 bash[1619]: {"level":"warn","ts":"2021-12-09T15:36:11.443+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.019388831s","expected-duration":"1s"}
DenisK32 commented 2 years ago

etcd Version: 3.5.1 Git SHA: e8732fb5f Go Version: go1.16.3 Go OS/Arch: linux/amd64

vitabaks commented 2 years ago

pgsql1 - the local etcd version 3.3.25 is not up-to-date

DenisK32 commented 2 years ago

root@pgsql1 /usr/local/bin #./etcd --version etcd Version: 3.5.1 Git SHA: e8732fb5f Go Version: go1.16.3 Go OS/Arch: linux/amd64

vitabaks commented 2 years ago

restart the etcd service on this server

DenisK32 commented 2 years ago
Dec 09 15:48:41.420438 pgsql1 systemd[1]: Started Etcd Server.
Dec 09 15:48:41.427701 pgsql1 bash[8949]: {"level":"info","ts":"2021-12-09T15:48:40.414+0300","caller":"embed/serve.go:140","msg":"serving client traffic insecurely; this is strongly discourage
Dec 09 15:48:41.428274 pgsql1 bash[8949]: {"level":"info","ts":"2021-12-09T15:48:40.414+0300","caller":"embed/serve.go:140","msg":"serving client traffic insecurely; this is strongly discourage
Dec 09 15:48:41.456040 pgsql1 bash[8949]: {"level":"warn","ts":"2021-12-09T15:48:41.454+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.019218386s","expected-duration":"1s"}
Dec 09 15:48:57.506305 pgsql1 bash[8949]: {"level":"warn","ts":"2021-12-09T15:48:57.505+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.03607368s","expected-duration":"1s"}
Dec 09 15:49:01.622165 pgsql1 bash[8949]: {"level":"warn","ts":"2021-12-09T15:49:01.621+0300","caller":"wal/wal.go:802","msg":"slow fdatasync","took":"1.157869735s","expected-duration":"1s"}
vitabaks commented 2 years ago

"slow fdatasync","took":"1.013852853s","expected-duration":"1s"

Is the etcd data exactly on the SSD? Please check the path in the ETCD_DATA_DIR parameter in the etcd.conf

DenisK32 commented 2 years ago

As I understand it, I need to move a separate etcd cluster to ssd disks. At the moment, I have everything spinning on three hosts.

DenisK32 commented 2 years ago

This etcd in SSD:

-- Logs begin at Sun 2021-11-07 00:27:53 MSK, end at Thu 2021-12-09 15:56:37 MSK. --
Dec 09 15:37:15.005003 pgsql2 etcd[11591]: sync duration of 1.20701203s, expected less than 1s
Dec 09 15:38:39.440759 pgsql2 etcd[11591]: sync duration of 2.642716866s, expected less than 1s
Dec 09 15:38:40.003622 pgsql2 etcd[11591]: failed to send out heartbeat on time (exceeded the 1s timeout for 1.705645461s, to e348297e74ecac1f)
Dec 09 15:38:40.003659 pgsql2 etcd[11591]: server is likely overloaded
Dec 09 15:38:40.003672 pgsql2 etcd[11591]: failed to send out heartbeat on time (exceeded the 1s timeout for 1.705744345s, to 6ca573205752146f)
Dec 09 15:38:40.003683 pgsql2 etcd[11591]: server is likely overloaded
Dec 09 15:39:42.803971 pgsql2 etcd[11591]: sync duration of 2.631849397s, expected less than 1s
vitabaks commented 2 years ago

server is likely overloaded

Avoid storing etcd data on the same drive along with other processes (such as the database) that are intensively using the resources of the disk subsystem! Store the etcd and postgresql data on different disks (see etcd_data_dir variable).

Overloaded (highload) database clusters may require the installation of the etcd cluster on dedicated servers, separate from the database servers.

DenisK32 commented 2 years ago

Okay, thanks for answering my questions!