Open MohamedMSaeed opened 3 years ago
I have the same issue and am not sure why it happens. I also noticed that when the cluster is working well and I force-delete the pod, it keeps failing with this message:
2020-12-18 10:07:03,463 ERROR: ObjectCache.run MaxRetryError("HTTPSConnectionPool(host='10.96.0.1', port=443): Max retries exceeded with url: /api/v1/namespaces/postgresql/pods?labelSelector=application%3Dspilo%2Ccluster-name%3Dkast-default-postgresql (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 113] No route to host',))",)
2020-12-18 10:07:04,422 ERROR: get_cluster
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 528, in _load_cluster
    self._wait_caches()
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 522, in _wait_caches
    raise RetryFailedError('Exceeded retry deadline')
patroni.utils.RetryFailedError: 'Exceeded retry deadline'
2020-12-18 10:07:04,422 ERROR: get_cluster
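For anyone hitting the same symptom: on Linux, `[Errno 113]` is `EHOSTUNREACH`, so the failure happens at the network level before any TLS or API authentication takes place. Below is a minimal sketch for checking this from inside the affected pod, assuming a Python 3 interpreter is available there (the traceback suggests python3.6 is). The host and port are copied from the log line above. `check_tcp` is a hypothetical helper name, not part of Patroni.

```python
import errno
import socket

# Values taken from the log line above; adjust for your cluster.
API_HOST = "10.96.0.1"
API_PORT = 443


def check_tcp(host, port, timeout=3.0):
    """Attempt a plain TCP connect and return (ok, detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.timeout:
        return False, "timed out"
    except OSError as exc:
        # Map the numeric errno (e.g. 113 on Linux) to its symbolic name.
        name = errno.errorcode.get(exc.errno, "unknown") if exc.errno else "unknown"
        return False, "{}: {}".format(name, exc)


if __name__ == "__main__":
    ok, detail = check_tcp(API_HOST, API_PORT)
    print("{}:{} -> {} ({})".format(API_HOST, API_PORT, "OK" if ok else "FAILED", detail))
```

If this also reports `EHOSTUNREACH`, the problem is likely in pod networking (CNI, kube-proxy, or node routing) rather than in Patroni itself, though that is only a guess from the error code.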
Please answer some short questions, which should help us understand your problem / question better.
ISSUE
Almost every fortnight, one of the PG cluster pods fails for no apparent reason. The database goes down and I get several error messages. I don't know what the cause is, and it is becoming very annoying.
I will post the logs sorted by date. Also, I will copy only one line if the same log was written many times.
Logs from coredb-0, which was the master:
Then I keep getting the following log over and over:
How do I currently fix this issue: