check timeout timer fired

sorintlab / stolon

PostgreSQL cloud native High Availability and more.

https://talk.stolon.io

Apache License 2.0

check timeout timer fired #818

Closed mmpei closed 3 years ago

mmpei commented 3 years ago

v0.12.0

The log is from stolon-proxy.

I checked the source code and found that, after logging 'proxying to master address xxxx', it just assigns the destAddress to a channel. That should never suspend the process, so what is causing it to time out?

sgotti commented 3 years ago

@mmpei If you want to open a bug report, please fill in the template provided when you open a new issue. If this is a question, please ask it in the stolon forum: https://github.com/sorintlab/stolon#contacts

mmpei commented 3 years ago

OK, I'll move this to talk.stolon.

mmpei commented 3 years ago

It's caused by the Kubernetes resource limits (actually the CPU limit in our case). But an unstable network, an api-server timeout, or something similar will also make stolon-proxy time out and break the clients' connections. This produces a broken pipe error, and sometimes the client can't reconnect.
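
To illustrate why a starved or slow check loop ends in a fired timeout, here is a minimal Go sketch of a generic watchdog timer. It is not stolon's code: `watchdog`, `simulate`, and the millisecond values are hypothetical names and numbers chosen for illustration. The idea is that every completed check restarts a countdown; if a check is delayed past the timeout (CPU throttling, a slow api-server call, a network hiccup), the timer fires, which is the point where a proxy would stop proxying and drop its client connections.

```go
package main

import (
	"fmt"
	"time"
)

// watchdog counts how many times `timeout` elapses without a signal on
// checkOk. It returns the count once checkOk is closed.
// Hypothetical helper for illustration; not taken from stolon.
func watchdog(timeout time.Duration, checkOk <-chan struct{}) int {
	fired := 0
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	for {
		select {
		case _, open := <-checkOk:
			if !open {
				return fired
			}
			// A check completed in time: restart the countdown.
			if !timer.Stop() {
				// Drain a pending tick so Reset starts from a clean state.
				select {
				case <-timer.C:
				default:
				}
			}
			timer.Reset(timeout)
		case <-timer.C:
			// No completed check within the timeout. A proxy would stop
			// proxying here, which clients see as a broken connection.
			fired++
			timer.Reset(timeout)
		}
	}
}

// simulate runs one check per entry in checkMs, each taking that many
// milliseconds, and reports how often the watchdog fired.
func simulate(timeoutMs int, checkMs []int) int {
	checkOk := make(chan struct{})
	result := make(chan int)
	go func() {
		result <- watchdog(time.Duration(timeoutMs)*time.Millisecond, checkOk)
	}()
	for _, ms := range checkMs {
		// The sleep stands in for a throttled CPU or a slow store read.
		time.Sleep(time.Duration(ms) * time.Millisecond)
		checkOk <- struct{}{}
	}
	close(checkOk)
	return <-result
}

func main() {
	fmt.Println("fast checks, timeouts fired:", simulate(500, []int{5, 5, 5}))
	fmt.Println("one slow check, timeouts fired:", simulate(50, []int{300}))
}
```

In this sketch the code path that reports the timeout is never blocked itself; the timer fires because the check that should have reset it arrived too late.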

HonakerM commented 3 years ago

@mmpei Do you know of any way to debug this issue? I am seeing similar results in my cluster. I've increased the resources to 1 CPU core and 750 MB of RAM, but that hasn't stopped the issue.