google / seesaw

Seesaw v2 is a Linux Virtual Server (LVS) based load balancing platform.
Apache License 2.0

Health Check notification - Send failed #125

Status: Open. Opened by naji-abdulla 1 year ago.

naji-abdulla commented 1 year ago

We consistently see health check notifications failing to send in some of our deployments.

The client has the following errors in its log:

E0301 16:41:25.941121  157041 core.go:616] Send failed 1 times: read unix @->/var/run/seesaw/engine/engine.sock?backlog=8192: i/o timeout
..........
E0301 16:45:41.944832  157041 core.go:616] Send failed 9 times: read unix @->/var/run/seesaw/engine/engine.sock?backlog=8192: i/o timeout
E0301 16:46:13.945305  157041 core.go:616] Send failed 10 times: read unix @->/var/run/seesaw/engine/engine.sock?backlog=8192: i/o timeout

On the server, connections to /var/run/seesaw/engine/engine.sock are piling up:

ss | grep engine.sock | wc -l
15905

Any pointers to why this error is happening would be helpful.

hazaelsan commented 1 year ago

Those errors happen when the engine doesn't respond quickly enough (within 10 seconds). The fact that the last line shows this failing 10 consecutive times is a bit concerning.

A few questions:

naji-abdulla commented 1 year ago

This turned out to be because the server callback was returning an error to the RPC machinery; when that happens, the RPC connection seems to time out.