Closed by unicell 6 years ago
This is not obvious from the healthcheck logs, but each time the healthcheck component fails to update state back to the engine through an RPC call, it always fails 11 times in a row and eventually bails out.
I1107 23:16:05.285266 2552 core.go:503] Getting healthchecks from engine...
I1107 23:16:05.286748 2552 core.go:509] Engine returned 2 healthchecks
E1107 23:16:06.294586 2552 core.go:590] Send failed: read unix @->/var/run/seesaw/engine/engine.sock: i/o timeout
E1107 23:16:08.294828 2552 core.go:590] Send failed: read unix @->/var/run/seesaw/engine/engine.sock: i/o timeout
E1107 23:16:10.295082 2552 core.go:590] Send failed: read unix @->/var/run/seesaw/engine/engine.sock: i/o timeout
I1107 23:16:12.243872 2552 core.go:317] ID 0x7000000000001: (TCP 10.5.52.160:80 DSR (via 10.220.22.33 mark 65536)) FAILURE: Timed out
I1107 23:16:12.293796 2552 core.go:317] ID 0x7000000000000: (TCP 10.5.52.31:443 DSR (via 10.220.22.33 mark 65536)) FAILURE: Timed out
E1107 23:16:12.295279 2552 core.go:590] Send failed: read unix @->/var/run/seesaw/engine/engine.sock: i/o timeout
E1107 23:16:14.295485 2552 core.go:590] Send failed: read unix @->/var/run/seesaw/engine/engine.sock: i/o timeout
E1107 23:16:16.295747 2552 core.go:590] Send failed: read unix @->/var/run/seesaw/engine/engine.sock: i/o timeout
E1107 23:16:18.295980 2552 core.go:590] Send failed: read unix @->/var/run/seesaw/engine/engine.sock: i/o timeout
I1107 23:16:20.287014 2552 core.go:503] Getting healthchecks from engine...
I1107 23:16:20.288315 2552 core.go:509] Engine returned 2 healthchecks
E1107 23:16:20.296185 2552 core.go:590] Send failed: read unix @->/var/run/seesaw/engine/engine.sock: i/o timeout
F1107 23:16:20.296236 2552 core.go:580] send: 11 errors, giving up
Closing the issue as the fix was merged in https://github.com/google/seesaw/commit/34716af0775ecb1fad239a726390d63d6b0742dd