Hello, thank you for the detailed report.
"The backend1 is an unreliable service"

Is it in fact backend1 or backend2?
Could you also share your timeout configuration? timeout-backend and response-header-timeout-backend in particular.
https://github.com/zalando/skipper/blob/44d0a73103d9416fe2e29410efae4815169292a1/proxy/proxy.go#L211-L216
Defaults are https://github.com/zalando/skipper/blob/44d0a73103d9416fe2e29410efae4815169292a1/config/config.go#L417-L437
"Is it in fact backend1 or backend2?"
It is backend1 that is unreliable and crashing. We are testing its replacement, backend2, through teeLoopback shadow traffic.
@AlexanderYastrebov some more insights from the logs. It seems that the circuit breaker to backend1 was open around the time the goroutine leak happened. I am wondering whether we consume the POST body properly in the edge case where:
1) the request method is POST,
2) the circuit breaker to the actual backend (backend1) is open, and
3) there is a teeLoopback() filter that copies the body.
I recall we had similar goroutine leak problems with the combination of <shunt> and tee(), and it was fixed by consuming the body here:
https://github.com/zalando/skipper/blob/44d0a73103d9416fe2e29410efae4815169292a1/proxy/proxy.go#L1086-L1092
I am wondering if something similar is needed to consume the POST body in the case of an open circuit breaker here: https://github.com/zalando/skipper/blob/44d0a73103d9416fe2e29410efae4815169292a1/proxy/proxy.go#L1111-L1115
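For illustration only, a minimal sketch of what "consuming the body" could look like (a hypothetical helper, not skipper's actual fix): reading the tee'd body to EOF drives the remaining writes into the tee pipe and lets it close, so the goroutine serving the shadow route can finish instead of blocking forever.

package main

import (
	"fmt"
	"io"
	"strings"
)

// drainBody reads an otherwise-unread request body to EOF and closes it.
// With a tee'd body this pushes the remaining data through the tee pipe,
// unblocking the goroutine that serves the shadow route. Hypothetical
// helper for illustration; not skipper's actual code.
func drainBody(body io.ReadCloser) {
	if body == nil {
		return
	}
	io.Copy(io.Discard, body)
	body.Close()
}

func main() {
	drainBody(io.NopCloser(strings.NewReader("unconsumed POST body")))
	fmt.Println("body drained")
}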
"Could you also share your timeout configuration? timeout-backend and response-header-timeout-backend in particular."
TimeoutBackend is 53 * time.Second and ResponseHeaderTimeout is not set, so it should be the default.
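For reference, assuming this maps to the standard command line flags mentioned above (with Go duration syntax), the setup would correspond to starting skipper along the lines of:

# timeout-backend set explicitly; response-header-timeout-backend left at its default
$ bin/skipper -timeout-backend=53s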
@oporkka @ponimas I think you've nailed it. teeLoopback uses a pipe to copy the request body: the pipe is written to when the original body is read. If a backend error happens or the circuit breaker is open, the request body is never read.
Demo:
$ bin/skipper -inline-routes='b1: Path("/test") -> teeLoopback("tag") -> "http://doesnotexist.local"; b2: Path("/test") && Tee("tag") -> <shunt>;'
# monitor goroutine count
$ watch -n 1 'curl -s localhost:9911/metrics | jq -r .gauges[\"skipper.runtime.NumGoroutine\"].value'
# create requests
$ echo "POST http://localhost:9090/test" | vegeta attack -rate 100/s -body <(echo X) | vegeta report -every 1s
shows that the goroutine count grows.
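The same blocking can be modeled outside skipper with a bare io.Pipe. In the standalone sketch below (not skipper code), the goroutine stands in for the shadow request; because the writer side is never written to or closed, just as when the main route never reads the body, it blocks forever.

package main

import (
	"fmt"
	"io"
	"runtime"
	"time"
)

func main() {
	for i := 0; i < 100; i++ {
		pr, _ := io.Pipe()         // writer side is deliberately never written to or closed
		go io.Copy(io.Discard, pr) // stands in for the shadow request: blocks in Read forever
	}
	time.Sleep(100 * time.Millisecond)                 // let the goroutines start
	fmt.Println("goroutines:", runtime.NumGoroutine()) // ~101: every copy is stuck
}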
When open, the circuit breaker simply prevents the backend request and does not read the request body, but the problem exists without the circuit breaker as well (e.g. on backend errors). https://github.com/zalando/skipper/blob/44d0a73103d9416fe2e29410efae4815169292a1/proxy/proxy.go#L1109-L1123
Also, IMO the circuit breaker should shut the filter chain when open, like we did for the ratelimit filters in https://github.com/zalando/skipper/pull/1549 - this would prevent calling potentially expensive filters like tokeninfo further down the chain, but that is another story.
Another problem is that request body cloning is synchronous: if the backend on the T-route is slow (or does not read the request body), it affects the main route, because the read on the main route is blocked by the write to the T-pipe, which in turn waits for the read on the T-route. https://github.com/zalando/skipper/blob/44d0a73103d9416fe2e29410efae4815169292a1/proxy/teebody.go#L16-L27
# simulate slow T route
$ bin/skipper -inline-routes='b1: Path("/test") -> teeLoopback("tag") -> status(204) -> <shunt>; b2: Path("/test") && Tee("tag") -> backendLatency("10m") -> <shunt>;'
# hangs
$ curl -v localhost:9090/test -d"x"
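The mechanism behind the hang, modeled loosely on the linked teebody.go (simplified, with illustrative names; not the actual implementation): every Read of the original body performs a blocking Write into the unbuffered pipe, so the main route advances only as fast as the T-route reads.

package main

import (
	"fmt"
	"io"
	"strings"
	"time"
)

// teeBody mirrors the pattern in proxy/teebody.go, simplified: reading
// the original body synchronously mirrors the bytes into a pipe that the
// shadow (T) route reads from.
type teeBody struct {
	body io.Reader
	pipe *io.PipeWriter
}

func (t *teeBody) Read(p []byte) (int, error) {
	n, err := t.body.Read(p)
	if n > 0 {
		// io.Pipe is unbuffered: this Write blocks until the T-route
		// reads, so a slow T-route stalls the main route right here.
		if _, werr := t.pipe.Write(p[:n]); werr != nil {
			return n, werr
		}
	}
	return n, err
}

func main() {
	pr, pw := io.Pipe()
	mainBody := &teeBody{body: strings.NewReader("x"), pipe: pw}

	go func() { // simulate a slow T-route that starts reading late
		time.Sleep(2 * time.Second)
		io.Copy(io.Discard, pr)
	}()

	start := time.Now()
	io.Copy(io.Discard, mainBody) // the main route is blocked by the pipe write for ~2s
	fmt.Println("main route unblocked after", time.Since(start).Round(time.Second))
}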
Thanks for reporting this, great findings! We should consume the request body in these cases, like in the tee/shunt case. Or maybe we should find better ways to terminate the request, since there can be requests from attackers whose body never finishes; but that can be handled as a separate issue, together with all such cases.
"Another problem is that request body cloning is synchronous: if the backend on the T-route is slow (or does not read the request body), it affects the main route, because the read on the main route is blocked by the write to the T-pipe, which in turn waits for the read on the T-route."
This is intentional: we can only read once from the incoming connection, so if the two streams went out of sync, we would potentially need to buffer an unbounded amount of data, and we could no longer control how much memory is used at a given time.
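To illustrate the trade-off, here is a sketch of the rejected alternative (not skipper code): decoupling the two streams means queueing every byte the main route has read but the shadow route has not, and that queue grows with the request body.

package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

func main() {
	// Sketch of the rejected alternative: buffer the clone so that the
	// main route never blocks on the shadow route. The body can be read
	// from the incoming connection only once, so every byte must be
	// queued until the shadow route consumes it; if the shadow route is
	// slow or stuck, the queue grows with the request body, i.e. memory
	// use is unbounded.
	var queue bytes.Buffer
	client := strings.NewReader(strings.Repeat("x", 1<<20)) // 1 MiB client body
	mainBody := io.TeeReader(client, &queue)

	io.Copy(io.Discard, mainBody)                      // the main route reads at full speed
	fmt.Println("held for shadow route:", queue.Len()) // all 1 MiB sits in memory
}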
Describe the bug
We have set up traffic shadowing with these 2 routes (simplified):

The backend1 is an unreliable service and it is being restarted periodically. At the exact moments of the restart, we observe goroutine leaks. We were able to get a goroutine stack dump from /debug/pprof/goroutine - goroutine.zip

Expected behavior
No goroutine leak.