Hello, thank you for the detailed report.
"The backend1 is an unreliable service"

Is it in fact backend1 or backend2?
Could you also share your timeout configuration? timeout-backend and response-header-timeout-backend in particular.
https://github.com/zalando/skipper/blob/44d0a73103d9416fe2e29410efae4815169292a1/proxy/proxy.go#L211-L216
Defaults are https://github.com/zalando/skipper/blob/44d0a73103d9416fe2e29410efae4815169292a1/config/config.go#L417-L437
"Is it in fact backend1 or backend2?"
It is backend1 that is unreliable and crashing. We are testing its replacement, backend2, through teeLoopback shadow traffic.
@AlexanderYastrebov some more insights from the logs. It seems that the circuit breaker to backend1 was open around the time the goroutine leak happened. I am wondering whether we consume the POST body properly in the edge case where:
1) the request method is POST,
2) the circuit breaker to the actual backend (backend1) is open, and
3) there is a teeLoopback() filter that copies the body.
I recall we had similar goroutine leak problems with the combination of <shunt> and tee(), and it was fixed by consuming the body here:
https://github.com/zalando/skipper/blob/44d0a73103d9416fe2e29410efae4815169292a1/proxy/proxy.go#L1086-L1092
I am wondering if something similar is needed to consume the POST body in the case of an open circuit breaker here: https://github.com/zalando/skipper/blob/44d0a73103d9416fe2e29410efae4815169292a1/proxy/proxy.go#L1111-L1115
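For illustration only, a minimal sketch of what "consuming the body" could look like (a hypothetical helper, not skipper's actual fix): reading the tee'd body to EOF drives the remaining writes into the tee pipe and lets it close, so the goroutine serving the shadow route can finish instead of blocking forever.

package main

import (
	"fmt"
	"io"
	"strings"
)

// drainBody reads an otherwise-unread request body to EOF and closes it.
// With a tee'd body this pushes the remaining data through the tee pipe,
// unblocking the goroutine that serves the shadow route. Hypothetical
// helper for illustration; not skipper's actual code.
func drainBody(body io.ReadCloser) {
	if body == nil {
		return
	}
	io.Copy(io.Discard, body)
	body.Close()
}

func main() {
	drainBody(io.NopCloser(strings.NewReader("unconsumed POST body")))
	fmt.Println("body drained")
}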
"Could you also share your timeout configuration? timeout-backend and response-header-timeout-backend in particular."
TimeoutBackend is 53 * time.Second and ResponseHeaderTimeout is not set, so it should be the default.
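For reference, assuming this maps to the standard command line flags mentioned above (with Go duration syntax), the setup would correspond to starting skipper along the lines of:

# timeout-backend set explicitly; response-header-timeout-backend left at its default
$ bin/skipper -timeout-backend=53s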
@oporkka @ponimas I think you've nailed it. teeLoopback uses a pipe to copy the request body: the pipe is written to when the original body is read. If a backend error happens or the circuit breaker is open, the request body is never read.
Demo:
$ bin/skipper -inline-routes='b1: Path("/test") -> teeLoopback("tag") -> "http://doesnotexist.local"; b2: Path("/test") && Tee("tag") -> <shunt>;'
# monitor goroutine count
$ watch -n 1 'curl -s localhost:9911/metrics | jq -r .gauges[\"skipper.runtime.NumGoroutine\"].value'
# create requests
$ echo "POST http://localhost:9090/test" | vegeta attack -rate 100/s -body <(echo X) | vegeta report -every 1s
shows that the goroutine count grows.
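The same blocking can be modeled outside skipper with a bare io.Pipe. In the standalone sketch below (not skipper code), the goroutine stands in for the shadow request; because the writer side is never written to or closed, just as when the main route never reads the body, it blocks forever.

package main

import (
	"fmt"
	"io"
	"runtime"
	"time"
)

func main() {
	for i := 0; i < 100; i++ {
		pr, _ := io.Pipe()         // writer side is deliberately never written to or closed
		go io.Copy(io.Discard, pr) // stands in for the shadow request: blocks in Read forever
	}
	time.Sleep(100 * time.Millisecond)                 // let the goroutines start
	fmt.Println("goroutines:", runtime.NumGoroutine()) // ~101: every copy is stuck
}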
When open, the circuit breaker simply prevents the backend request and does not read the request body, but the problem exists without the circuit breaker as well (e.g. on backend errors). https://github.com/zalando/skipper/blob/44d0a73103d9416fe2e29410efae4815169292a1/proxy/proxy.go#L1109-L1123
Also, IMO the circuit breaker should shut the filter chain when open, like we did for the ratelimit filters in https://github.com/zalando/skipper/pull/1549 - this would prevent calling potentially expensive filters like tokeninfo further down the chain, but that is another story.
Another problem is that request body cloning is synchronous: if the backend on the T-route is slow (or does not read the request body), it affects the main route, because the read on the main route is blocked by the write to the T-pipe, which in turn waits for the read on the T-route. https://github.com/zalando/skipper/blob/44d0a73103d9416fe2e29410efae4815169292a1/proxy/teebody.go#L16-L27
# simulate slow T route
$ bin/skipper -inline-routes='b1: Path("/test") -> teeLoopback("tag") -> status(204) -> <shunt>; b2: Path("/test") && Tee("tag") -> backendLatency("10m") -> <shunt>;'
# hangs
$ curl -v localhost:9090/test -d"x"
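The mechanism behind the hang, modeled loosely on the linked teebody.go (simplified, with illustrative names; not the actual implementation): every Read of the original body performs a blocking Write into the unbuffered pipe, so the main route advances only as fast as the T-route reads.

package main

import (
	"fmt"
	"io"
	"strings"
	"time"
)

// teeBody mirrors the pattern in proxy/teebody.go, simplified: reading
// the original body synchronously mirrors the bytes into a pipe that the
// shadow (T) route reads from.
type teeBody struct {
	body io.Reader
	pipe *io.PipeWriter
}

func (t *teeBody) Read(p []byte) (int, error) {
	n, err := t.body.Read(p)
	if n > 0 {
		// io.Pipe is unbuffered: this Write blocks until the T-route
		// reads, so a slow T-route stalls the main route right here.
		if _, werr := t.pipe.Write(p[:n]); werr != nil {
			return n, werr
		}
	}
	return n, err
}

func main() {
	pr, pw := io.Pipe()
	mainBody := &teeBody{body: strings.NewReader("x"), pipe: pw}

	go func() { // simulate a slow T-route that starts reading late
		time.Sleep(2 * time.Second)
		io.Copy(io.Discard, pr)
	}()

	start := time.Now()
	io.Copy(io.Discard, mainBody) // the main route is blocked by the pipe write for ~2s
	fmt.Println("main route unblocked after", time.Since(start).Round(time.Second))
}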
Thanks for reporting this, great findings! We should consume the request body in these cases, like in the tee/shunt case. Or maybe we should find better ways to terminate the request, since there can be requests from attackers whose body never finishes; but that can be handled as a separate issue, together with all such cases.
"Another problem is that request body cloning is synchronous: if the backend on the T-route is slow (or does not read the request body), it affects the main route, because the read on the main route is blocked by the write to the T-pipe, which in turn waits for the read on the T-route."
This is intentional: we can only read once from the incoming connection, so if the two streams went out of sync, we would potentially need to buffer an unbounded amount of data, and we could no longer control how much memory is used at a given time.
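To illustrate the trade-off, here is a sketch of the rejected alternative (not skipper code): decoupling the two streams means queueing every byte the main route has read but the shadow route has not, and that queue grows with the request body.

package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

func main() {
	// Sketch of the rejected alternative: buffer the clone so that the
	// main route never blocks on the shadow route. The body can be read
	// from the incoming connection only once, so every byte must be
	// queued until the shadow route consumes it; if the shadow route is
	// slow or stuck, the queue grows with the request body, i.e. memory
	// use is unbounded.
	var queue bytes.Buffer
	client := strings.NewReader(strings.Repeat("x", 1<<20)) // 1 MiB client body
	mainBody := io.TeeReader(client, &queue)

	io.Copy(io.Discard, mainBody)                      // the main route reads at full speed
	fmt.Println("held for shadow route:", queue.Len()) // all 1 MiB sits in memory
}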
Describe the bug
We have set up traffic shadowing with these 2 routes (simplified):

The backend1 is an unreliable service and it is being restarted periodically. At the exact moments of the restart, we observe goroutine leaks. We were able to get a goroutine stack dump from /debug/pprof/goroutine - goroutine.zip

Expected behavior
No goroutine leak.