Closed bboreham closed 4 years ago
Possibly related: today a single pipe server with 10,000 goroutines.
Top of goroutine dump:
goroutine profile: total 8989
8958 @ 0x433320 0x442e7b 0x4ad297 0x4adb5c 0xb86e87 0x4615c1
# 0x4ad296 io.(*pipe).Read+0xe6 /usr/local/go/src/io/pipe.go:50
# 0x4adb5b io.(*PipeReader).Read+0x4b /usr/local/go/src/io/pipe.go:127
# 0xb86e86 github.com/weaveworks/scope/common/xfer.(*pipe).CopyToWebsocket.func2+0x86 /go/src/github.com/weaveworks/scop\
e/common/xfer/pipes.go:141
5 @ 0x433320 0x442e7b 0xb84198 0x16f255f 0x16f2ab4 0x6ea7c4 0x7a99db 0x988b6c 0x6ea7c4 0x988cfe 0x6ea7c4 0x989189 0x6ea7c4 0x6\
edc14 0x6e95b5 0x4615c1
# 0xb84197 github.com/weaveworks/scope/common/xfer.(*pipe).CopyToWebsocket+0x297 /go/src/github.com/wea\
veworks/scope/common/xfer/pipes.go:160
# 0x16f255e github.com/weaveworks/scope/app.handlePipeWs.func1+0x50e /go/src/github.com/weaveworks/\
scope/app/pipes.go:69
# 0x16f2ab3 github.com/weaveworks/scope/app.requestContextDecorator.func1+0xb3 /go/src/github.com/weavewo\
rks/scope/app/router.go:46
# 0x6ea7c3 net/http.HandlerFunc.ServeHTTP+0x43 /usr/local/go/src/net/http/server.go:2007
# 0x7a99da github.com/weaveworks/scope/vendor/github.com/gorilla/mux.(*Router).ServeHTTP+0xda /go/src/github.com\
/weaveworks/scope/vendor/github.com/gorilla/mux/mux.go:107
# 0x988b6b github.com/weaveworks/scope/vendor/github.com/weaveworks/common/middleware.Tracer.Wrap.func1+0x8b /go/src/gi\
thub.com/weaveworks/scope/vendor/github.com/weaveworks/common/middleware/http_tracing.go:25
# 0x6ea7c3 net/http.HandlerFunc.ServeHTTP+0x43 /usr/local/go/src/net/http/server.go:2007
# 0x988cfd github.com/weaveworks/scope/vendor/github.com/weaveworks/common/middleware.Instrument.Wrap.func1+0x16d /go/sr\
c/github.com/weaveworks/scope/vendor/github.com/weaveworks/common/middleware/instrument.go:49
# 0x6ea7c3 net/http.HandlerFunc.ServeHTTP+0x43 /usr/local/go/src/net/http/server.go:2007
# 0x989188 github.com/weaveworks/scope/vendor/github.com/weaveworks/common/middleware.Log.Wrap.func1+0x2a8 /go/src/gi\
thub.com/weaveworks/scope/vendor/github.com/weaveworks/common/middleware/logging.go:36
# 0x6ea7c3 net/http.HandlerFunc.ServeHTTP+0x43 /usr/local/go/src/net/http/server.go:2007
# 0x6edc13 net/http.serverHandler.ServeHTTP+0xa3 /usr/local/go/src/net/http/server.go:2\
802
# 0x6e95b4 net/http.(*conn).serve+0x874 /usr/local/go/src/net/http/server.go:1890
3 @ 0x433320 0x42e0ba 0x42d685 0x4ced65 0x4cfcdf 0x4cfcc1 0x5b334f 0x5c7578 0x565c2a 0xb7e968 0xb7eaba 0xb7f866 0xb7fd8f 0xb85\
cbf 0xb86ccd 0x4615c1
# 0x42d684 internal/poll.runtime_pollWait+0x54 /usr/local/go/src/runtime/netpoll.go:184
# 0x4ced64 internal/poll.(*pollDesc).wait+0x44 /usr/local/go/src/internal/poll/fd_poll_runtim\
e.go:87
# 0x4cfcde internal/poll.(*pollDesc).waitRead+0x1ce /usr/local/go/src/internal/poll/fd_poll_ru\
ntime.go:92
# 0x4cfcc0 internal/poll.(*FD).Read+0x1b0 /usr/local/go/src/internal/poll/fd_unix.go:169
# 0x5b334e net.(*netFD).Read+0x4e /usr/local/go/src/net/fd_unix.go:202
# 0x5c7577 net.(*conn).Read+0x67 /usr/local/go/src/net/net.go:184
# 0x565c29 bufio.(*Reader).Read+0x269 /usr/local/go/src/bufio/bufio.go:226
# 0xb7e967 github.com/weaveworks/scope/vendor/github.com/gorilla/websocket.(*Conn).readFull+0x77 /go/src/github.com\
/weaveworks/scope/vendor/github.com/gorilla/websocket/conn.go:617
# 0xb7eab9 github.com/weaveworks/scope/vendor/github.com/gorilla/websocket.(*Conn).advanceFrame+0x89 /go/src/github.com\
/weaveworks/scope/vendor/github.com/gorilla/websocket/conn.go:641
# 0xb7f865 github.com/weaveworks/scope/vendor/github.com/gorilla/websocket.(*Conn).NextReader+0x55 /go/src/github.com\
/weaveworks/scope/vendor/github.com/gorilla/websocket/conn.go:780
# 0xb7fd8e github.com/weaveworks/scope/vendor/github.com/gorilla/websocket.(*Conn).ReadMessage+0x2e /go/src/github.com\
/weaveworks/scope/vendor/github.com/gorilla/websocket/conn.go:852
# 0xb85cbe github.com/weaveworks/scope/common/xfer.(*pingingWebsocket).ReadMessage+0xae /go/src/github.com/wea\
veworks/scope/common/xfer/websocket.go:109
# 0xb86ccc github.com/weaveworks/scope/common/xfer.(*pipe).CopyToWebsocket.func1+0x3c /go/src/github.com/weavewo\
rks/scope/common/xfer/pipes.go:120
Tail of logs:
<app> INFO: 2019/12/16 08:24:57.876959 Creating pipe pipes/14815-pipe-661242507181636894
<app> INFO: 2019/12/16 08:24:57.960942 Release pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:24:58.200017 Get pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:24:58.443435 Release pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:24:58.700782 Get pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:24:58.970222 Release pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:24:59.286143 Deleting pipe pipes/14815-pipe-661242507181636894
<app> INFO: 2019/12/16 08:25:03.267505 Get pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:25:04.841362 Creating pipe pipes/14815-pipe-661242507181636894
<app> INFO: 2019/12/16 08:25:04.924074 Release pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:25:06.006844 Deleting pipe pipes/14815-pipe-661242507181636894
<app> INFO: 2019/12/16 08:25:08.756865 Get pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:25:10.479201 Creating pipe pipes/14815-pipe-661242507181636894
<app> INFO: 2019/12/16 08:25:10.552417 Release pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:25:11.241154 Deleting pipe pipes/14815-pipe-661242507181636894
<app> INFO: 2019/12/16 08:25:14.396069 Get pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:25:14.866810 Creating pipe pipes/14815-pipe-661242507181636894
<app> INFO: 2019/12/16 08:25:14.949131 Release pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:25:15.180797 Get pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:25:15.272817 Release pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:25:15.344041 Deleting pipe pipes/14815-pipe-661242507181636894
<app> INFO: 2019/12/16 08:25:16.470852 Get pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:25:17.273131 Creating pipe pipes/14815-pipe-661242507181636894
<app> INFO: 2019/12/16 08:25:17.355450 Release pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:25:17.854341 Deleting pipe pipes/14815-pipe-661242507181636894
I tracked this down to the handling of EOF on the non-websocket side of the conversation in the probe.
Can reproduce easily by clicking on the 'describe' button on a Kubernetes object.
This is observed in multi-tenant mode; all replicas are spinning with high CPU consumption in
multitenant.(*consulPipeRouter).watchAll()
. Consul is also showing high CPU.Logs at the start of the high-cpu period:
the creating/deleting cycle continues, with different tenant IDs appearing from time to time. The 'error CASing' message reappear every so often, sometimes up 2-3 minutes between batches.
According to monitoring data we have experienced periods of high CPU in the pipe service for months.