openHPI / poseidon

Scalable task execution orchestrator for CodeOcean
MIT License
8 stars 1 forks source link

Sentry Integration causes OOM kill #540

Open MrSerth opened 7 months ago

MrSerth commented 7 months ago

Today, we had another occurrence of Poseidon being OOM killed.

Stack Trace

``` Jan 24 08:27:45 poseidon-terraform poseidon[624]: heap profile: 194: 751991664 [9538813: 1723733127104] @ heap/1048576 Jan 24 08:27:45 poseidon-terraform poseidon[624]: 1: 83886080 [41: 3439329280] @ 0x4a9d4e 0x4a977d 0x4a9959 0x719b1d 0x71aade 0x71c3b6 0x71beef 0x71aade 0x71c70f 0x71aade 0x71c70f 0x717fd3 0x717b99 0x717490 0x78e7d9 0x78ea3c 0x78f889 0x7784d2 0x 77bb0d 0x77baf7 0x78b2fd 0x47b8df 0x78a5a5 0x78a56f 0xe46015 0x6e6a69 0x6e974e 0x6e5634 0x470781 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x4a9d4d#011bytes.growSlice+0x8d#011#011#011#011#011#011/usr/lib/go-1.21/src/bytes/buffer.go:249 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x4a977c#011bytes.(*Buffer).grow+0x13c#011#011#011#011#011/usr/lib/go-1.21/src/bytes/buffer.go:151 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x4a9958#011bytes.(*Buffer).Write+0x58#011#011#011#011#011/usr/lib/go-1.21/src/bytes/buffer.go:179 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x719b1c#011encoding/json.uintEncoder+0x13c#011#011#011#011#011/usr/lib/go-1.21/src/encoding/json/encode.go:525 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x71aadd#011encoding/json.structEncoder.encode+0x21d#011#011#011/usr/lib/go-1.21/src/encoding/json/encode.go:706 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x71c3b5#011encoding/json.arrayEncoder.encode+0xd5#011#011#011#011/usr/lib/go-1.21/src/encoding/json/encode.go:849 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x71beee#011encoding/json.sliceEncoder.encode+0x30e#011#011#011#011/usr/lib/go-1.21/src/encoding/json/encode.go:822 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x71aadd#011encoding/json.structEncoder.encode+0x21d#011#011#011/usr/lib/go-1.21/src/encoding/json/encode.go:706 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x71c70e#011encoding/json.ptrEncoder.encode+0x20e#011#011#011#011/usr/lib/go-1.21/src/encoding/json/encode.go:878 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x71aadd#011encoding/json.structEncoder.encode+0x21d#011#011#011/usr/lib/go-1.21/src/encoding/json/encode.go:706 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x71c70e#011encoding/json.ptrEncoder.encode+0x20e#011#011#011#011/usr/lib/go-1.21/src/encoding/json/encode.go:878 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x717fd2#011encoding/json.(*encodeState).reflectValue+0x72#011#011#011/usr/lib/go-1.21/src/encoding/json/encode.go:323 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x717b98#011encoding/json.(*encodeState).marshal+0xb8#011#011#011/usr/lib/go-1.21/src/encoding/json/encode.go:295 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x71748f#011encoding/json.Marshal+0xcf#011#011#011#011#011/usr/lib/go-1.21/src/encoding/json/encode.go:162 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x78e7d8#011github.com/getsentry/sentry-go.envelopeFromBody+0x518#011#011/home/poseidon/go/pkg/mod/github.com/getsentry/sentry-go@v0.25.0/transport.go:196 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x78ea3b#011github.com/getsentry/sentry-go.getRequestFromEvent+0xfb#011#011/home/poseidon/go/pkg/mod/github.com/getsentry/sentry-go@v0.25.0/transport.go:232 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x78f888#011github.com/getsentry/sentry-go.(*HTTPTransport).SendEvent+0xc8#011/home/poseidon/go/pkg/mod/github.com/getsentry/sentry-go@v0.25.0/transport.go:359 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x7784d1#011github.com/getsentry/sentry-go.(*Client).processEvent+0x2b1#011/home/poseidon/go/pkg/mod/github.com/getsentry/sentry-go@v0.25.0/client.go:637 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x77bb0c#011github.com/getsentry/sentry-go.(*Client).CaptureEvent+0x6c#011/home/poseidon/go/pkg/mod/github.com/getsentry/sentry-go@v0.25.0/client.go:447 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x77baf6#011github.com/getsentry/sentry-go.(*Hub).CaptureEvent+0x56#011#011/home/poseidon/go/pkg/mod/github.com/getsentry/sentry-go@v0.25.0/hub.go:224 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x78b2fc#011github.com/getsentry/sentry-go.(*Span).doFinish+0x15c#011#011/home/poseidon/go/pkg/mod/github.com/getsentry/sentry-go@v0.25.0/tracing.go:356 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x47b8de#011sync.(*Once).doSlow+0xbe#011#011#011#011#011/usr/lib/go-1.21/src/sync/once.go:74 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x78a5a4#011sync.(*Once).Do+0x44#011#011#011#011#011#011/usr/lib/go-1.21/src/sync/once.go:65 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x78a56e#011github.com/getsentry/sentry-go.(*Span).Finish+0xe#011#011/home/poseidon/go/pkg/mod/github.com/getsentry/sentry-go@v0.25.0/tracing.go:205 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110xe46014#011main.initServer.(*Handler).Handle.(*Handler).handle.func1+0x314#011/home/poseidon/go/pkg/mod/github.com/getsentry/sentry-go@v0.25.0/http/sentryhttp.go:119 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x6e6a68#011net/http.HandlerFunc.ServeHTTP+0x28#011#011#011#011/usr/lib/go-1.21/src/net/http/server.go:2136 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x6e974d#011net/http.serverHandler.ServeHTTP+0x8d#011#011#011#011/usr/lib/go-1.21/src/net/http/server.go:2938 Jan 24 08:27:45 poseidon-terraform poseidon[624]: ##0110x6e5633#011net/http.(*conn).serve+0x5f3#011#011#011#011#011/usr/lib/go-1.21/src/net/http/server.go:2009 ```


At the same time, Influx also showed erroneous behaviour:

Stack Trace

``` Jan 24 08:27:25 poseidon-terraform poseidon[624]: 2024/01/24 08:27:25 influxdb2client E! Write error: internal error: unexpected error writing points to database: timeout, batch kept for retrying Jan 24 08:27:25 poseidon-terraform poseidon[624]: time="2024-01-24T08:27:25.603586Z" level=debug code=200 duration=15.765001853s method=GET path=/api/v1/runners/29-9a31d753-ba91-11ee-8c54-fa163e351aea/files user_agent="Faraday v2.9.0" Jan 24 08:27:25 poseidon-terraform poseidon[624]: time="2024-01-24T08:27:25.604447Z" level=debug code=200 duration=16.191098103s method=GET path=/api/v1/runners/29-9a31d753-ba91-11ee-8c54-fa163e351aea/websocket user_agent= Jan 24 08:27:37 poseidon-terraform telegraf[632]: 2024-01-24T08:27:37Z E! [outputs.influxdb_v2] When writing to [http://monitoring-terraform.compute.internal:8086]: Post "http://monitoring-terraform.compute.internal:8086/api/v2/write?bucket=tele graf&org=codeocean": context deadline exceeded (Client.Timeout exceeded while awaiting headers) Jan 24 08:27:37 poseidon-terraform telegraf[632]: 2024-01-24T08:27:37Z E! [agent] Error writing to outputs.influxdb_v2: failed to send metrics to any configured server(s) Jan 24 08:27:38 poseidon-terraform poseidon[624]: 2024/01/24 08:27:38 influxdb2client E! Write error: internal error: unexpected error writing points to database: timeout, batch kept for retrying ```

MrSerth commented 7 months ago

We decided to focus on the Sentry part first (postponing InfluxDB). Maybe, the OOM kill is related to the buffer of Sentry events awaiting transmission to the relay. Thus, it could be well connected to the high memory threshold experience during network outages.

mpass99 commented 5 months ago

WIP Questions to investigate:

We agreed to postpone this issue until the next occurrence.