openimsdk / open-im-server

IM Chat ChatGPT
https://openim.io
Apache License 2.0
14.14k stars, 2.5k forks

Bug: memory leak in MSGGATEWAY #2176

Status: Open. Sylariam opened this issue 7 months ago.

Sylariam commented 7 months ago

What happened?

This Monday I upgraded the server side to v3.6.0 using the Helm chart. MSGGATEWAY memory usage then spiked for no apparent reason and kept climbing. This made our IM service unavailable, right in the middle of our peak business hours. The volume of error logs also rose during that period, including entries such as "read tcp :80->:5644: i/o timeout" and "websocket: close 1006 (abnormal closure): unexpected EOF".

MSGGATEWAY memory climbed all the way to 11 GiB and did not stop until we intervened manually.

[Screenshot: MSGGATEWAY memory usage]

OpenIM's official Grafana dashboard stats:

[Screenshot: OpenIM Grafana dashboard]

Online users:

[Screenshot: online users]

What did you expect to happen?

No memory leak, please.

How can we reproduce it (as minimally and precisely as possible)?

Deploy v3.6.0 using the Helm chart.

Anything else we need to know?

No response

version

```console
$ {name} version
# paste output here
```

Cloud provider

OS version

```console
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here
```

Install tools

Sylariam commented 7 months ago

Separately from this issue: the current Kubernetes deployment sets no resource limits, so any similar problem can consume cluster resources without bound. This needs to be addressed.
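Setting a memory limit on the MSGGATEWAY container bounds the damage: a leaking pod is OOM-killed (and, under a Deployment, restarted) instead of exhausting the node. Because open-im-server is written in Go, the same limit can also be handed to the Go runtime as a soft limit so the garbage collector works harder before the hard limit is reached. The sketch below is an illustration under stated assumptions, not OpenIM code: it assumes cgroup v2 with the container limit exposed at `/sys/fs/cgroup/memory.max`, Go 1.19+ for `runtime/debug.SetMemoryLimit`, and an arbitrary 90% ratio; `parseMemoryMax` and `softLimit` are hypothetical helpers. A soft limit does not fix a genuine leak; it only delays the OOM kill.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
	"strconv"
	"strings"
)

// Assumption: cgroup v2, with the container's cgroup mounted at /sys/fs/cgroup.
const cgroupV2MemoryMax = "/sys/fs/cgroup/memory.max"

// parseMemoryMax (hypothetical helper) parses the contents of memory.max.
// It returns false when no limit is set ("max") or the value is not a positive integer.
func parseMemoryMax(raw string) (int64, bool) {
	s := strings.TrimSpace(raw)
	if s == "" || s == "max" {
		return 0, false
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil || n <= 0 {
		return 0, false
	}
	return n, true
}

// softLimit (hypothetical helper) returns ratio * hardLimit, leaving headroom
// for memory outside the Go runtime's accounting (for example cgo allocations).
func softLimit(hardLimit int64, ratio float64) int64 {
	return int64(float64(hardLimit) * ratio)
}

func main() {
	raw, err := os.ReadFile(cgroupV2MemoryMax)
	if err != nil {
		fmt.Println("no cgroup v2 memory limit file; leaving Go memory limit unchanged")
		return
	}
	hard, ok := parseMemoryMax(string(raw))
	if !ok {
		fmt.Println("container memory is unlimited; leaving Go memory limit unchanged")
		return
	}
	limit := softLimit(hard, 0.9) // 90% is an arbitrary illustrative ratio
	// SetMemoryLimit installs the soft limit and returns the previous one.
	prev := debug.SetMemoryLimit(limit)
	fmt.Printf("hard limit %d bytes, Go soft limit set to %d bytes (was %d)\n", hard, limit, prev)
}
```

Without code changes, setting the `GOMEMLIMIT` environment variable in the pod spec gives the Go runtime the same kind of soft limit. Either way, the hard limit in the Kubernetes manifest is what actually caps the pod.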

pokid commented 7 months ago

I hit this problem too.

skiffer-git commented 1 week ago

Let's hold off on using it until our Kubernetes deployment goes live.
