hasura / graphql-engine

Blazing fast, instant realtime GraphQL APIs on your DB with fine grained access control, also trigger webhooks on database events.
https://hasura.io
Apache License 2.0

Memory leak #9592

Open · sergeimonakhov opened this issue 1 year ago

sergeimonakhov commented 1 year ago

Version Information

Server Version: v2.22.1
CLI Version (for CLI related issue):

Environment

On-premises

What is the current behaviour?

We've been seeing memory leaks on v2.19.x and later.

Upgrading to the latest version did not solve the problem; version v2.8.4 works correctly.

What is the expected behaviour?

The server should run without memory leaks.

How to reproduce the issue?

  1. Upgrade to v2.19.x or higher and observe memory usage over time.

Screenshots or Screencast

(screenshot, 2023-04-17 09:41)

OOM:

(screenshot)

Please provide any traces or logs that could help here.

Any possible solutions/workarounds you're aware of?

Keywords

memory leak

its-a-feature commented 2 months ago

I'm on v2.43.0 and it's still happening. The memory just slowly increases over time. Compared to my other containers, you can see it already crashed recently and is slowly climbing back up to do the same thing:

CONTAINER NAME      STATE       STATUS          MOUNT   PORTS
mythic_documentation    running     Up 3 weeks (healthy)    local   8090/tcp -> 127.0.0.1:8090
mythic_graphql      running     Up 18 hours (healthy)   N/A 8080/tcp -> 127.0.0.1:8080
mythic_jupyter      running     Up 3 weeks (healthy)    local   8888/tcp -> 127.0.0.1:8888
mythic_nginx        running     Up 3 weeks (healthy)    local   7443
mythic_postgres     running     Up 3 weeks (healthy)    local   5432/tcp -> 127.0.0.1:5432
mythic_rabbitmq     running     Up 3 weeks (healthy)    local   5672/tcp -> 127.0.0.1:5672
mythic_react        running     Up 3 weeks (healthy)    local   3000/tcp -> 127.0.0.1:3000
mythic_server       running     Up 3 weeks (healthy)    local   17443/tcp -> 127.0.0.1:17443,

I'm using the hasura/graphql-engine:latest.cli-migrations-v2 image (pulled via FROM in my Dockerfile). I currently have that specific container capped at 2 GB, because otherwise it just crashes the system.
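
For anyone who wants the same safety net: the cap is just a normal Docker memory limit, nothing Hasura-specific. A rough sketch of what I mean (the container name comes from my listing above; adjust for your setup):

# Cap the Hasura container at 2 GB so a leak can't take the whole host down
docker update --memory 2g --memory-swap 2g mythic_graphql

# In docker-compose the equivalent is roughly:
#   mythic_graphql:
#     mem_limit: 2gb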

its-a-feature commented 2 months ago

Over the past two hours, memory usage has climbed 500 MB and keeps going up. It's going to hit my 2 GB cap in the next few hours, so I'll see if I can follow the logs and drop them here. Maybe they'll offer some insight into why this memory issue is still around.

its-a-feature commented 2 months ago

errorlogs.json — yup, it OOMed and crashed. Memory growth started out pretty slow and then accelerated, so the rate didn't look linear. Here are some of the recent logs leading up to the crash; maybe they give you something to go on.

KevinColemanInc commented 2 months ago

This is a shot in the dark, but have you monitored the number of connections/subscriptions to Hasura?

Maybe a client isn't closing a connection and Hasura is just leaving it open?

its-a-feature commented 2 months ago

Considering how inconsistent it appears on my end, I'm not sure that's the issue, but I'm happy to check! What's the best way to monitor active connections from Hasura's perspective?
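
In the meantime, the best I can come up with is counting connections from the outside. A quick sketch of what I'm planning to check (assumes ss and psql are available on the host; the Postgres user here is a placeholder):

# Established TCP connections on Hasura's port (HTTP + websocket clients)
ss -Htn state established '( sport = :8080 or dport = :8080 )' | wc -l

# Connections held open against Postgres, grouped by state
psql -h 127.0.0.1 -p 5432 -U postgres -c "SELECT state, count(*) FROM pg_stat_activity GROUP BY state;"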

maxpain commented 2 weeks ago

Hasura with a lot of websocket clients:

(screenshot, 2024-11-15 12:49)

Without websocket clients:

(screenshot, 2024-11-15 12:49)

maxpain commented 1 week ago

WTF

(screenshot)

maxpain commented 1 week ago

I've found the problem. Hasura was connected to Postgres using PGBouncer. I connected it directly to Postgres and the memory leaks are gone!

(screenshot)

its-a-feature commented 1 week ago

Is PGBouncer something you added, or is it included with Hasura in some way that we can toggle?

maxpain commented 1 week ago

> Is PGBouncer something you added, or is it included with Hasura in some way that we can toggle?

It's not included with Hasura; it's a standalone connection pooler for Postgres.
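
Concretely, the only thing I changed on the Hasura side was the database URL it's given. Roughly (hostnames and credentials are placeholders for my setup):

# Before: Hasura -> PGBouncer (default port 6432) -> Postgres
HASURA_GRAPHQL_DATABASE_URL=postgres://user:password@pgbouncer:6432/mydb

# After: Hasura -> Postgres directly
HASURA_GRAPHQL_DATABASE_URL=postgres://user:password@postgres:5432/mydb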

maxpain commented 4 days ago

It happened again, even without PGBouncer :( There is no traffic at all. How is that even possible? It's not only memory that's leaking; CPU usage is climbing too! I have no idea how to debug this.

(screenshots, 2024-11-28 17:38: memory and CPU usage)
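
The only thing I can think of for now is to collect a time series of the container's memory/CPU and line it up with the logs. A minimal sketch (the container name hasura is a placeholder for whatever yours is called):

# Sample the Hasura container's memory and CPU once a minute
while true; do
  echo "$(date -Is) $(docker stats --no-stream --format '{{.MemUsage}} {{.CPUPerc}}' hasura)" >> hasura-usage.log
  sleep 60
done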