sergeimonakhov opened this issue 1 year ago
I'm on v2.43.0 and it's still happening. I see the memory just slowly increasing over time. Comparing uptimes with my other containers, I can see it already crashed recently and is now slowly climbing again toward doing the same thing:
```
CONTAINER NAME        STATE    STATUS                 MOUNT  PORTS
mythic_documentation  running  Up 3 weeks (healthy)   local  8090/tcp -> 127.0.0.1:8090
mythic_graphql        running  Up 18 hours (healthy)  N/A    8080/tcp -> 127.0.0.1:8080
mythic_jupyter        running  Up 3 weeks (healthy)   local  8888/tcp -> 127.0.0.1:8888
mythic_nginx          running  Up 3 weeks (healthy)   local  7443
mythic_postgres       running  Up 3 weeks (healthy)   local  5432/tcp -> 127.0.0.1:5432
mythic_rabbitmq       running  Up 3 weeks (healthy)   local  5672/tcp -> 127.0.0.1:5672
mythic_react          running  Up 3 weeks (healthy)   local  3000/tcp -> 127.0.0.1:3000
mythic_server         running  Up 3 weeks (healthy)   local  17443/tcp -> 127.0.0.1:17443,
```
I'm using the `FROM hasura/graphql-engine:latest.cli-migrations-v2` image. I currently have that specific container capped at 2GB because otherwise it crashes the whole system.
Over the past two hours, the memory usage has climbed 500MB and keeps going up. It's going to hit my 2GB cap in the next few hours, so I'll see if I can follow the logs and drop them here. Maybe they'll offer some insight into why this memory issue persists.
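To capture the growth rate before the next OOM, one option is to log the container's memory usage at a fixed interval. A minimal sketch, assuming the Hasura container is named `mythic_graphql` (as in the list above) and the `docker` CLI is available on the host:

```python
#!/usr/bin/env python3
# Minimal sketch: periodically record a container's memory usage so the
# growth leading up to an OOM can be inspected later.
# Assumes the docker CLI is on PATH and the container is named "mythic_graphql".
import subprocess
import time
from datetime import datetime

CONTAINER = "mythic_graphql"   # assumption: the Hasura container's name
INTERVAL_SECONDS = 60

while True:
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format",
         "{{.Name}} {{.MemUsage}} {{.MemPerc}}", CONTAINER],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    with open("hasura_mem.log", "a") as f:
        f.write(f"{datetime.now().isoformat()} {out}\n")
    time.sleep(INTERVAL_SECONDS)
```

Running something like this in the background should make it easy to see whether the growth is roughly linear or accelerating.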
errorlogs.json
Yup, it OOMed and crashed. The memory growth seemed to accelerate rather than grow linearly: it started out pretty slow and then sped up. Here are some of the recent logs leading up to the crash; maybe they'll give you something to work with.
This is a shot in the dark, but have you monitored the number of connections/subscriptions to Hasura?
Maybe a client isn't closing a connection and Hasura is just leaving it open?
Considering how inconsistent it appears on my end, I'm not sure that's the issue, but I'm happy to check! What's the best way to monitor the active connections from Hasura's side?
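One rough way to check this from outside Hasura (a general OS-level approach, not an official Hasura metric) is to count established TCP connections to the port graphql-engine listens on. A minimal sketch, assuming Hasura is on port 8080 as in the container list above and that `psutil` is installed; it may need to run as root to see sockets owned by other processes:

```python
# Rough sketch: count established TCP connections to the Hasura port as a
# proxy for how many clients/subscriptions are currently holding a socket open.
import psutil

HASURA_PORT = 8080  # assumption: the graphql-engine port from the container list

established = [
    c for c in psutil.net_connections(kind="tcp")
    if c.laddr and c.laddr.port == HASURA_PORT and c.status == psutil.CONN_ESTABLISHED
]
print(f"{len(established)} established connections to port {HASURA_PORT}")
```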
Hasura with a lot of websocket clients:
Without websocket clients:
WTF
I've found the problem. Hasura was connected to Postgres through PgBouncer. I connected it directly to Postgres and the memory leaks are gone!
Is PgBouncer something you added, or is it included with Hasura in some way that we can toggle?
> Is PgBouncer something you added, or is it included with Hasura in some way that we can toggle?
It's not included with Hasura; it's a separate connection pooler for Postgres.
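When comparing the PgBouncer setup against a direct connection, it can also help to look from the Postgres side and see which clients are holding backend connections open. A minimal sketch, assuming `psycopg2` is installed; the DSN below is a placeholder, not taken from this setup:

```python
# Sketch: count backend connections per client/application in pg_stat_activity.
# Useful for spotting connections that a pooler (or Hasura) holds open indefinitely.
import psycopg2

DSN = "postgresql://postgres:password@127.0.0.1:5432/postgres"  # placeholder credentials

with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
    cur.execute("""
        SELECT application_name, client_addr, state, count(*)
        FROM pg_stat_activity
        GROUP BY application_name, client_addr, state
        ORDER BY count(*) DESC
    """)
    for app, addr, state, n in cur.fetchall():
        print(f"{n:4d}  app={app!r}  client={addr}  state={state}")
```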
It happened again, even without PgBouncer :( There is no traffic at all, so how is it even possible? It's not just memory that's leaking; CPU usage is climbing too! I have no idea how to debug this.
Version Information
Server Version: v2.22.1
CLI Version (for CLI related issue):
Environment
On-premises
What is the current behaviour?
We've been seeing memory leaks since v2.19.x.
Upgrading to the latest version did not solve the problem; v2.8.4 works correctly.
What is the expected behaviour?
To work without memory leaks
How to reproduce the issue?
Screenshots or Screencast
OOM:
Please provide any traces or logs that could help here.
Any possible solutions/workarounds you're aware of?
Keywords
memory leak