Memory and CPU leak in agent and service

graphprotocol / indexer

Graph Protocol indexer components and infrastructure

Memory and CPU leak in agent and service #41

Open chriswessels opened 4 years ago

chriswessels commented 4 years ago

Hey there,

I'd been running release sha-54d4905 on Kubernetes for a few days before upgrading to a new release, and noticed a huge drop in memory usage for the indexer-agent and indexer-service after the upgrade restarted them.

Looking back at memory usage for the deployments:

Screenshot 2020-09-01 at 22 23 40

Screenshot 2020-09-01 at 22 25 40

Both show fairly consistent, linear growth in memory usage over time.
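For anyone who wants to put a number on the growth, here is a minimal sketch that fits a straight line to memory samples and estimates how long until a memory limit is reached. The sample format (hours since start, memory in MiB) and the function names are my own assumptions, not something the indexer components provide; the samples would have to come from whatever monitoring is already in place.

```python
def fit_line(samples):
    """Least-squares fit of memory (MiB) against time (hours).

    `samples` is a list of (hours_since_start, memory_mib) pairs.
    Returns (slope, intercept); slope is the growth rate in MiB per hour.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must cover more than one point in time")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    slope = cov / var_t
    return slope, mean_m - slope * mean_t


def hours_until_limit(samples, limit_mib):
    """Hours after the last sample until the fitted line crosses limit_mib.

    Returns None when memory is not growing.
    """
    slope, intercept = fit_line(samples)
    if slope <= 0:
        return None
    last_t = max(t for t, _ in samples)
    return max(0.0, (limit_mib - intercept) / slope - last_t)
```

A steadily positive slope across several restarts would support the leak hypothesis; a slope that flattens out would point to ordinary warm-up instead.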

Let me know if there are any further details I can provide!

Chris

chriswessels commented 4 years ago

Looks like it's leaking CPU cycles too: Screenshot 2020-09-01 at 22 33 40

chriswessels commented 3 years ago

Looks like indexer-agent and indexer-service still have a memory leak problem. I've been using a Kubernetes CronJob to automatically restart them every day to keep memory usage under control. Here's a gist for anyone that wants to do the same: https://gist.github.com/chriswessels/8271f82a0ae7342d7d0822ea1e796246
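For illustration only (this is not the content of the gist), a minimal sketch of what such a daily restart CronJob could look like, built as a Python dict and printed as JSON that `kubectl apply -f -` accepts. The deployment names, the schedule, the service account name, and the image are all assumptions; the service account would need RBAC permission to patch deployments.

```python
import json


def restart_cronjob(deployment, schedule="0 4 * * *",
                    image="example.invalid/kubectl:latest",  # hypothetical image that ships kubectl
                    service_account="deployment-restarter"):  # hypothetical service account
    """Build a CronJob manifest that runs `kubectl rollout restart` on one deployment."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": f"restart-{deployment}"},
        "spec": {
            "schedule": schedule,  # default: once a day at 04:00
            "concurrencyPolicy": "Forbid",  # never run two restarts at once
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            "serviceAccountName": service_account,
                            "restartPolicy": "Never",
                            "containers": [{
                                "name": "restart",
                                "image": image,
                                "command": ["kubectl", "rollout", "restart",
                                            f"deployment/{deployment}"],
                            }],
                        }
                    }
                }
            },
        },
    }


def render(deployments):
    """Render one manifest per deployment as a JSON list document."""
    items = [restart_cronjob(name) for name in deployments]
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2)
```

Something like `print(render(["indexer-agent", "indexer-service"]))` piped into `kubectl apply -f -` would create both jobs. Note that this only hides the leak; it does not fix it.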

aasseman commented 2 years ago

I'm also observing this issue on v0.18.6. However, I've found a reliable way to accelerate it: send lots of GraphQL queries to the indexer-agent management endpoint. I've also noticed that queries take longer to execute as memory usage grows.

I am not sure, though, whether certain types of queries are worse than others. See the context below.

Context

As @fordN and @chriswessels already know, I'm currently running experiments that require constantly updating the Agora models. In this particular case, I am updating the model variables for each subgraph every 3 minutes. In the plot below, every drop is the OOM killer killing the indexer-agent. The frequency increased around May 10th, when I deployed my experiment to the indexer.
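A rough sketch of how the reproduction described above could be scripted: send the same GraphQL query repeatedly and record each request's latency, so the slowdown can be compared against memory usage. The endpoint URL and the query are placeholders (assumptions, not the actual management API schema or the queries used in the experiment), and `send_fn` exists only so the loop can be exercised without a live endpoint.

```python
import json
import time
import urllib.request


def build_request(url, query):
    """Build a standard GraphQL-over-HTTP POST request for `query`."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10.0):
    """Send the request and drain the response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()


def measure_latencies(url, query, count, send_fn=send):
    """Send `query` to `url` `count` times; return per-request latency in seconds."""
    latencies = []
    for _ in range(count):
        request = build_request(url, query)
        start = time.perf_counter()
        send_fn(request)
        latencies.append(time.perf_counter() - start)
    return latencies
```

For example, `measure_latencies("http://localhost:8000/", "{ __typename }", 1000)` would fire 1000 trivial queries at a placeholder address; substituting a real management query should show whether some query types leak faster than others.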

aasseman commented 2 years ago

Update: The problem is still present on v0.19.2.

kaiwetlesen commented 2 years ago

Update: immediately preceding the OOM crash, queries begin failing. This causes a drop in quality of service.

out-of-memory-crash-grafana

indexer-service-memory-leak

aasseman commented 1 year ago

Update: The problem is still present on v0.20.12.

alex-pakalniskis commented 7 months ago

Figment appears to be experiencing this issue too (shared with me on March 14, 2024).