Closed: jaffemd closed this issue 1 week ago.
Hey there! Thanks for reporting. For me to debug this in depth I'd need to know a bit more about the test env you have and create a benchmark that replicates the behaviour in order to pinpoint the issue.
Can you tell me:

- Which Node version are you using?
- Does the traffic change during the 15-20min spikes? What is happening during that time?
- Is there consistently a spike every 15-20mins or at random times? Also during low traffic?
- How are you performing the test? Constant VUs over time, or are you using some sort of real traffic?

@enisdenjo Thank you for the quick reply!

We're running the gateway in a Docker container in a Kubernetes pod. The Docker container is running Node 20.14.0.

This is with real traffic. It's overall constant, but we do generally get lower traffic overnight. The spikes are consistently every 15-20 minutes, but not exactly.
> We're running the gateway in a Docker container in a Kubernetes pod. The Docker container is running Node 20.14.0.

We had some fights with Node and memory spikes in the past. I'm wondering whether that's the case here too. Can we start by updating the Node in the container to the upcoming LTS (starting tomorrow), v22.10.0?

> It's overall constant, but we do generally get lower traffic overnight. The spikes are consistently every 15-20 minutes, but not exactly.

Are the spikes also happening during lower traffic?
Upgrading Node to 22.10.0 fixed our issue. Thank you!
We recently upgraded from `graphql-mesh` v0 to `@graphql-mesh/compose-cli` v1 + Hive Gateway, as recommended by the migration guide. Here are our relevant dependencies and versions:
Here's our config:
Before v1, memory usage plateaued and was stable. After upgrading to Hive Gateway, we immediately observed unstable memory utilization.
Zooming in, every 15 to 30 minutes, there is a sharp spike in memory.
Our only clue was this release note in mesh v0.98.7 that referenced memory leaks from plugins.
We're using a mix of homegrown plugins that perform various functions (such as Datadog tracing) and vendor plugins like graphql-armor. We ran a short experiment turning them all off and didn't observe any memory spikes.
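(A rough sketch of what that experiment's toggle could look like; the env flag, the placeholder plugins, and the `defineConfig`/`plugins` usage from `@graphql-hive/gateway` are assumptions for illustration, not our actual code.)

```ts
// gateway.config.ts — sketch of the "all custom plugins off" experiment
import { defineConfig } from '@graphql-hive/gateway';

// Placeholder stand-ins for the homegrown and vendor plugins (hypothetical).
const datadogTracingPlugin = {
  onExecute() {
    // tracing logic elided
  },
};
const armorPlugin = {
  onExecute() {
    // graphql-armor-style protections elided
  },
};

export const gatewayConfig = defineConfig({
  plugins: () =>
    process.env.DISABLE_CUSTOM_PLUGINS === 'true'
      ? [] // experiment: run with no custom plugins at all
      : [datadogTracingPlugin, armorPlugin],
});
```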
To isolate against the possibility that the content of our plugins was the issue, we created a barebones empty plugin to see if we still saw a memory spike with just that, and we did. Using the below config, with just a plugin that hooks into `onFetch` and `onExecute`, we still saw a memory spike after around 20 minutes. This made it seem clear to us that there must be some memory leakage going on within the plugin infrastructure, possibly similar to the issue referenced in the graphql-mesh release notes.
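A minimal sketch of the kind of "empty" plugin config described above. This is an illustrative reconstruction, not the exact config: only the `onFetch`/`onExecute` hook names come from the report itself, and the `defineConfig` shape from `@graphql-hive/gateway` is assumed.

```ts
// gateway.config.ts — illustrative reconstruction of the barebones setup
import { defineConfig } from '@graphql-hive/gateway';

// An "empty" plugin that only registers the two hooks mentioned above
// and does nothing in them.
const emptyPlugin = {
  onFetch() {
    // intentionally empty
  },
  onExecute() {
    // intentionally empty
  },
};

export const gatewayConfig = defineConfig({
  // assumption: plugins are provided via a factory returning the plugin list
  plugins: () => [emptyPlugin],
});
```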