grafana / k6

A modern load testing tool, using Go and JavaScript - https://k6.io
GNU Affero General Public License v3.0

Memory leak of a simple test #3955

Open Zywertos opened 1 month ago

Zywertos commented 1 month ago

Brief summary

K6 is leaking memory even when running a test that doesn't perform any actual actions. I ran a test with 1000 requests per second and noticed that my RAM usage slowly increased by about 1MB every few seconds. To amplify this behavior, I ramped it up to 100k requests per second. As a result, K6's memory usage went from around 500MB to 2GB after 17 minutes of running the test.

I want to run tests on my app, and I need them to last around 24 hours. The tests are simple, but after a few hours, K6 starts using up 12GB of RAM, which ends up causing a crash...

Am I missing something? I don’t think this is the intended behavior.

k6 version

k6.exe v0.53.0 (commit/f82a27da8f, go1.22.6, windows/amd64)

OS

Windows 11

Docker version and image (if applicable)

No response

Steps to reproduce the problem

Just run this test for some time:


export const options = {
    scenarios: {
        contacts: {
            executor: "constant-arrival-rate",

            // Run for a full day.
            duration: "24h",

            // Start 100,000 iterations per second.
            rate: 100000,
            timeUnit: "1s",

            preAllocatedVUs: 1000
        }
    }
}

// The iteration body is intentionally empty: no requests, no sleep.
export default async function () {}

Expected behaviour

Memory usage should stabilize after a while instead of growing indefinitely.

Actual behaviour

There's a RAM leak: memory usage keeps growing for as long as the test runs.

Zywertos commented 1 week ago

What do you think, @joanlopez? Am I missing something here, or is there actually a leak?

taylorflatt commented 1 week ago

FWIW, I've seen something like this before, so I tried recreating it on my end. I can easily reproduce what @Zywertos is describing, and it's especially visible if you increase the number of VUs and the rate by an order of magnitude (rough sketch below). The number of iterations is enormous when you do this, and without looking at the code it makes some sense that memory climbs because of how metrics are collected and retained. I'll try to dig into it a bit more if I get a chance this evening.
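For context, the amplified setup is roughly the following; the numbers are illustrative rather than the exact values I used:

export const options = {
    scenarios: {
        contacts: {
            executor: "constant-arrival-rate",
            duration: "1h",
            rate: 1000000,          // 10x the rate from the original report
            timeUnit: "1s",
            preAllocatedVUs: 10000  // 10x the pre-allocated VUs
        }
    }
}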

I also tested with a simple HTTP call in the function, with no payload in either direction (aside from the 200 response). The iteration rate was lower, simply because each iteration actually had to perform a request, and memory grew much more slowly than in the unthrottled case.
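Roughly, that variant looked like this (the endpoint is a placeholder here, not my actual target):

import http from "k6/http";

export default async function () {
    // A single GET against a lightweight endpoint that just returns a 200.
    http.get("http://localhost:8080/ping");
}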

Another observation: the test without HTTP calls was mostly CPU-bound, which again makes sense given the number of iterations involved (millions, on my end, to reach this point) and the fact that there is practically no waiting within an iteration.

I get similar results on Linux on:

Zywertos commented 1 week ago

Yesterday I ran some tests to figure out how much RAM I need to handle a 12-hour test run (see the attached chart). These tests actually hit my API; over the run I logged 90,971,215 HTTP requests. The API itself is pretty lightweight, but it has to be, since it needs to handle a ton of incoming traffic and stay stable 24/7. That's why we're running these tests: to make sure it isn't leaking memory or developing any other long-term issues.

As you can see, the RAM usage shoots up, and after about 13 hours, we hit a crash because we only had 32GB of RAM available.

joanlopez commented 1 week ago

Hey @Zywertos, @taylorflatt,

I got a couple of Go (memory) profiles from 10-minute runs based on your example, and I see no significant memory allocations other than the ones related to TrendSink, which does indeed keep growing over time (see #3618, for instance, or any other issue related to TrendSink and memory allocations).

You can find more details, and a possible workaround, at https://github.com/grafana/k6/issues/2367#issuecomment-1028866630. I have run your example with the suggested workaround and the memory allocations are indeed much lower.
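In short, k6 keeps every Trend sample in memory so it can compute the end-of-test summary and evaluate thresholds, so the workaround boils down to disabling those and streaming the raw metrics to an output instead. A rough sketch of such an invocation (script.js is just a placeholder for your test file):

k6 run --no-summary --no-thresholds --out json=results.json script.js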

Could you check that and confirm it works for you as well, please? If so, I'll proceed to close this issue. If you're looking for a solution other than the workaround, I'd suggest keeping an eye on the aforementioned open issue (#2367).

Thanks! 🙇🏻

Zywertos commented 1 week ago

@joanlopez alright, thanks for clearing that up. So, it's not a memory leak, just a ton of metrics being stored in RAM... I'll give your suggestions a shot.