k6 memory leak using imports

Dasha27 commented 8 months ago

Brief summary

k6 consumes a lot of memory when a script uses imports. Any non-empty import has this effect, which makes it a problem to run stability tests.

k6 version

k6 v0.46.0, go1.21.0

OS

Debian 11

Docker version and image (if applicable)

No response

Steps to reproduce the problem

We use several scripts with imports like this:

import http from 'k6/http';
import exec from 'k6/execution'
import {check, fail, sleep, group} from 'k6';
import {sha512} from 'k6/crypto';
import {getSession} from '../start_session.js';
import {headers, env, testType, testConfig, trendStats, testTags} from "../constants.js"
import {name, config} from "../configuration.js"
import {generatePayload1, generatePayload2, generatePayload3, generatePayload4} from "../payload.js";
import {function1, function2, function3} from "../functions.js";

The test with such imports consumes all the memory of the load generator (16 GB) within 1 hour of a fixed-load test with 5000 VUs.

The attached file k6_mem_deduplicated.log shows the memory consumption for 500 VUs.

Expected behaviour

The test should run without any memory leaks, even under high load and with different imports.

Actual behaviour

The test consumes all the memory within quite a short period. It happens even without using any xk6 extensions and without writing any logs/artifacts. Using only one file without any imports (except k6 libraries) works fine and without memory leaks.

joanlopez commented 8 months ago

Hi @Dasha27,

Thanks for the details. I'll try to reproduce it. Meanwhile, would you be able to share a minimal reproducible example? I see a lot of custom files (e.g. start_session.js, constants.js, configuration.js, payload.js, etc.), and I guess whatever is in there could make the difference.

Thanks!

Dasha27 commented 8 months ago

Hi @joanlopez,

Sure, here are some examples of the custom files. start_session.txt constants.txt functions.txt payload.txt configuration.txt

joanlopez commented 8 months ago

Sorry @Dasha27, but I still cannot see what the main (default) test function actually does in your case. I appreciate that you shared the imported helper files (which you pointed to as the apparent reason for the huge memory consumption), but from the initial message I can only see the list of imports. I'd need at least a rough idea of what the test looks like to understand how those 5000 VUs will behave and what could be causing the memory consumption (ideally, to reproduce and profile it).

So, please could you shed some light? Thanks!

Dasha27 commented 8 months ago

Sure, sorry for misunderstanding, here is the main file main.txt

joanlopez commented 8 months ago

Hi @Dasha27,

I've spent some time trying to reproduce the same behavior (different memory consumption with and without imports) with no luck. So, at this point, I'd like to give you two suggestions. Either:

- Try to extract some memory profiles from your high memory consumption executions (see here how to enable profiling endpoints in k6), so we can identify which pieces are consuming the most memory and try to reason about why.
- Try to strip out some of your test bits, so you will either:
  - Identify what specific piece is causing the high memory consumption.
  - End up having a much simpler test that you can share with us, so it's easier for us to reproduce your case (ideally, just a few lines, even if spread across different files).

Honestly, I tried to reproduce it with the bits you shared so far, but I struggled a bit because they contain many details specific to your environment/scenario. That said, after a quick look I haven't spotted any red flag that might be causing memory consumption as high as you mention. So, I'm still curious.

Also, note that high memory consumption for certain large and long tests might be expected. You can find some reference figures in these benchmarks, which are a bit outdated but shouldn't differ much for the most recent releases. Additionally, if you're curious about related conversations, you can take a look at the discussion we recently had at https://github.com/grafana/k6/issues/3498, and what's described in https://github.com/grafana/k6/issues/2367 (which is still to be done, btw).

Thanks!

metaturso commented 1 month ago

I'm in a similar predicament. In my case, the test loads a large (30-40MB) CSV file and immediately runs out of memory.

I initially thought parsing large amounts of CSV data was the problem. It wasn't. Then I suspected some funky business was going on with SharedArray (https://github.com/grafana/k6/issues/3237). But it wasn't that either.

I kept removing code from the script until I got a minimal scenario that eats up 64GB of memory in a couple of minutes:

import { data } from "large-csv-file-now-converted-to-javascript.js";

export function setup() {
    return {};
}

export const options = {
    /* cloud */
    scenarios: {
        leak: {
            executor: "ramping-arrival-rate",
            exec: "leak",
            timeUnit: "1m",
            startRate: 288,
            preAllocatedVUs: 400,
            maxVUs: 400,
            stages: [
                {target: 3378, duration: "10m"},
                {target: 3378, duration: "175m"},
                {target: 0, duration: "5m"},
            ],
        },
    }
};

export function leak() {
    // empty body.
}

This is the data file. The objects have 3 fields.

// This is a 30MB worth of captured request data.

// The original file was a CSV parsed with papaparse.
// Then it was a JSON file, but it always exported { default: {} } without data...
// Then we got to this file.

export const data = [
    {
        path: "/",
        query: "",
        method: "GET",
    },
    { /* ... */ },
    { /* ... */ },
      /* ... */
    { /* ... */ },
    { /* ... */ },
];

joanlopez commented 1 month ago

Hi @metaturso,

From k6 docs, you can read:

In general, all external modules added to a test project have a negative impact on performance, as they further increase the memory footprint and CPU usage.

Usually, this is not a big problem as each application only allocates these resources once. In k6, however, every VU has a separate JavaScript virtual machine (VM), duplicating the resource usage once each.

So, looking at the example you provided, I think that huge memory usage is just expected, as these ~35MB would be copied into each VU (because each VU is an isolated JS runtime, and the data variable needs to be set in each of them).

In fact, I just profiled the memory usage of this example, and the consumption is around ~14GB, which matches the rough math: 400 VUs x 35MB. The memory usage from the OS standpoint (process) is much higher (around ~50GB), but that's probably because Go's garbage collector isn't well suited to this kind of usage.
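
The rough math above can be written out as a small sketch. The function name and the decimal-unit convention (1 GB = 1000 MB) are assumptions made for illustration; it only estimates the per-VU copies of the imported data and says nothing about Go runtime overhead.

```javascript
// Rough estimate of the memory held by per-VU copies of imported data.
// Assumes every VU keeps its own full copy, and that 1 GB = 1000 MB.
function estimatePerVuCopiesGB(vus, dataSizeMB) {
  return (vus * dataSizeMB) / 1000;
}

console.log(estimatePerVuCopiesGB(400, 35)); // 14
```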

That said, to avoid such large memory consumption, I'd recommend that you:

- Use SharedArray, designed precisely to handle such scenarios.
- Make data a function that returns the data, or use a data file (CSV, JSON, etc.) and open it with open from k6/experimental/fs.

For instance, note the difference in your script from:

// script.js
import { data } from "large-csv-file-now-converted-to-javascript.js";

export function setup() {
    ...
}

export const options = {
   ...
};

export function leak() {
    // empty body.
}

// large-csv-file-now-converted-to-javascript.js
export const data = [
    {
        path: "/",
        query: "",
        method: "GET",
    },
    { /* ... */ },
    { /* ... */ },
      /* ... */
    { /* ... */ },
    { /* ... */ },
];

vs

// script.js
import { getData } from "large-csv-file-now-converted-to-javascript.js";
import { SharedArray } from 'k6/data';

const data = new SharedArray('data', function () {
  return getData();
});

export function setup() {
    ...
}

export const options = {
   ...
};

export function leak() {
    // empty body.
}

// large-csv-file-now-converted-to-javascript.js
export function getData() {
    return [
        {
            path: "/",
            query: "",
            method: "GET",
        },
        { /* ... */ },
        { /* ... */ },
          /* ... */
        { /* ... */ },
        { /* ... */ },
    ];
}

Please note that only modifying your example to use SharedArray isn't enough. You also want to avoid allocating data in each VU, and that allocation would remain if you import { data } from the file directly as raw data.
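
For the data file variant, a minimal sketch of what the SharedArray callback could do with a JSON file is shown here. The names `parseRequests` and `requests.json` are hypothetical, and the k6 wiring is left in comments so the parsing step can be read on its own; `SharedArray` from `k6/data` and the init-context `open()` are the only k6 calls assumed.

```javascript
// Parse the raw text of a JSON data file into an array of request records.
// In a k6 script this could be the body of the SharedArray callback, e.g.:
//
//   import { SharedArray } from 'k6/data';
//   const data = new SharedArray('data', function () {
//     return parseRequests(open('./requests.json'));
//   });
//
// `parseRequests` and `requests.json` are hypothetical names.
function parseRequests(text) {
  const parsed = JSON.parse(text);
  // The SharedArray callback has to return an array, so fail early otherwise.
  if (!Array.isArray(parsed)) {
    throw new TypeError("expected a JSON array of request objects");
  }
  return parsed;
}
```

Because the file is only read inside the callback, the raw data is not held in a module-level variable that every VU would re-create.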

I hope that helps! @Dasha27 could you confirm whether that would also help in your case (which I guess is a more complex version of what @metaturso shared)? If so, I'd suggest closing the issue, as I'd mark what's described as expected behavior and consider what I suggested above as the solution.

Thanks! 🙇🏻

PS: Thanks @metaturso for providing such an easy to reproduce example! 🙌🏻

metaturso commented 1 month ago

@joanlopez: Thank you so much for looking into this and for debugging the scenario.

I also expected to see memory allocations in the region of 15GB. However, my concern was that k6 never actually stopped allocating until both memory and swap file were completely filled instead of staying below 20GB.

I didn't realise this might have been an issue at a lower level. I'm happy to call this working as intended and blame it on Go's garbage garbage collection 😅

Regarding the use of SharedArray, the reason I tried using an import to load the data is that large elements in a SharedArray also tend to leak memory significantly, as described in https://github.com/grafana/k6/issues/3237.

Fortunately, in my case, I can split my data into chunks small enough that they don't clog the SharedArray 😃
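
Splitting the data into chunks could look roughly like the sketch below. The helper name `chunk`, the sample records, and the chunk size are hypothetical; the thread does not show how the data was actually split.

```javascript
// Split an array into consecutive chunks of at most `size` elements.
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Sample records with the same three fields as the data file above.
const requests = [
  { path: "/", query: "", method: "GET" },
  { path: "/a", query: "x=1", method: "GET" },
  { path: "/b", query: "", method: "POST" },
];
console.log(chunk(requests, 2).length); // 2
```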
