grafana / k6

A modern load testing tool, using Go and JavaScript - https://k6.io

SharedArray: Bad performance with big items #3237

Open mherwig opened 11 months ago

mherwig commented 11 months ago

Brief summary

Hi,

I'm currently testing a scenario where POST requests are sent with different payload sizes (1kb, 100kb, 300kb, 600kb). For that I open a file containing dummy text inside the SharedArray callback and write the text to the array's first element, which is then read again when building the body of the POST request. While with 1kb I could reach >200 requests/s, it was ~20 requests/s with 100kb and only ~7 requests/s with 300kb. Each time I ensured k6 had enough memory and CPU. To rule out the SUT, I exchanged it with a simple echo server written in Go and got the same result. A network bottleneck can also be ruled out, since with the wrk benchmark tool I could reach >1000 requests/s for requests with 100kb in the same environment.
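For illustration, a minimal sketch of this setup (the file name and target URL are assumptions, not the exact values used):

import http from "k6/http";
import { SharedArray } from "k6/data";

// Hypothetical reproduction of the setup described above; "payload-100kb.txt"
// and the target URL are placeholders.
const payloads = new SharedArray("payloads", function () {
    // open() only works in the init context; the file contains dummy text.
    return [open("./payload-100kb.txt")];
});

export default function () {
    // The array's first element is read on every iteration and used as the
    // POST body, which is where the slowdown shows up.
    http.post("http://localhost:8080/echo", payloads[0]);
}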

Is there any known issue or inefficiency with big request payloads, or am I misunderstanding something?

Best regards

Mike

k6 version

v0.45.0

OS

alpine:3.16

Docker version and image (if applicable)

No response

Steps to reproduce the problem

  1. Create a test that sends a POST request with different body sizes against a simple echo server (a stand-in server is sketched after these steps)
  2. Test with a smaller payload (e.g. 1kb of dummy text as part of a JSON payload)
  3. Test with a bigger payload (e.g. 100kb of dummy text as part of a JSON payload)
  4. Compare requests/s for each run
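The echo server mentioned in step 1 was written in Go; a minimal stand-in with the same behavior, sketched here in Node.js so all examples stay in JavaScript (not the server actually used):

const http = require("http");

// Minimal echo server: streams every request body straight back to the client.
const server = http.createServer((req, res) => {
    res.writeHead(200, { "Content-Type": "application/octet-stream" });
    req.pipe(res);
});

server.listen(8080, () => console.log("echo server listening on :8080"));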

Expected behaviour

No big impact of payload size on load test results.

Actual behaviour

Load test results are far worse with bigger payloads than with smaller ones.

codebien commented 11 months ago

Hi @mherwig, thanks for your report. We have internally confirmed a performance degradation when a single item in the SharedArray is significantly large.

We will update here when a patch is available.

mstoykov commented 11 months ago

Hi, sorry for the slow reply; I ended up trying to finish something else first, and I am still working on it.

Let me preface this by saying I am not pushing for this to land in v0.46.0, as we release in a little over a week. This change feels a bit too big for the last week of development - likely one that will need a lot more tests than what I ran by hand.

A quick "benchmark" with

import http from "k6/http";
import { SharedArray } from "k6/data";
import { randomString } from "https://jslib.k6.io/k6-utils/1.2.0/index.js";

const size = __ENV.SIZE;
if (!size) {
    throw "SIZE needs to be defined";
}
const myData = new SharedArray("data", function() {
    let result = [];
    for (let i = 0; i < 10; i++) {
        let data = randomString(size);
        result[i] = data;
    }
    return result;
});

export const options = { vus: 100, duration: "10s" };
export default () => {
    // Read one shared element per iteration; the http call stays commented
    // out so that only the SharedArray access cost is measured.
    const data = myData[__ITER % myData.length];
    // http.post("https://httpbin.test.k6.io/post", data);
};

Gives the following results:

SIZE     Iterations/s
100      129706.606827/s
1000     14870.06481/s
10000    1502.585622/s
100000   138.556129/s

As we can see it gets worse and worse as the elements get bigger, and the main place where this spends time turned out to be the part where k6 freezes the returned object rather than the part where it parses it - by 66x or so.

It looks like, specifically for strings, we go through each "character" and freeze it - which is both unneeded and time-consuming on top of everything else happening.
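For illustration, a pure-JavaScript model of the problem (k6's real freeze code runs in its Go-based runtime, so this is only a sketch of the shape of the issue):

// Illustration only. A string viewed as an object exposes one own property
// per character:
const boxed = new String("x".repeat(100000));
console.log(Object.getOwnPropertyNames(boxed).length); // 100001 (indices + "length")

// So a deep freeze that descends into every own property spends time
// proportional to the total number of characters:
function deepFreeze(value) {
    Object.freeze(value);
    for (const key of Object.getOwnPropertyNames(value)) {
        const child = value[key];
        if (child !== null && typeof child === "object" && !Object.isFrozen(child)) {
            deepFreeze(child);
        }
    }
    return value;
}

// Strings are already immutable, so the fix amounts to never descending into
// them; nested arrays and plain objects still get frozen as before.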

IMO we should not be going through the characters at all, and changing the code so that we do not go through the object's keys but through its prototype makes this way faster. (draft PR with changes)

old 100000    138.556129/s
new 100000   8633.335171/s

Which is roughly 62x faster :tada:.

This also seems not to break anything, including, for example, nested arrays.

But to be honest we likely should think about this more and try to figure out if we can spend even less time in that function.

I will not be able to work on this in the following week or so, so if somebody wants to take it over and do some more research, tests, and possibly benchmarks, I will be happy.

p.s. Historically SharedArray was meant for smaller objects - mostly user/password pairs or some other kind of small data that you have a lot of. But there is no reason for it not to work with big ones, as long as it keeps the same functionality for the other use case.

shaochun0530 commented 11 months ago

Hi @mstoykov, how about not using SharedArray?

export default () => {
    const data = randomString(1024);
    // http.post("https://httpbin.test.k6.io/post", data);
};

Because in my case I didn't use SharedArray to store my payload; I just generate the data in my default function, and I found that k6 is still slow with a large payload size. Will this patch be able to deal with that?

mstoykov commented 11 months ago

@shaochun0530, if you are always generating random data, it is likely better to just generate it on the fly.

The idea of SharedArray is mostly to save memory by having a single copy of data that you need to load from a file (usually).

It does this by keeping 1 copy of the data as JSON, and each time you access an element it just unmarshals that element and returns it.

This works great with things such as user/password pairs where you might need them to be predefined and registered.

But for random data that you just generate, I would expect there to be no real difference, or the SharedArray copying will be close to the price of generating it again.
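A conceptual sketch of that access pattern (a model only, not k6's actual implementation):

// One serialized copy per element, parsed again on every access.
class SharedArrayModel {
    constructor(items) {
        // Built once in the init context; this is the single shared copy.
        this.serialized = items.map((item) => JSON.stringify(item));
    }
    get(i) {
        // Every access pays a JSON.parse proportional to the element's size.
        return JSON.parse(this.serialized[i]);
    }
}

const users = new SharedArrayModel([{ user: "alice", pass: "secret" }]);
console.log(users.get(0).user); // "alice" (a fresh copy on every call)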

shaochun0530 commented 11 months ago

However, with randomly generated data I see the same result as with SharedArray: k6 slows down with large payload sizes.

mstoykov commented 11 months ago

@shaochun0530 see #2311 and #2974 for the ongoing work.

If you have any more usage questions, please open a community forum thread or a separate issue, so as not to keep posting things not directly related to this one.

ichasepucks commented 10 months ago

@mstoykov do you have a branch I can try this out with? We already build k6 ourselves in order to add some extensions, so this wouldn't be a big deal for us. At this point the performance is so poor that I'm considering moving all of our static data to Redis. However, that's a lot more work than if your fix works! Thanks!

Edit: Found your PR and branch https://github.com/grafana/k6/pull/3245 and built with it. Still unusable for us. If I access the shared array I peg all 16 cores at 100% and can barely do 25K RPM. If I stub out the values I use 0.5 cores and can easily breeze through 100K RPM. We're storing a fairly large block of JSON data: the array has only 4 elements, but each element is a few hundred KB. We quickly OOM if we don't use SharedArray. Bummer.

mstoykov commented 10 months ago

Hi @ichasepucks, you might be hitting https://github.com/grafana/k6/issues/2311 more than anything else. If the difference is that you have 4 elements, the difference with or without SharedArray will be 4x in memory usage, which, while not small, is likely not enough to OOM you on its own.

stub out the values

I guess that means they were empty? In that case I expect you are hitting #2311 even harder, as at 100k and, let's say, 500kb responses, the current problems will likely make 2 copies on top (maybe more depending on how you upload).

Even without doubling, 100k RPS will be 47GB of memory for just the copy being uploaded (100,000 requests × 500KB ≈ 47GiB). You can do back-of-the-napkin math if there is 1 more copy, or if the file size is smaller/bigger and the RPS changes.

If something like 3 copies at the expected RPS is within half your memory budget (GC languages notoriously use twice as much memory, see these docs), you can use an old workaround (really a hack): load all the files in the initial VU (with __VU==0) and only 1 of the files in each of the others (__VU%4).

This will keep 1 copy of its one file in each VU, plus however many copies you need for the upload itself. In practice the only two downsides of this approach compared to SharedArray are:

  1. you can only upload one of the files in each of the VUs
  2. you need to write the arguably more complicated code to load only 1 file in each VU

The upside in your case is that you skip the SharedArray deserialization, which is arguably the much bigger problem for you.
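A minimal sketch of this workaround, assuming four hypothetical payload files and a commented-out hypothetical target URL:

import http from "k6/http";

// Per-VU loading workaround; the file names are placeholders and open()
// only works in the init context.
const FILES = ["payload-0.json", "payload-1.json", "payload-2.json", "payload-3.json"];

let payloads;
if (__VU === 0) {
    // The initial __VU == 0 pass loads everything (e.g. so setup() can use it).
    payloads = FILES.map((name) => open(name));
} else {
    // Every running VU loads exactly one file: one copy per VU, as described.
    payloads = [open(FILES[__VU % FILES.length])];
}

export default function () {
    // Each VU can only ever upload the single file it loaded (downside 1).
    // http.post("https://example.test/upload", payloads[0]);
}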

I still expect #2311 will be a problem.

p.s. please open a community forum thread for any further usage discussion