grafana / k6

A modern load testing tool, using Go and JavaScript - https://k6.io
GNU Affero General Public License v3.0

FormData performance (and findings on getting great performance) #3888

Open Codex- opened 3 months ago

Codex- commented 3 months ago

I've spent the past couple of weeks working with K6, trying to improve a test case where we have to send FormData-type requests to an API with a 'large' image (~140kb). I've gone from peaking at ~30RPS with 40VU to >1000RPS with 30VU.

The issue largely seems to come down to using Uint8Arrays and copying many of them into one 🤔

Preemptive apologies if I've misunderstood or misrepresented any inner workings of K6.

Problem

We have some requirements for testing against this particular endpoint

First results

Observations

Road to low CPU, high RPS

I experimented with a number of things that didn't yield better results, most of which were before I inspected the FormData polyfill and found the expensive operations.

At this point I explored the inner workings of the FormData polyfill and found the way it builds the request body to be the primary point of suspicion: body() is reasonably expensive.

Consider the case where we have two "parts" and we call body():
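To illustrate the kind of work involved, here's a simplified sketch (not the actual polyfill source; the names are illustrative, and toByteArr is the string-to-bytes helper shown further down) of what a per-call body() build ends up doing:

function buildBody(parts: { headers: string; data: Uint8Array }[], boundary: string): Uint8Array {
  const chunks: Uint8Array[] = [];
  for (const part of parts) {
    chunks.push(toByteArr(`--${boundary}\r\n`)); // boundary re-encoded to bytes on every call
    chunks.push(toByteArr(part.headers)); // Content-Disposition headers re-encoded on every call
    chunks.push(part.data); // image bytes, copied again into the final buffer below
    chunks.push(toByteArr('\r\n'));
  }
  chunks.push(toByteArr(`--${boundary}--\r\n`));

  // ...and then one more full copy of everything into a freshly allocated buffer
  const total = chunks.reduce((acc, chunk) => acc + chunk.length, 0);
  const body = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    body.set(chunk, offset);
    offset += chunk.length;
  }
  return body;
}

Every iteration that sends a request repeats all of this work from scratch.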

There's a lot going on here; take a breather, then let's continue.

From here I thought: well, what can I do to forward calculate a lot of these repeated conversions and try to reduce this burden on iterations when running our load tests?

I brought in the parts of the FormData polyfill I could reuse, trimmed what I did not need, and forward calculated the byte arrays where possible in a SharedArray:

import { SharedArray } from 'k6/data';

const baseFormDataBoundary = '------RWWorkerFormDataBoundary';
const sharedFormData = new SharedArray('fd', function () {
  const contentBreak = toByteArr('\r\n');
  return [
    [...toByteArr(baseFormDataBoundary)], // have to expand these to number arrays as K6 does not like `Uint8Array`s
    [
      // imgToFormDataByteArr (not shown here) builds the byte array for the image part
      ...imgToFormDataByteArr(
        new Uint8Array(open('image.jpeg', 'b')),
        contentBreak
      ),
    ],
    [...contentBreak],
  ];
});

// Trimmed down `toByteArr` as I was already going to handle the binary data cases
function toByteArr(input: string): Uint8Array {
  const out = new Uint8Array(input.length);
  for (let i = 0; i < input.length; ++i) {
    out[i] = input.charCodeAt(i) & 0xff;
  }
  return out;
}

This meant that when it came to actually running iterations, I had already done a lot of the conversion and was able to simply join the parts before sending the data off:

// VU Code (`http` comes from 'k6/http'; `boundary`, `detectionJson`, `boundaryClose`,
// and `destUrl` are defined elsewhere in the script and omitted here)
export default function (): void {
  const formDataParts = [
    // Boundary
    boundary,
    // Image
    sharedFormData[1],

    // Boundary
    boundary,
    // JSON
    detectionJson,

    // Footer
    boundaryClose,
  ];
  const formDataLength: number = formDataParts.reduce(
    (acc, curr) => acc + curr.length,
    0
  );
  const formDataBody = new Uint8Array(formDataLength);
  let offset = 0;
  for (const arr of formDataParts) {
    formDataBody.set(arr, offset);
    offset += arr.length;
  }

  const params = {
    headers: {
      'Content-Type': 'multipart/form-data; boundary=' + baseFormDataBoundary,
    },
  };

  // Pass the underlying ArrayBuffer as the request body
  http.post(destUrl, formDataBody.buffer, params);
}

With this I did get some improvement, but barely: from ~30RPS to ~100RPS, far from the >1000RPS I've been aiming for. It was clear this wasn't going to work; maxing out the CPU was still a major problem:

[screenshot: CPU usage still maxed out]

Base64 encoding, manually stitching requests, and hitting 1000RPS

While experimenting with the byte data described above, and in an effort to store the image data as something that could be shared, I noticed one small but important detail: Base64 encoding and decoding is fast. Very fast.

So I asked myself "if I simply open an image and encode it in base64, what does the CPU usage look like?", and the short answer was: "next to nothing". When inspecting the sources for K6, I can see that k6/encoding gives us a binding to the Go implementation of encoding/base64, which is perfect.
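For anyone who wants to repeat that quick experiment, a minimal sketch looks something like this (the file name is a placeholder):

import { b64encode, b64decode } from 'k6/encoding';

// Encoded once in the init context; both calls are backed by Go's encoding/base64
const imageB64 = b64encode(open('image.jpeg', 'b'));

export default function (): void {
  // Decoding per iteration; the heavy lifting happens on the Go side
  const decoded = b64decode(imageB64); // ArrayBuffer
  if (decoded.byteLength === 0) {
    throw new Error('unexpected empty decode');
  }
}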

I ventured down the path of creating base64-encoded strings of the data we need, so we can simply concatenate these strings before doing one big decode straight into a single buffer. Simple strategy, but there are some challenges with this too, of course.

The Go implementation of base64 decoding does not support decoding concatenated base64 strings that contain base64 padding (=). To work around this, before encoding any strings I simply ensure they're the right length (n % 3 == 0) and pad with spaces, or 0s, where necessary.

import { SharedArray } from 'k6/data';
import { b64encode } from 'k6/encoding';

const baseFormDataBoundary = '------RWWorkerFormDataBoundary'; // Just needs to be distinct from the body data, per the spec
const sharedFormData: string[] = new SharedArray('fd', function () {
  return [
    b64encode(imgToFormDataByteArr(new Uint8Array(open('image.jpeg', 'b')))), // image payload
    // These two strings are already the right length for b64 with no padding
    b64encode(`\r\n--${baseFormDataBoundary}\r\n`), // boundary MUST have NO surrounding whitespace, only newlines
    `--${baseFormDataBoundary}--\r\n`, // closing boundary (unencoded)
  ];
});

And here are some helpers I've written for anyone also embarking on this journey:

function calcPaddedLength(len: number): number {
  const remainder = len % 3;
  return remainder === 0 ? len : len + 3 - remainder;
}

function padStrToValidB64Len(input: string): string {
  const paddedLen = calcPaddedLength(input.length);
  if (paddedLen === 0) return input;
  return input.padEnd(paddedLen);
}
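As a quick usage sketch (the segment contents here are placeholders, and padStrToValidB64Len is the helper above): padding each segment to a multiple of 3 bytes before encoding guarantees no '=' appears inside it, so the concatenated string decodes in a single call.

import { b64encode, b64decode } from 'k6/encoding';

// 13 and 14 characters respectively, padded up to 15 so neither encodes with '='
const segmentA = b64encode(padStrToValidB64Len('first segment'));
const segmentB = b64encode(padStrToValidB64Len('second segment'));

// Without the padding step, segmentA would end in '=' and the concatenation
// below would be rejected by the Go decoder.
const combined = new Uint8Array(b64decode(segmentA + segmentB));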

With the image binary data and boundary data now shared, I brought in the use of the setup lifecycle hook:

interface SetupData {
  imagePayloadB64: string;
}
export function setup(): SetupData {
  return {
    imagePayloadB64:
      sharedFormData[1] + // boundary
      sharedFormData[0] + // image
      sharedFormData[1], // boundary (for the json payload)
  };
}

We can now safely and cleanly consume this from the VU code (k6 passes the value returned by setup() as the data argument to the default function):

// VU Code (`b64encode`/`b64decode` come from 'k6/encoding'; `jsonPayloadContentDisposition`,
// `params`, and `destUrl` are defined elsewhere in the script and omitted here)
export default function (data: SetupData): void {
  const detectionJsonPayloadB64 = b64encode(
    padStrToValidB64Len(
      jsonPayloadContentDisposition +
        JSON.stringify({}) +
        '\r\n' +
        sharedFormData[2] // boundary footer
    )
  );

  const formDataBuffer = new Uint8Array(
    b64decode(data.imagePayloadB64 + detectionJsonPayloadB64)
  );

  http.post(destUrl, formDataBuffer.buffer, params);
}

There are obviously a lot of details omitted here; you'll have to work on making a compliant request body, but the specification is reasonably clear. One small gotcha is that the boundary cannot have spaces before or after it, as it is interpreted as a whole line and must match the boundary specified in your header.
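For reference, with a header of Content-Type: multipart/form-data; boundary=X, a compliant two-part body looks roughly like the following, where every line ends in \r\n, the boundary lines have nothing before or after them, and the field names are illustrative:

--X
Content-Disposition: form-data; name="image"; filename="image.jpeg"
Content-Type: image/jpeg

<image bytes>
--X
Content-Disposition: form-data; name="detection"
Content-Type: application/json

{}
--X--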

With these changes now in play, I'm seeing a significantly more performant load test with the large image payload, with peaks exceeding 1000RPS.

And the CPU usage has dropped drastically:

[screenshot: CPU usage drastically reduced]

olegbespalov commented 2 months ago

@Codex, thank you for sharing this, I believe it's precious to the community! :relaxed: