grafana / k6

A modern load testing tool, using Go and JavaScript - https://k6.io
GNU Affero General Public License v3.0

FormData performance (and findings on getting great performance) #3888

Open Codex- opened 3 months ago

Codex- commented 3 months ago

I've spent the past couple of weeks working with K6, trying to improve a test case where we have to send FormData-type requests to an API with a 'large' image (~140kb). I've gone from peaking at ~30RPS with 40VU to >1000RPS with 30VU.

The issue largely seems to come down to using Uint8Arrays and copying many of them into one 🤔

Preemptive apologies if I've misunderstood or misrepresented any inner workings of K6.

Problem

We have some requirements for testing against this particular endpoint

First results

Observations

Road to low CPU, high RPS

I experimented with a number of things that didn't yield better results, most of which were before I inspected the FormData polyfill and found the expensive operations.

At this point I explored the inner workings of the FormData polyfill and found the way it builds the request body to be the primary point of suspicion: body() is reasonably expensive.

Consider the case where we have two "parts" and we call body():
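To illustrate the kind of work involved, here's a simplified sketch (not the actual polyfill source; the names are illustrative, and toByteArr is the string-to-bytes helper shown further down) of what a per-call body() build ends up doing:

function buildBody(parts: { headers: string; data: Uint8Array }[], boundary: string): Uint8Array {
  const chunks: Uint8Array[] = [];
  for (const part of parts) {
    chunks.push(toByteArr(`--${boundary}\r\n`)); // boundary re-encoded to bytes on every call
    chunks.push(toByteArr(part.headers)); // Content-Disposition headers re-encoded on every call
    chunks.push(part.data); // image bytes, copied again into the final buffer below
    chunks.push(toByteArr('\r\n'));
  }
  chunks.push(toByteArr(`--${boundary}--\r\n`));

  // ...and then one more full copy of everything into a freshly allocated buffer
  const total = chunks.reduce((acc, chunk) => acc + chunk.length, 0);
  const body = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    body.set(chunk, offset);
    offset += chunk.length;
  }
  return body;
}

Every iteration that sends a request repeats all of this work from scratch.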

There's a lot going on here; take a breather, then let's continue.

From here I thought: well, what can I do to forward calculate a lot of these repeated conversions and try to reduce this burden on iterations when running our load tests?

I brought in the parts of the FormData polyfill I could reuse, trimmed what I did not need, and forward calculated the byte arrays where possible in a SharedArray:

import { SharedArray } from 'k6/data';

const baseFormDataBoundary = '------RWWorkerFormDataBoundary';
const sharedFormData = new SharedArray('fd', function () {
  const contentBreak = toByteArr('\r\n');
  return [
    [...toByteArr(baseFormDataBoundary)], // have to expand these to number arrays as K6 does not like `Uint8Array`s
    [
      // imgToFormDataByteArr (not shown here) builds the byte array for the image part
      ...imgToFormDataByteArr(
        new Uint8Array(open('image.jpeg', 'b')),
        contentBreak
      ),
    ],
    [...contentBreak],
  ];
});

// Trimmed down `toByteArr` as I was already going to handle the binary data cases
function toByteArr(input: string): Uint8Array {
  const out = new Uint8Array(input.length);
  for (let i = 0; i < input.length; ++i) {
    out[i] = input.charCodeAt(i) & 0xff;
  }
  return out;
}

This meant that when it came to actually running iterations, I had already done a lot of the conversion and was able to simply join the parts before sending the data off:

// VU Code (`http` comes from 'k6/http'; `boundary`, `detectionJson`, `boundaryClose`,
// and `destUrl` are defined elsewhere in the script and omitted here)
export default function (): void {
  const formDataParts = [
    // Boundary
    boundary,
    // Image
    sharedFormData[1],

    // Boundary
    boundary,
    // JSON
    detectionJson,

    // Footer
    boundaryClose,
  ];
  const formDataLength: number = formDataParts.reduce(
    (acc, curr) => acc + curr.length,
    0
  );
  const formDataBody = new Uint8Array(formDataLength);
  let offset = 0;
  for (const arr of formDataParts) {
    formDataBody.set(arr, offset);
    offset += arr.length;
  }

  const params = {
    headers: {
      'Content-Type': 'multipart/form-data; boundary=' + baseFormDataBoundary,
    },
  };

  // Pass the underlying ArrayBuffer as the request body
  http.post(destUrl, formDataBody.buffer, params);
}

With this I did get some improvement, but barely: from ~30RPS to ~100RPS, far from the >1000RPS I've been aiming for. It was clear this wasn't going to work; maxing out the CPU was still a major problem:

[screenshot: CPU usage still maxed out]

Base64 encoding, manually stitching requests, and hitting 1000RPS

While experimenting with the byte data described above, and in an effort to store the image data as something that could be shared, I noticed one small but important detail: Base64 encoding and decoding is fast. Very fast.

So I asked myself "if I simply open an image and encode it in base64, what does the CPU usage look like?", and the short answer was: "next to nothing". When inspecting the sources for K6, I can see that k6/encoding gives us a binding to the Go implementation of encoding/base64, which is perfect.
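For anyone who wants to repeat that quick experiment, a minimal sketch looks something like this (the file name is a placeholder):

import { b64encode, b64decode } from 'k6/encoding';

// Encoded once in the init context; both calls are backed by Go's encoding/base64
const imageB64 = b64encode(open('image.jpeg', 'b'));

export default function (): void {
  // Decoding per iteration; the heavy lifting happens on the Go side
  const decoded = b64decode(imageB64); // ArrayBuffer
  if (decoded.byteLength === 0) {
    throw new Error('unexpected empty decode');
  }
}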

I ventured down the path of creating base64-encoded strings of the data we need, so we can simply concatenate these strings before doing one big decode straight into a single buffer. Simple strategy, but there are some challenges with this too, of course.

The Go implementation of base64 decoding does not support decoding concatenated base64 strings that contain base64 padding (=). To work around this, before encoding any strings I simply ensure they're the right length (n % 3 == 0) and pad with spaces, or 0s, where necessary.

import { SharedArray } from 'k6/data';
import { b64encode } from 'k6/encoding';

const baseFormDataBoundary = '------RWWorkerFormDataBoundary'; // Just needs to be distinct from the body data, per the spec
const sharedFormData: string[] = new SharedArray('fd', function () {
  return [
    b64encode(imgToFormDataByteArr(new Uint8Array(open('image.jpeg', 'b')))), // image payload
    // These two strings are already the right length for b64 with no padding
    b64encode(`\r\n--${baseFormDataBoundary}\r\n`), // boundary MUST have NO surrounding whitespace, only newlines
    `--${baseFormDataBoundary}--\r\n`, // closing boundary (unencoded)
  ];
});

And here are some helpers I've written for anyone also embarking on this journey:

function calcPaddedLength(len: number): number {
  const remainder = len % 3;
  return remainder === 0 ? len : len + 3 - remainder;
}

function padStrToValidB64Len(input: string): string {
  const paddedLen = calcPaddedLength(input.length);
  if (paddedLen === 0) return input;
  return input.padEnd(paddedLen);
}
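As a quick usage sketch (the segment contents here are placeholders, and padStrToValidB64Len is the helper above): padding each segment to a multiple of 3 bytes before encoding guarantees no '=' appears inside it, so the concatenated string decodes in a single call.

import { b64encode, b64decode } from 'k6/encoding';

// 13 and 14 characters respectively, padded up to 15 so neither encodes with '='
const segmentA = b64encode(padStrToValidB64Len('first segment'));
const segmentB = b64encode(padStrToValidB64Len('second segment'));

// Without the padding step, segmentA would end in '=' and the concatenation
// below would be rejected by the Go decoder.
const combined = new Uint8Array(b64decode(segmentA + segmentB));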

With the image binary data and boundary data now shared, I brought in the use of the setup lifecycle hook:

interface SetupData {
  imagePayloadB64: string;
}
export function setup(): SetupData {
  return {
    imagePayloadB64:
      sharedFormData[1] + // boundary
      sharedFormData[0] + // image
      sharedFormData[1], // boundary (for the json payload)
  };
}

We can now safely and cleanly consume this from the VU code (k6 passes the value returned by setup() as the data argument to the default function):

// VU Code (`b64encode`/`b64decode` come from 'k6/encoding'; `jsonPayloadContentDisposition`,
// `params`, and `destUrl` are defined elsewhere in the script and omitted here)
export default function (data: SetupData): void {
  const detectionJsonPayloadB64 = b64encode(
    padStrToValidB64Len(
      jsonPayloadContentDisposition +
        JSON.stringify({}) +
        '\r\n' +
        sharedFormData[2] // boundary footer
    )
  );

  const formDataBuffer = new Uint8Array(
    b64decode(data.imagePayloadB64 + detectionJsonPayloadB64)
  );

  http.post(destUrl, formDataBuffer.buffer, params);
}

There are obviously a lot of details omitted here; you'll have to work on making a compliant request body, but the specification is reasonably clear. One small gotcha is that the boundary cannot have spaces before or after it, as it is interpreted as a whole line and must match the boundary specified in your header.
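For reference, with a header of Content-Type: multipart/form-data; boundary=X, a compliant two-part body looks roughly like the following, where every line ends in \r\n, the boundary lines have nothing before or after them, and the field names are illustrative:

--X
Content-Disposition: form-data; name="image"; filename="image.jpeg"
Content-Type: image/jpeg

<image bytes>
--X
Content-Disposition: form-data; name="detection"
Content-Type: application/json

{}
--X--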

With these changes now in play, I'm seeing a significantly more performant load test with the large image payload, with peaks exceeding 1000RPS.

And the CPU usage has dropped drastically:

[screenshot: CPU usage drastically reduced]

olegbespalov commented 2 months ago

@Codex, thank you for sharing this, I believe it's precious to the community! :relaxed: