whatwg / fetch

Fetch Standard
https://fetch.spec.whatwg.org/

Add size limiter to `Body#formData` method #1592

Open lucacasonato opened 1 year ago

lucacasonato commented 1 year ago

Right now there is no way to call Body#formData on arbitrary user input without potentially causing an OOM. It would be great if we could constrain the formData parser so it stops parsing if a body exceeds a certain size.

Example API:

const form = req.formData({ maxSize: 10_000_000 /* 10MB */ })

If the body size exceeds the max size, a QuotaExceededError DOMException should be thrown.

This might make sense for .text(), .blob(), .arrayBuffer(), and .json() also.
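
To make the proposal concrete, here is a rough userland approximation of the proposed behaviour. `bufferBodyWithLimit` is a hypothetical helper, not a real or proposed API: it buffers the body while counting bytes, throws the `QuotaExceededError` once the limit is crossed, and otherwise hands the buffered bytes back so the built-in parsers can run on them.

```javascript
// Hypothetical userland sketch of the proposed `maxSize` option: buffer the
// body while counting bytes, throw QuotaExceededError past the limit, then let
// the built-in parsers run on the buffered copy. Helper names are illustrative.
async function bufferBodyWithLimit(res, maxSize) {
  const reader = res.body.getReader();
  const chunks = [];
  let total = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    total += value.byteLength;
    if (total > maxSize) {
      await reader.cancel();
      throw new DOMException('Body exceeds maxSize', 'QuotaExceededError');
    }
    chunks.push(value);
  }
  // Re-wrap the buffered bytes so .formData()/.json()/.text() can run on them.
  return new Response(new Blob(chunks), { headers: res.headers });
}

// The proposed req.formData({ maxSize }) would then behave roughly like:
// const form = await (await bufferBodyWithLimit(req, 10_000_000)).formData();
```

A real implementation would enforce the limit inside the parser itself instead of buffering first, but the observable behaviour (a `QuotaExceededError` once the body crosses the limit) would match.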

lucacasonato commented 1 year ago

The background on this is that formData and json are prevalent ways to parse the request body in HTTP servers that use fetch's Request and Response objects:


Deno.serve(async (req) => {
  const form = await req.formData(); // you just created a DOS vuln!

  // do something with form

  return new Response();
});
jasnell commented 1 year ago

I'd be interested in this for both Node.js and Workers. I think it makes the most sense to add to all of the body mixin methods. Defining it as an options object makes perfect sense as I've also considered the possibility of adding a signal option to these to allow cancellation, e.g.

const ac = new AbortController();
const form = await req.formData({ 
  maxSize: 10_000_000 /* 10MB */,
  signal: ac.signal
})
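
For illustration, the cancellation half can be approximated in userland today. `textWithSignal` below is a hypothetical helper, not a real or proposed API; it reads the body chunk by chunk and bails out when the signal fires:

```javascript
// Hypothetical sketch of a signal-aware body read: cancel the underlying
// reader on abort and reject with the signal's reason.
async function textWithSignal(res, signal) {
  if (signal?.aborted) throw signal.reason ?? new DOMException('Aborted', 'AbortError');
  const reader = res.body.getReader();
  const onAbort = () => reader.cancel(signal.reason);
  signal?.addEventListener('abort', onAbort, { once: true });
  try {
    const chunks = [];
    for (;;) {
      const { done, value } = await reader.read();
      if (signal?.aborted) throw signal.reason ?? new DOMException('Aborted', 'AbortError');
      if (done) break;
      chunks.push(value);
    }
    return new Blob(chunks).text();
  } finally {
    signal?.removeEventListener('abort', onAbort);
  }
}
```

A built-in `signal` option could hook into the same body-consumption machinery instead of wrapping the reader from the outside.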
lucacasonato commented 1 year ago

Is there implementer interest from any browsers for this?

I have a draft spec PR here: #1600. If we can get some implementer interest, I can write WPTs - the implementation here should be relatively trivial.

cc @annevk @saschanaz @ricea

ricea commented 1 year ago

Interesting. My impression is that these features are not very useful on the browser-side. I wonder if ServiceWorkers would benefit? It's hard to commit resources to something that doesn't directly benefit our users.

annevk commented 1 year ago

Is there evidence of use in libraries or requests for this functionality on Stack Overflow?

@jakearchibald perhaps you can comment on SW utility?

saschanaz commented 1 year ago

Logically it makes sense for the server use case, but I also think this is less useful for browsers. Could this be "a limit chosen by the user agent" instead of a user-given limit?

lucacasonato commented 1 year ago

I wonder if ServiceWorkers would benefit?

@ricea Possibly. I agree that in browsers the need for this is less evident, because the worst thing a DOS in a service worker can do is take down one origin for one user momentarily. On servers however, this DOS vector can take down one origin for all users.

Is there evidence of use in libraries or requests for this functionality on Stack Overflow?

@annevk I cannot find requests on Stack Overflow, but there have definitely been CVEs related to request body parsing in server-side JS runtimes and libraries. Fastify, a popular Node.js server framework, has an option to specify a maximum body size for requests. Similar options exist for Express and many other frameworks.
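
For reference, in those frameworks this is a single configuration knob. The sketch below assumes their documented options and is not runnable without the dependencies:

```javascript
// Fastify: reject request bodies over ~10 MB up front (replies 413 when exceeded).
import Fastify from 'fastify';
const app = Fastify({ bodyLimit: 10_000_000 });

// Express equivalent, via its built-in JSON body parser:
// app.use(express.json({ limit: '10mb' }));
```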

Can this be something like "a limit chosen by user agent" instead of a user given limit?

@saschanaz We already imply this through this sentence in the infra spec:

[...] user agents may impose implementation-defined limits on otherwise unconstrained inputs. E.g., to prevent denial of service attacks, to guard against running out of memory, or to work around platform-specific limitations.

From https://infra.spec.whatwg.org/#algorithm-limits

The idea behind a specific limit option for us is that we'd like to set a reasonable default limit, but provide our users with an opt-out mechanism to specify a higher limit in case they need it. For us it is impractical to have a limit that the user cannot opt out of in any way.


There is a very strong need for this from both Deno and Cloudflare Workers, and it would be great if for parity's sake browsers matched behaviour here.

Hypothetically, if the implementation work in browsers is contributed by us, would you be more inclined to consider this? This is the path we are taking with #1346:

Something similar was also done for Response.json (the static one). It was implemented in Chromium by me.

saschanaz commented 1 year ago

The idea behind a specific limit option for us is that we'd like to set a reasonable default limit, but provide our users with an opt-out mechanism to specify a higher limit in case they need it. For us it is impractical to have a limit that the user cannot opt out of in any way.

Oh, so this is to opt out rather than to opt in to a stricter limit. Could such opt-out be in a command line option instead, though?

Edit: Also, could the user-agent-defined limit be dynamic here (e.g. dependent on current system memory usage), so that users won't have to opt out because of a too-strict default limit?

jasnell commented 1 year ago

Could such opt-out be in a command line option instead, though?

Not in our case in Cloudflare workers. Users have no access to CLI options. An API option is ideal.

lucacasonato commented 1 year ago

Same for us - Deno Deploy does not have CLI flags. Even in the Deno CLI, we prefer letting users specify all configuration fully within code. This makes application portability much simpler.

andreubotella commented 1 year ago

Since Infra specifies that user agents can have implementation-defined limits, I think the way to specify this would be to have this user-set limit as an extra limit on top of the implementation-defined ones. That way, browsers could implement it as a stricter limit, and server-side runtimes would be free to raise their regular limits to match the user-set limit.
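
In other words, the user-supplied value would only ever tighten the effective limit in a browser, while a server runtime would simply set its own ceiling high enough to honour it. A toy sketch, with an invented constant:

```javascript
// Illustrative only: composing a user-supplied maxSize with the user agent's
// implementation-defined ceiling. The constant is made up for the example.
const IMPLEMENTATION_LIMIT = 512 * 1024 * 1024;

function effectiveLimit(userMaxSize) {
  // The user option can tighten the limit but never exceed the UA's ceiling.
  return Math.min(IMPLEMENTATION_LIMIT, userMaxSize ?? IMPLEMENTATION_LIMIT);
}
```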

saschanaz commented 1 year ago

(I'm not strictly against this, just exploring some options here which you probably already considered outside this thread)

Users have no access to CLI options. An API option is ideal.

But that API option can be in whatever form, like Deno.foo as in Deno, right? Of course that would be nonstandard and not pretty, but not sure this option is pretty either if it's only useful on servers.

jimmywarting commented 1 year ago

I'm not particularly +1 or -1 on this idea, but there are other ways to battle this (to avoid hitting OOM).

For starters: the only value I see in this maxSize is within servers (not browsers). That's why I lean more towards internal solutions that circumvent the OOM issue by doing the same thing browsers do to solve this problem, without adding a new syntactic feature on top that isn't "really" necessary, or that can be solved in other ways, with an option that's only going to be useful on the server.

I know Bun.js uses parts of WebKit's built-in web APIs rather than trying to reimplement things, and I know they can represent a File backed by the file system. I hear the verdict is that they do not support Body.formData() yet because HTMLFileElement made it a bit tricky. But if I had to guess, their solution would be to write files to disk and hand you back a native File handle.

Another cool (off-topic) thing would be a two-way communication channel where you could send file descriptors (extra streams) within a stream, but such things only exist in HTTP/2. Then you could decide for yourself whether you want to read file xyz, knowing beforehand how large the file is. But that would be something completely different. Perhaps a better solution is simply to post one individual file per request and not use FormData at all if you intend to send/accept large files.

wattroll commented 11 months ago

Right now there is no way to call Body#formData on arbitrary user input without potentially causing an OOM

How about sinking a clone() of the request prior to calling formData()?

Deno.serve(async function(request) {
  try {
    const sink = new WritableStream(LimitedUnderlyingSink(10_000_000));
    await request.clone().body?.pipeTo(sink, { preventCancel: true });
    const _form = await request.formData();
    return new Response("OK\n");
  } catch (error) {
    return new Response(String(error) + '\n');
  }
});

function LimitedUnderlyingSink(maxByteLength: number): UnderlyingSink<Uint8Array> {
  return {
    write(chunk) {
      maxByteLength -= chunk.byteLength;
      if (maxByteLength < 0) {
        throw new DOMException('Size limit exceeded', 'QuotaExceededError');
      }
    }
  }
}
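
For anyone wanting to try this outside Deno.serve, the same sink works against any ReadableStream. Below it is restated without the TypeScript annotation (`limitedSink` is just a rename for the standalone sketch):

```javascript
// Same limiting sink as above, exercised against a plain ReadableStream.
function limitedSink(maxByteLength) {
  return {
    write(chunk) {
      maxByteLength -= chunk.byteLength;
      if (maxByteLength < 0) {
        throw new DOMException('Size limit exceeded', 'QuotaExceededError');
      }
    },
  };
}

// A 1 KiB body piped into a 100-byte limit rejects with QuotaExceededError:
await new Response('x'.repeat(1024)).body
  .pipeTo(new WritableStream(limitedSink(100)))
  .catch((e) => console.log(e.name)); // QuotaExceededError
```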
matthieusieben commented 10 months ago

@wattroll, your approach is great, but it does load everything into memory.

The following does essentially the same without draining the stream:


// `createError` is assumed to come from the `http-errors` package:
// import createError from 'http-errors'
export async function fetchMaxSizeProcessor(
  response: Response,
  maxBytes: number
) {
  if (!response.body) return response

  const contentLength = response.headers.get('content-length')
  if (contentLength) {
    const length = Number(contentLength)
    if (length > maxBytes) {
      const err = createError(502, 'Response too large', {
        response,
      })
      await response.body.cancel(err)
      throw err
    }
  }

  const newBody = response.body.pipeThrough(
    new ByteCounterTransformStream(maxBytes)
  )
  const newResponse = new Response(newBody, response)

  // Some props do not get copied by the Response constructor (e.g. url)
  for (const key of ['url', 'redirected', 'type', 'statusText'] as const) {
    const value = response[key]
    if (value !== newResponse[key]) {
      Object.defineProperty(newResponse, key, {
        get: () => value,
        enumerable: true,
        configurable: true,
      })
    }
  }

  return newResponse
}

export class ByteCounterTransformStream extends TransformStream<
  Uint8Array,
  Uint8Array
> {
  #bytesRead = 0

  get bytesRead() {
    return this.#bytesRead
  }

  constructor(public readonly maxBytes?: number) {
    super({
      start: () => {},
      transform: (
        chunk: Uint8Array,
        controller: TransformStreamDefaultController<Uint8Array>
      ) => {
        this.#bytesRead += chunk.length
        if (maxBytes != null && maxBytes >= 0 && this.#bytesRead > maxBytes) {
          controller.error(createError(502, 'Response too large'))
          return
        }
        controller.enqueue(chunk)
      },
      flush: () => {},
    })
  }
}

Note: this can probably be made shorter
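
As a quick check of the approach, here is a dependency-free reduction of that transform, with `createError` swapped for a plain DOMException and the class name invented for the sketch:

```javascript
// Simplified version of ByteCounterTransformStream: count bytes flowing
// through and error the stream once the limit is crossed.
class ByteLimitStream extends TransformStream {
  constructor(maxBytes) {
    let read = 0;
    super({
      transform(chunk, controller) {
        read += chunk.byteLength;
        if (read > maxBytes) {
          controller.error(new DOMException('Response too large', 'QuotaExceededError'));
          return;
        }
        controller.enqueue(chunk);
      },
    });
  }
}

// Consuming a 2 KiB body through a 1 KiB limit rejects:
const limited = new Response('x'.repeat(2048)).body.pipeThrough(new ByteLimitStream(1024));
await new Response(limited).text().catch((e) => console.log(e.name)); // QuotaExceededError
```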

wattroll commented 10 months ago

@matthieusieben

your approach is great but it does load everything in memory

Good point, though "everything" in this case is at most maxByteLength bytes, which get buffered into the tee()'d stream during request.clone().body?.pipeTo(...) and held until the await request.formData() completes.

Buffering up to the limit could be acceptable for the use-cases where the limit is low, but not for larger limits with final consumers that stream the chunks efficiently to somewhere.

I used that approach for a login adapter that accepts email/password form which I wanted to limit very tightly, so that my auth service isn't exposed to OOM with large and slow payloads.

I think it should be possible to sink the cloned request into byte counter and application sink in parallel, so that when the byte counter tips over it cancels the other one. I haven't tried that yet. I also don't know whether any server-side runtimes feature disk-backed Body#formData(). Per my understanding formData has to receive the body fully before returning (in order to provide the Blob#size property for each of the File in the form).
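
A rough sketch of that parallel idea, untried against real workloads (`consumeWithLimit` and its shape are illustrative): tee the body, pipe one branch into a byte counter, and abort the application-side pipe via an AbortController once the counter tips over.

```javascript
// Tee the body; one branch counts bytes, the other feeds the application sink.
// Overflowing the counter aborts the application pipe with QuotaExceededError.
async function consumeWithLimit(body, maxBytes, appSink) {
  const [appBranch, counterBranch] = body.tee();
  const ac = new AbortController();
  const counter = counterBranch.pipeTo(new WritableStream({
    write(chunk) {
      maxBytes -= chunk.byteLength;
      if (maxBytes < 0) {
        ac.abort(new DOMException('Size limit exceeded', 'QuotaExceededError'));
        throw ac.signal.reason;
      }
    },
  })).catch(() => {}); // the limit error is re-raised below, via the signal
  const app = appBranch.pipeTo(appSink, { signal: ac.signal })
    .catch((e) => { if (!ac.signal.aborted) throw e; });
  await Promise.all([counter, app]);
  if (ac.signal.aborted) throw ac.signal.reason;
}
```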