Open lucacasonato opened 1 year ago
The background on this, is that formData
and json
are prevalent ways to parse the request body in HTTP servers that use fetch's Request
and Response
headers:
Deno.serve(async (req) => {
const form = req.formData(); // you just created a DOS vuln!
// do something with form
return new Response();
});
I'd be interested in this for both Node.js and Workers. I think it makes the most sense to add to all of the body mixin methods. Defining it as an options object makes perfect sense as I've also considered the possibility of adding a signal option to these to allow cancellation, e.g.
const ac = new Abort controller();
const form = await req.formData({
maxSize: 10_000_000 /* 10MB */,
signal: ac.signal
})
Is there implementer interest from any browsers for this?
I have a draft spec PR here: #1600. If we can get some implementer interest, I can write WPTs - the implementation here should be relatively trivial.
cc @annevk @saschanaz @ricea
Interesting. My impression is that these features are not very useful on the browser-side. I wonder if ServiceWorkers would benefit? It's hard to commit resources to something that doesn't directly benefit our users.
Is there evidence of use in libraries or requests for this functionality on Stack Overflow?
@jakearchibald perhaps you can comment on SW utility?
Logically it makes sense for server use case, but I also think this is less useful for browsers. Can this be something like "a limit chosen by user agent" instead of a user given limit?
I wonder if ServiceWorkers would benefit?
@ricea Possibly. I agree that in browsers the need for this is less evident, because the worst thing a DOS in a service worker can do is take down one origin for one user momentarily. On servers however, this DOS vector can take down one origin for all users.
Is there evidence of use in libraries or requests for this functionality on Stack Overflow?
@annevk I can not find requests on Stack Overflow, but there have definitely been CVEs related to request body parse handling in server-side JS runtimes and libraries. Fastify, a popular Node.js server framework, has an option to specify a maximum body size for requests. Similar options exist for express and many other frameworks.
Can this be something like "a limit chosen by user agent" instead of a user given limit?
@saschanaz We already imply this through this sentence in the infra spec:
[...] user agents may impose implementation-defined limits on otherwise unconstrained inputs. E.g., to prevent denial of service attacks, to guard against running out of memory, or to work around platform-specific limitations.
The idea behind a specific limit option for us is that we'd like to set a reasonable default limit, but provide our users with an opt-out mechanism to specify a higher limit in case they need it. For us it is impractical to have a limit that is not in any way opt-out by the user.
There is a very strong need for this from both Deno and Cloudflare Workers, and it would be great if for parity's sake browsers matched behaviour here.
Hypothetically, if the implementation work in browsers is contributed by us, would you be more inclined to consider this? This is the path we are taking with #1346:
Something similar was also done for Response.json
(the static one). It was implemented in Chromium by me.
The idea behind a specific limit option for us is that we'd like to set a reasonable default limit, but provide our users with an opt-out mechanism to specify a higher limit in case they need it. For us it is impractical to have a limit that is not in any way opt-out by the user.
Oh, so this is to opt out instead of to opt in for stricter limit. Could such opt-out be in a command line option instead, though?
Edit: Also, can the user-agent defined limit be dynamic here (e.g. depends on the current system memory usage) so that users won't have to opt out because of too strict default limit?
Could such opt-out be in a command line option instead, though?
Not in our case in Cloudflare workers. Users have no access to CLI options. An API option is ideal.
Same for us - Deno Deploy does not have CLI flags. Even in the Deno CLI, we prefer letting users specify all configuration fully within code. This make application portability much simpler.
Since Infra specifies that user agents can have implementation-defined limits, I think the way to specify this would be to have this user-set limit as an extra limit on top of the implementation-defined ones. That way, browsers could implement it as a stricter limit, and server-side runtimes would be free to raise their regular limits to match the user-set limit.
(I'm not strictly against this, just exploring some options here which you probably already considered outside this thread)
Users have no access to CLI options. An API option is ideal.
But that API option can be in whatever form, like Deno.foo
as in Deno, right? Of course that would be nonstandard and not pretty, but not sure this option is pretty either if it's only useful on servers.
I'm not particularly +1 or -1 on this idea. but there are other way to battle this as well (to not hit OOM)
for starter
content-length
header yourself already. and also reject request that do not have them.
content-length
but then maybe perhaps you could do something cool/useful in combination of having a fetch observer that can track how much you have downloaded / upload. and if it have downloaded more than x
bytes then you could abort the request / response with abortSignal
or something. I have long been waiting for something like a progress monitoring solution that XHR have had for years but fetch don't.formData()
as files are often backed up by the filesystem anyways. I have proposed that decoding large file from .formData()
also gets saved directly to the disc and handed back to you as a File
backed up by the file system and isn't something that isn't kept in memory.
Body.blob()
to also be backed up by the file system..formData()
solution like WebTransports
instead.x-
headers, so you would know how large each file is before you even begin to parse the payload with .formData()
The only value i see for this maxSize
is within servers (not browsers) that's why i'm leaning more towards applying internal solutions that circumvents this OOM issues by doing the same thing browser dose to solve this problem without necessary adding new syntactic feature on top that isn't "really" necessary or can be solved in other ways without extra features/option that's only going to be useful for the server.
I know Bun.js use parts of webkits builtin web api's rather then trying to reimplement things.
I know that they can represent a file backed up by the file system, i hear the verdic is that they do not support Body.formData()
yet cuz HTMLFileElement made it a bit tricky. But if i could guess, then their solution would be to write files to the disc and hand you a native File handle back
Other cool (off-topic) thing would be if it where a 2 way communication where you could send file descriptions (extra streams) within a stream. but such things only exist in HTTP/2. then you could decide for yourself if you want to read file xyz and knowing beforehand how large the file is. but this would be something completely different.
perhaps maybe a better solution is just to simply post one individual file per request and not use FormData
at all if you intend to send/accept large files.
Right now there is no way to call Body#formData on arbitrary user input without potentially causing an OOM
How about sinking a clone()
of the request prior to calling formData()
?
Deno.serve(async function(request) {
try {
const sink = new WritableStream(LimitedUnderlyingSink(10_000_000));
await request.clone().body?.pipeTo(sink, { preventCancel: true });
const _form = await request.formData();
return new Response("OK\n");
} catch (error) {
return new Response(String(error) + '\n');
}
});
function LimitedUnderlyingSink(maxByteLength: number): UnderlyingSink<ArrayBufferLike> {
return {
write(chunk) {
maxByteLength -= chunk.byteLength;
if (maxByteLength < 0) {
throw new DOMException('Size limit exceeded', 'QuotaExceededError');
}
}
}
}
@wattroll, your approach is great but it does load everything in memory.
The following does essentially the same without draining the stream:
export async function fetchMaxSizeProcessor(
response: Resonse,
maxBytes: number
) {
if (!response.body) return response
const contentLength = response.headers.get('content-length')
if (contentLength) {
const length = Number(contentLength)
if (length > maxBytes) {
const err = createError(502, 'Response too large', {
response,
})
await response.body.cancel(err)
throw err
}
}
const newBody = response.body.pipeThrough(
new ByteCounterTransformStream(maxBytes)
)
const newResponse = new Response(newBody, response)
// Some props do not get copied by the Response constructor (e.g. url)
for (const key of ['url', 'redirected', 'type', 'statusText'] as const) {
const value = response[key]
if (value !== newResponse[key]) {
Object.defineProperty(newResponse, key, {
get: () => value,
enumerable: true,
configurable: true,
})
}
}
return newResponse
}
export class ByteCounterTransformStream extends TransformStream<
Uint8Array,
Uint8Array
> {
#bytesRead = 0
get bytesRead() {
return this.#bytesRead
}
constructor(public readonly maxBytes?: number) {
super({
start: () => {},
transform: (
chunk: Uint8Array,
controller: TransformStreamDefaultController<Uint8Array>
) => {
this.#bytesRead += chunk.length
if (maxBytes != null && maxBytes >= 0 && this.#bytesRead > maxBytes) {
controller.error(createError(502, 'Response too large'))
return
}
controller.enqueue(chunk)
},
flush: () => {},
})
}
}
Note: this can probably be made shorter
@matthieusieben
your approach is great but it does load everything in memory
Good point. Everything in this case would be at most maxByteLength
bytes that will get buffered into the tee()
-ed stream during request.clone().body?.pipeTo(...)
and be held until the await request.formData()
completes.
Buffering up to the limit could be acceptable for the use-cases where the limit is low, but not for larger limits with final consumers that stream the chunks efficiently to somewhere.
I used that approach for a login adapter that accepts email/password form which I wanted to limit very tightly, so that my auth service isn't exposed to OOM with large and slow payloads.
I think it should be possible to sink the cloned request into byte counter and application sink in parallel, so that when the byte counter tips over it cancels the other one. I haven't tried that yet. I also don't know whether any server-side runtimes feature disk-backed Body#formData()
. Per my understanding formData
has to receive the body fully before returning (in order to provide the Blob#size
property for each of the File
in the form).
Right now there is no way to call
Body#formData
on arbitrary user input without potentially causing an OOM. It would be great if we could constrain theformData
parser so it stops parsing if a body exceeds a certain size.Example API:
If the body size exceeds the max size, a
QuotaExceededError
DOMException
should be thrown.This might make sense for
.text()
,.blob()
,.arrayBuffer()
, and.json()
also.