Open annevk opened 3 years ago
Isn't #3223 basically solved? The corresponding PR (#3276) was left untouched for years, and then I opened #6282 to replace it. It looks like @domenic edited the commit message of my PR to close the original PR, but he forgot to close the issue.
As for the boundary, every browser computes it ahead of time, and even the initial definition of `multipart/*` in RFC 1341 allows for a probabilistic choice of boundary. Requiring it to be computed ahead of time isn't ideal, but if it's necessary, the spec would have to require a lower bound on entropy to ensure the probability of a collision is negligible.
RFC 2046 requires multipart boundaries to be 1 to 70 bytes in the range `[0-9A-Za-z'()+,./:=?_ -]`, except that the final byte cannot be the space. But since the `Content-Type` value generated by the form submission algorithm requires the `boundary` parameter to not be a quoted string, the boundary should only contain bytes which are safe as a parameter value: `[0-9A-Za-z'+._-]`
Additionally, a comment in WebKit's (and Chromium's) implementation of the boundaries reads:
// The RFC 2046 spec says the alphanumeric characters plus the
// following characters are legal for boundaries: '()+_,-./:=?
// However the following characters, though legal, cause some sites
// to fail: (),./:=+
Assuming that is still the case, the remaining safe bytes would be [0-9A-Za-z'_-]
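To make the three character sets above concrete, here is a rough sketch that checks a candidate boundary against each of them. The regex and function names are made up for illustration and are not taken from any spec or implementation; the 1 to 70 length limit from RFC 2046 is applied to all three sets as an assumption.

```javascript
// Full RFC 2046 set: 1 to 70 bytes, and the final byte must not be a space.
const RFC2046_BOUNDARY = /^[0-9A-Za-z'()+,.\/:=?_ -]{0,69}[0-9A-Za-z'()+,.\/:=?_-]$/;
// Subset that is safe as an unquoted Content-Type parameter value.
const UNQUOTED_SAFE = /^[0-9A-Za-z'+._-]{1,70}$/;
// Subset left after removing the bytes the WebKit comment says break some sites.
const SITE_SAFE = /^[0-9A-Za-z'_-]{1,70}$/;

// Hypothetical helper: reports which of the three sets a boundary fits into.
function classifyBoundary(boundary) {
  return {
    rfc2046: RFC2046_BOUNDARY.test(boundary),
    unquotedSafe: UNQUOTED_SAFE.test(boundary),
    siteSafe: SITE_SAFE.test(boundary),
  };
}

// Sample input in the style of a WebKit boundary (22-byte prefix + 16 alphanumerics).
console.log(classifyBoundary("----WebKitFormBoundaryAbCd1234EfGh5678"));
```

A boundary such as `a(b)` passes only the first check, and `a.b+c` passes the first two.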
If we want to take cues from the implementations, Firefox's boundary string contains a constant prefix of 27 bytes (all hyphens) plus a random part of between 3 and 30 ASCII digits, whose entropy I don't quite know how to calculate, but it's probably close to but lower than 96 bits. WebKit and Chromium's boundary string has a constant prefix of 22 bytes (hyphens and ASCII alpha) plus a random part of 16 ASCII alphanumeric bytes, with 95 bits of entropy.
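For reference, the 95-bit figure for WebKit and Chromium can be reproduced with a quick calculation, assuming each of the 16 bytes is drawn uniformly and independently from the 62 ASCII alphanumerics. That assumption is mine and may not match the real generators exactly; the function name is illustrative.

```javascript
// Entropy in bits of `length` symbols, each drawn uniformly and
// independently from an alphabet of `alphabetSize` symbols (an assumption).
function uniformEntropyBits(alphabetSize, length) {
  return length * Math.log2(alphabetSize);
}

// 16 random bytes from [0-9A-Za-z], i.e. 62 possible symbols per byte.
const webkitBits = uniformEntropyBits(62, 16);
console.log(webkitBits.toFixed(2)); // a little over 95 bits
```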
If I'm doing my math right, with a fixed length `l` and an entropy `h`, the expected length of a form payload before the boundary occurs in it is `(l * (2^h - 1))/2` bytes, which for the boundary strings generated by browsers is over a yottabyte.
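Plugging the WebKit/Chromium numbers into that formula, taken as given from the estimate above rather than re-derived, looks roughly like this. The function and constant names are illustrative, and `h` is rounded down to 95.

```javascript
// Expected payload length in bytes before the boundary occurs in it,
// using the formula (l * (2^h - 1)) / 2 from the comment above.
function expectedBytesBeforeCollision(l, h) {
  return (l * (2 ** h - 1)) / 2;
}

const YOTTABYTE = 1e24; // 10^24 bytes

// WebKit/Chromium: 22-byte constant prefix + 16 random bytes, ~95 bits of entropy.
const webkitEstimate = expectedBytesBeforeCollision(22 + 16, 95);
console.log(webkitEstimate > YOTTABYTE); // true
console.log((webkitEstimate / YOTTABYTE).toExponential(2)); // size in yottabytes
```

The result is done in floating point, which is plenty for an order-of-magnitude check.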
For the record, I'm working on defining `multipart/form-data` in https://github.com/andreubotella/multipart-form-data.
#3223 is part of this, but to properly integrate with Fetch we need more. In particular, I think we want a serialization operation that returns a tuple. The tuple contains the boundary and a list in which each item is either a byte sequence or a `Blob`. That allows Fetch to compute the total size (go through the list, and increment by either the byte sequence's length or the blob's size) and allows it to enqueue chunks into a stream lazily without blocking I/O. It's not really possible to pretend the I/O is synchronous and allow user agents to optimize, as the I/O might fail, whereas obtaining the size should not fail (thanks to @mkruisselbrink for pointing that out).
We should also point out that this is a potentially lossy format, as the boundary necessarily needs to be computed ahead of time without knowing the contents of the blobs. There is no way to avoid this, as the boundary is part of the headers and exposed through something like `new Response(formData).headers.get("content-type")`. I suppose it was possible to avoid this before there was an API if you did not care about streaming, but here we are.
There's a separate question of where we want to define this format. At the moment it's mostly in HTML but `FormData` is in XMLHttpRequest. Status quo is fine with me.
cc @andreubotella
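To illustrate the tuple idea from the start of this comment, here is a minimal sketch of what the serialization result and its two consumers might look like. Everything in it is hypothetical: the object shape, the function names, and the sample boundary are mine, and it assumes a runtime with global `Blob` and `TextEncoder`. It is not how any spec or browser defines this.

```javascript
// Total size: walk the list and add a byte sequence's length or a blob's size.
// This never reads blob contents, so it cannot fail on I/O.
function totalSize(chunks) {
  let size = 0;
  for (const chunk of chunks) {
    size += chunk instanceof Uint8Array ? chunk.byteLength : chunk.size;
  }
  return size;
}

// Lazy reading: a blob is only read when the consumer reaches it,
// and a failed read surfaces there rather than during serialization.
async function* readChunks(chunks) {
  for (const chunk of chunks) {
    if (chunk instanceof Uint8Array) {
      yield chunk;
    } else {
      yield new Uint8Array(await chunk.arrayBuffer());
    }
  }
}

// Hypothetical serialization result for a form with a single file entry.
const encoder = new TextEncoder();
const boundary = "----sketchBoundary0123456789abcdef";
const head = encoder.encode(
  `--${boundary}\r\nContent-Disposition: form-data; name="file"; filename="a.txt"\r\n` +
    `Content-Type: text/plain\r\n\r\n`
);
const tail = encoder.encode(`\r\n--${boundary}--\r\n`);
const serialized = { boundary, chunks: [head, new Blob(["hello"]), tail] };

console.log(totalSize(serialized.chunks)); // known before any blob is read
```

The size comes from metadata only, while `readChunks` defers the actual I/O until a stream pulls from it.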