wintercg / fetch

WinterCG changes to the WHATWG Fetch standard

Work on standardizing multipart/form-data parsing (for `Request.prototype.formData`) #10

Open andreubotella opened 2 years ago

andreubotella commented 2 years ago

The fetch spec includes APIs for interacting with form submissions. For example, the Request and Response constructors accept URLSearchParams and FormData objects as the request/response body, which is generally useful and is expected to be part of the common minimum API.

However, the fetch spec also defines the formData() method of the Body interface mixin, which is included in Request and Response. This method parses the HTTP body as a form submission enctype (either application/x-www-form-urlencoded or multipart/form-data) and returns a FormData object. Since form submission bodies only generally make sense as requests, and it's rarely useful to parse a request body from an HTTP client, it wouldn't make much sense to include this method as part of the common minimum API – but it is certainly useful for fetch-based HTTP server APIs, as Deno and CFW have.

For multipart/form-data parsing, however, this method leaves things almost completely unspecified. While there is a formal definition of this format (in RFC 7578, which relies on the multipart definitions in RFC 2046), it is in the form of an ABNF grammar rather than a parsing algorithm, and so implementations differ in how they parse some inputs.

What's more, browsers have not always escaped field names and filenames in multipart/form-data payloads in the same way. For example, until last year Firefox escaped double quotes by prepending a backslash, and newlines by turning them into spaces, while Chromium and WebKit used percent-encoding. And while this percent-encoding behavior was added to the HTML spec (whatwg/html#6282), and Firefox's behavior fixed in turn, no implementation of the parsing that I'm aware of (including Chromium and WebKit!) decodes the percent-encoding escapes:

const original = new FormData();
original.set('a"b', "");
original.set('c"d', new File([], 'e"f'));
log(original);  // a"b c"d e"f

const parsed = await new Response(original).formData();
log(parsed);  // a%22b c%22d e%22f
// (In CFW it's a%22b c%22d undefined, because it seems like files are not
// distinguished from non-file values when parsing.)

function log(formdata) {
  // FormData is pair-iterable.
  const entries = [...formdata];
  const firstEntryName = entries[0][0];
  const secondEntryName = entries[1][0];
  const secondEntryFilename = entries[1][1].name;
  console.log(firstEntryName, secondEntryName, secondEntryFilename);
}
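
To make the mismatch concrete, here is a minimal sketch of the escaping step described above and of a decoding step a parser could apply. All three function names are hypothetical and not part of any spec or runtime. The newline normalization that the HTML spec applies to field names before escaping is omitted, and the legacy Firefox variant is only an approximation of the behavior described in this thread.

```javascript
// Escape a field name or filename the way the HTML spec now requires
// when serializing multipart/form-data: CR, LF and the double quote
// are percent-encoded; everything else is left alone.
function escapeName(name) {
  return name
    .replace(/\r/g, "%0D")
    .replace(/\n/g, "%0A")
    .replace(/"/g, "%22");
}

// Approximation of the older Firefox behavior mentioned above:
// backslash-escape double quotes, turn newlines into spaces.
// Assumption: a CRLF pair becomes a single space.
function escapeNameLegacyFirefox(name) {
  return name.replace(/"/g, '\\"').replace(/\r\n|\r|\n/g, " ");
}

// Hypothetical inverse of escapeName for use in a parser. It only
// decodes the three sequences that the serializer can produce, and
// only in uppercase (an assumption about what serializers emit).
function unescapeName(raw) {
  return raw.replace(/%(0D|0A|22)/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

console.log(escapeName('a"b'));                 // a%22b
console.log(unescapeName(escapeName('a"b')));   // a"b
console.log(unescapeName(escapeName("50%22"))); // 50"
```

The last line shows why decoding is not a free choice: since the serializer does not escape `%` itself, a name that literally contains `%22` cannot be told apart from an escaped double quote, so any parsing algorithm has to pick one interpretation.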

For browsers, specifying multipart/form-data parsing is not a big priority, since there are few use cases for it there, and the formData() method has been broken for 8 years or so. But for WinterCG runtimes with a fetch-based HTTP server API, being able to parse form submissions with the existing fetch API is crucial, and being able to accurately parse the form submissions that all browser engines are currently submitting is a large part of that. So this seems like a very interesting issue to tackle as part of the WinterCG project.

cyco130 commented 2 years ago

I agree this is very important.

But also the formData API is not very suitable for server-side usage. It requires buffering the whole request. I think a streaming API for parsing multipart requests in general (and multipart/form-data in particular) is necessary for any kind of real life usage of the fetch API on the server.
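
As a rough illustration of what avoiding full buffering involves, the sketch below splits a chunked byte stream on a delimiter while holding back only the few trailing bytes that might be the start of a delimiter. `DelimiterSplitter` and `indexOfBytes` are hypothetical names, not an existing or proposed API. A real multipart parser would use the CRLF + `--` + boundary sequence as its delimiter and would also have to parse part headers and the closing delimiter, which are left out here.

```javascript
// Naive search for `needle` in `haystack` (both Uint8Array), from `from`.
function indexOfBytes(haystack, needle, from) {
  outer: for (let i = from; i + needle.length <= haystack.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) continue outer;
    }
    return i;
  }
  return -1;
}

// Hypothetical helper: feed it chunks as they arrive; it returns
// "data" and "delimiter" events without buffering the whole body.
class DelimiterSplitter {
  constructor(delimiter) {
    if (delimiter.length === 0) throw new Error("delimiter must be non-empty");
    this.delimiter = delimiter;
    this.pending = new Uint8Array(0); // bytes that may begin a delimiter
  }

  push(chunk) {
    const events = [];
    const buf = new Uint8Array(this.pending.length + chunk.length);
    buf.set(this.pending, 0);
    buf.set(chunk, this.pending.length);

    let start = 0;
    let idx;
    while ((idx = indexOfBytes(buf, this.delimiter, start)) !== -1) {
      if (idx > start) events.push({ type: "data", bytes: buf.slice(start, idx) });
      events.push({ type: "delimiter" });
      start = idx + this.delimiter.length;
    }

    // Hold back at most delimiter.length - 1 bytes: a delimiter that
    // started any earlier would already be complete and matched above.
    const keep = Math.min(buf.length - start, this.delimiter.length - 1);
    const flushEnd = buf.length - keep;
    if (flushEnd > start) events.push({ type: "data", bytes: buf.slice(start, flushEnd) });
    this.pending = buf.slice(flushEnd);
    return events;
  }

  // Call once the input is exhausted to flush the held-back bytes.
  end() {
    const events = [];
    if (this.pending.length > 0) events.push({ type: "data", bytes: this.pending });
    this.pending = new Uint8Array(0);
    return events;
  }
}
```

Memory use is bounded by the chunk size plus the delimiter length, regardless of how large an uploaded file is, which is the property the buffering `formData()` method cannot offer.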

andreubotella commented 2 years ago

> I agree this is very important.
>
> But also the formData API is not very suitable for server-side usage. It requires buffering the whole request. I think a streaming API for parsing multipart requests in general (and multipart/form-data in particular) is necessary for any kind of real life usage of the fetch API on the server.

Certainly. @lucacasonato had some proposals about this. But they would still involve defining a multipart/form-data parsing algorithm, and that is the main bulk of the work I will be setting out to do when I get started on this.