node-formidable / formidable

The most used, flexible, fast and streaming parser for multipart form data. Supports uploading to serverless environments, AWS S3, Azure, GCP or the filesystem. Used in production.
MIT License

Support parsing of multipart/byteranges #390

Open jcready opened 7 years ago

jcready commented 7 years ago

Currently formidable returns an empty body and a single item in the files object which is named 'null' and only contains the last part of the multipart byte-range body.

tunnckoCore commented 7 years ago

@felixge any help on that? Maybe https://github.com/jshttp/range-parser is worth a look?

felixge commented 7 years ago

I don't understand how this is supposed to work. @jcready can you provide an example request and what you'd expect formidable to do with it?

jcready commented 7 years ago

@felixge a multipart/byteranges response would look something like this (taken from RFC 7233):

HTTP/1.1 206 Partial Content
Date: Wed, 15 Nov 1995 06:25:24 GMT
Last-Modified: Wed, 15 Nov 1995 04:58:08 GMT
Content-Length: 1741
Content-Type: multipart/byteranges; boundary=THIS_STRING_SEPARATES

--THIS_STRING_SEPARATES
Content-Type: application/pdf
Content-Range: bytes 500-999/8000

...the first range...
--THIS_STRING_SEPARATES
Content-Type: application/pdf
Content-Range: bytes 7000-7999/8000

...the second range
--THIS_STRING_SEPARATES--

I imagine the parsed body would end up looking like this:

[
  {
    "headers": {
      "content-type": "application/pdf",
      "content-range": "bytes 500-999/8000"
    },
    "body": "...the first range..."
  },
  {
    "headers": {
      "content-type": "application/pdf",
      "content-range": "bytes 7000-7999/8000"
    },
    "body": "...the second range"
  }
]
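Each part's content-range value would probably need to be split into numbers before the ranges can be stitched back together. A minimal sketch of that step, assuming a hypothetical helper named parseContentRange (it is not part of formidable) and only the "bytes start-end/total" form shown above:

```javascript
// Hypothetical helper, not part of formidable's API.
// Parses a Content-Range value such as "bytes 500-999/8000".
function parseContentRange (value) {
  // The total may be "*" when the complete length is unknown.
  const match = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(String(value).trim())
  if (!match) return null

  const start = Number(match[1])
  const end = Number(match[2])
  const total = match[3] === '*' ? null : Number(match[3])

  // Reject ranges that are reversed or that run past the stated total.
  if (start > end || (total !== null && end >= total)) return null

  // Byte ranges are inclusive on both ends, hence the + 1.
  return { start, end, total, length: end - start + 1 }
}
```

For the two parts above this would give lengths of 500 and 1000 bytes; anything that does not match the expected form comes back as null rather than throwing.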

Ideally this functionality would just be an additional parser like lib/byterange_parser.js where it would be used if the response's Content-Type started with multipart/byteranges. I've managed to get the desired output by overwriting formidable.IncomingForm.prototype.parse:

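formidable.IncomingForm.prototype.parse = function parseByteRanges (res, done) {
  // Buffers the whole response in memory before parsing (not streaming).
  const parts = []
  let totalLength = 0
  res.on('error', done).on('data', (chunk) => {
    parts.push(chunk)
    totalLength += chunk.length
  }).on('end', () => {
    try {
      // Accept quoted boundaries and parameters that follow the boundary.
      const boundary = res.headers['content-type'].match(/boundary="?([^";]+)"?/i)[1]
      const body = Buffer.concat(parts, totalLength)
      // toString() decodes as UTF-8, so binary ranges will not survive intact.
      done(null, parseMultipartBody(body.toString(), boundary))
    } catch (e) {
      done(e)
    }
  })
}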

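// Splits a multipart body (already decoded to a string) into its parts.
function parseMultipartBody (body, boundary) {
  return body.split('--' + boundary).reduce((memo, part) => {
    // Skip the empty preamble and the '--' left over from the closing delimiter.
    if (part.trim() && part.trim() !== '--') {
      // Drop only the CRLFs that belong to the delimiters, not body whitespace.
      const trimmed = part.replace(/^\r\n/, '').replace(/\r\n$/, '')
      // Split on the first blank line only, so a body containing \r\n\r\n stays whole.
      const sep = trimmed.indexOf('\r\n\r\n')
      const head = sep === -1 ? trimmed : trimmed.slice(0, sep)
      memo.push({
        headers: head.split(/\r\n/).reduce((headers, header) => {
          // Split on the first colon only, so values containing ': ' survive.
          const idx = header.indexOf(':')
          if (idx === -1) return headers
          headers[header.slice(0, idx).trim().toLowerCase()] = header.slice(idx + 1).trim()
          return headers
        }, {}),
        body: sep === -1 ? '' : trimmed.slice(sep + 4)
      })
    }
    return memo
  }, [])
}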

tunnckoCore commented 7 years ago

@jcready hm, looks good. Can you open a PR and add some tests when you have time? We (the new maintainers) are here to help and review :)

felixge commented 7 years ago

@jcready makes sense. But I wonder if this should really be part of formidable. If yes, it should use the streaming multipart parser that's used for multipart/form-data and avoid unbounded memory allocation. If nobody has time for a good implementation and tests, I'd rather not see this supported.
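
To illustrate what a streaming approach might look like, here is a rough sketch that splits the body chunk by chunk and hands part data to callbacks as it arrives, so memory use stays bounded by roughly one chunk plus one delimiter (and a capped header block). It is not formidable's existing multipart parser: the class name ByteRangeSplitter, the handler names onPartBegin/onData/onPartEnd, and the maxHeaderBytes default are all assumptions made for this example.

```javascript
// Hypothetical sketch; names are illustrative and not formidable's API.
class ByteRangeSplitter {
  constructor (boundary, handlers, maxHeaderBytes = 8192) {
    this.delimiter = Buffer.from('\r\n--' + boundary)
    this.handlers = handlers
    this.maxHeaderBytes = maxHeaderBytes
    // Seed with CRLF so the very first delimiter matches like all the others.
    this.buffer = Buffer.from('\r\n')
    // preamble -> delimiter -> headers -> body -> delimiter -> ... -> done
    this.state = 'preamble'
  }

  write (chunk) {
    if (this.state === 'done') return // ignore the epilogue
    this.buffer = Buffer.concat([this.buffer, chunk])
    while (this.state !== 'done' && this.step()) { /* keep consuming */ }
  }

  // Consumes what it can from this.buffer; returns true if the state changed.
  step () {
    const buf = this.buffer
    if (this.state === 'preamble' || this.state === 'body') {
      const inBody = this.state === 'body'
      const idx = buf.indexOf(this.delimiter)
      // Without a match, hold back a tail that could be the start of a delimiter.
      const safe = idx === -1 ? buf.length - (this.delimiter.length - 1) : idx
      if (inBody && safe > 0) this.handlers.onData(buf.subarray(0, safe))
      if (idx === -1) {
        if (safe > 0) this.buffer = buf.subarray(safe)
        return false
      }
      if (inBody) this.handlers.onPartEnd()
      this.buffer = buf.subarray(idx + this.delimiter.length)
      this.state = 'delimiter'
      return true
    }
    if (this.state === 'delimiter') {
      if (buf.length < 2) return false
      if (buf[0] === 0x2d && buf[1] === 0x2d) { // "--" marks the closing delimiter
        this.buffer = Buffer.alloc(0)
        this.state = 'done'
        return false
      }
      const eol = buf.indexOf('\r\n')
      if (eol === -1) return false
      this.buffer = buf.subarray(eol + 2)
      this.state = 'headers'
      return true
    }
    // state === 'headers': a leading CRLF means the part has no headers.
    let headerText = ''
    let bodyStart = 2
    if (!(buf.length >= 2 && buf[0] === 0x0d && buf[1] === 0x0a)) {
      const end = buf.indexOf('\r\n\r\n')
      if (end === -1) {
        if (buf.length > this.maxHeaderBytes) throw new Error('part headers too large')
        return false
      }
      headerText = buf.toString('latin1', 0, end)
      bodyStart = end + 4
    }
    const headers = {}
    for (const line of headerText.split('\r\n')) {
      const colon = line.indexOf(':')
      if (colon > 0) headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim()
    }
    this.buffer = buf.subarray(bodyStart)
    this.state = 'body'
    this.handlers.onPartBegin(headers)
    return true
  }
}
```

Hooked up as res.on('data', (chunk) => splitter.write(chunk)), each range could be piped onward without ever holding the full response. The sketch leaves out things a real implementation would need, such as reporting a stream that ends before the closing delimiter and handling folded header lines.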