pqina / filepond

🌊 A flexible and fun JavaScript file upload library
https://pqina.nl/filepond
MIT License
15.16k stars 825 forks

How are files encoded in multipart form data? #344

Closed by jeancochrane 5 years ago

jeancochrane commented 5 years ago

Hey all,

Thanks for your great work on FilePond! Really enjoying it so far.

One quick question about the library internals that seems like it might be more appropriate for GitHub than SO:

I'm trying to write a custom processor backend following the docs in order to upload files to Google Drive. However, I'm running into some strange behavior related to the file encoding that I don't quite understand.

How are files encoded in multipart form data? I'm parsing the form data in a Lambda function with busboy and raw-body like so:

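// Uses the busboy 0.x API (constructor export, positional 'file' event arguments)
const busboy = require('busboy');
const getRawBody = require('raw-body');

const parse = (event) => {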
  return new Promise((resolve, reject) => {
    const contentType = event.headers['Content-Type'] || event.headers['content-type'];
    const bb = new busboy({headers: {'content-type': contentType}});
    const filePromises = []
    let data = {fields: {}, files: {}}

    bb.on('file', (fieldname, file, filename, encoding, mimetype) => {
      data.files[fieldname] = {filename, encoding, mimetype}
      filePromises.push(
        getRawBody(file).then(rawFile => {data.files[fieldname].content = rawFile})
      )
    })
    .on('field', (fieldname, val) => {
      data.fields[fieldname] = val
    })
    .on('finish', () => {
      resolve(Promise.all(filePromises).then(() => data))
    })
    .on('error', err => {
      reject(err)
    })

    bb.end(event.body)
  })
}

Then, I pass the file data along to the Google Drive API using the buffer stored on data.files[fieldname].content. When I test this with a jpeg and retrieve the file from Google Drive, however, it appears to be encoded improperly, and I can't open it as an image. The same behavior happens if I write data.files[fieldname].content to the filesystem with fs.writeFileSync and then open it locally, so the encoding problem isn't happening with the Drive API.

How should I expect to handle the file data in the multipart form? Is it encoded in any particular way?

rikschennink commented 5 years ago

Hi, if you're not using the file encode plugin, then files are posted to the server as with any other form. What could be causing the issue is that FilePond posts both the file metadata and the file object itself using the same field name, so you'll probably have to check whether you're dealing with the file or with the metadata JSON string.
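
To illustrate the point about the shared field name, here is a minimal sketch that builds a form with two `filepond` entries and tells them apart. It assumes a runtime with the standard `FormData` and `Blob` globals (a modern browser or Node 18+). The `splitEntries` helper and the sample values are hypothetical and not part of FilePond.

```javascript
// Hypothetical helper: separate string entries (metadata JSON) from
// file entries that share a single field name.
const splitEntries = (form, fieldName) => {
  const result = { metadata: [], files: [] };
  for (const value of form.getAll(fieldName)) {
    if (typeof value === 'string') {
      // Plain fields arrive as strings; here they carry the metadata JSON.
      result.metadata.push(JSON.parse(value));
    } else {
      // File parts arrive as File objects.
      result.files.push(value);
    }
  }
  return result;
};

// Sample form mimicking the shape described above (values are made up).
const form = new FormData();
form.append('filepond', JSON.stringify({ color: null }));
form.append(
  'filepond',
  new Blob(['fake image bytes'], { type: 'image/jpeg' }),
  'image001.jpg'
);

const entries = splitEntries(form, 'filepond');
console.log(entries.metadata.length, entries.files.length);
```

A server-side parser such as busboy already makes this split by emitting separate `field` and `file` events, so the check mostly matters when reading the entries generically.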

jeancochrane commented 5 years ago

Thanks for the reply @rikschennink! I feel pretty confident that I'm dealing with the file, since the output of my processing with busboy and raw-body matches what I would expect:

{ fields: { filepond: '{"color":null}' },
  files:
    { filepond:
       { filename: 'image001.jpg',
         encoding: '7bit',
         mimetype: 'image/jpeg',
         content:
          <Buffer ef bf ... > } } }

But maybe I'm misinterpreting the format of the multipart/form-data request? Here's what the raw request object looks like:

------WebKitFormBoundarye37yFUAx7qVo4BAa
Content-Disposition: form-data; name="filepond"

{"color":null}
------WebKitFormBoundarye37yFUAx7qVo4BAa
Content-Disposition: form-data; name="filepond"; filename="image001.jpg"
Content-Type: image/jpeg

����JFIF``��C <more image data here>

The head of the image data body (����JFIF``��) matches what I see when I cat | head the image on my local filesystem, so it seems like the form data is being parsed correctly.

It's totally possible that the problem here is coming from my browser or from the way I'm handling the file stream with busboy and raw-body. I'll continue trying to debug from those angles but if anything jumps out to you here as unusual it'd be helpful!
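
One detail in the parsed output above may be worth a closer look: the buffer begins with `ef bf`, which is also how the UTF-8 encoding of the replacement character U+FFFD (`ef bf bd`) begins. The Node.js sketch below assumes, without confirmation from this thread, that the request body was decoded to a string somewhere before parsing. It shows how that kind of round trip alters JPEG bytes, while a base64 round trip preserves them.

```javascript
// First bytes of a typical JPEG/JFIF file: ff d8 ff e0 00 10 'J' 'F' 'I' 'F'
const original = Buffer.from([
  0xff, 0xd8, 0xff, 0xe0, 0x00, 0x10, 0x4a, 0x46, 0x49, 0x46,
]);

// Decoding arbitrary binary as UTF-8 replaces invalid sequences with U+FFFD...
const asText = original.toString('utf8');
// ...so re-encoding the string yields different bytes, starting with ef bf bd.
const mangled = Buffer.from(asText, 'utf8');

// base64 is a lossless text representation of the same bytes.
const restored = Buffer.from(original.toString('base64'), 'base64');

console.log(mangled.subarray(0, 3).toString('hex'));
console.log(mangled.equals(original));
console.log(restored.equals(original));
```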

rikschennink commented 5 years ago

No problem! :-) It's submitted like any other file: a default multipart form submit, no shenanigans.

jeancochrane commented 5 years ago

Thanks for letting me leave this open while I investigated further @rikschennink! After a lot of debugging I'm pretty confident that the problem is actually that Lambda functions don't fully support file uploads for form data (for mysterious and AFAIK undocumented reasons). See https://stackoverflow.com/a/42414142/7781189 for details. There seem to be hints of ways to get around it, but I'm deploying my functions via Netlify Functions, so customizing API Gateway configs isn't an option.

The recommended alternative path here seems to be to encode the image as base64 before shipping it to the server. It seems like I can accomplish this by customizing the server.process function in my FilePond config. I pulled together a quick spike and confirmed that this workflow would save the image correctly.
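
On the receiving side, a base64 payload can be turned back into bytes with `Buffer.from(..., 'base64')`. The handler below is only a rough sketch. The JSON body shape (`filename`, `mimetype`, `data`) is a hypothetical convention chosen for illustration, and the `event.isBase64Encoded` check reflects the API Gateway proxy event format rather than anything specific to this thread.

```javascript
// Hypothetical Lambda-style handler; the payload shape is an assumption.
const handler = async (event) => {
  // API Gateway may itself base64-encode the whole request body.
  const rawBody = event.isBase64Encoded
    ? Buffer.from(event.body, 'base64').toString('utf8')
    : event.body;

  const { filename, mimetype, data } = JSON.parse(rawBody);

  // Decode the base64 string back into the original file bytes.
  const content = Buffer.from(data, 'base64');

  // `content` could now be handed to a storage API or written to disk.
  return {
    statusCode: 200,
    body: JSON.stringify({ filename, mimetype, size: content.length }),
  };
};
```

Because the body is plain JSON text, it should survive being handled as a string, which is the property the multipart upload lacked here.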

Before I close this, however, I wanted to ask: is there a recommended way of accomplishing a base64 encoding before shipping the file to a custom backend? filepond-plugin-file-encode seemed promising, but based on your comment here it seems like it's tightly integrated with the traditional HTML form submission flow. Since I'm handling the image upload with a custom backend API I couldn't figure out a way to get filepond-plugin-file-encode to expose the base64-encoded string in my server.process function. I'm comfortable writing my own custom base64 transformation into the server.process function, I just wanted to check first to see if there was a better-supported way of doing this.
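
For reference, a custom base64 transformation inside `server.process` could look roughly like the sketch below. It follows the custom process function signature from the FilePond docs (`fieldName, file, metadata, load, error, progress, abort`). The `makeBase64Process` and `blobToBase64` helpers, the JSON payload shape, and the assumption that the endpoint replies with a file id as plain text are all hypothetical.

```javascript
// Hypothetical helper: read a Blob/File and return its bytes as base64.
const blobToBase64 = async (blob) => {
  const bytes = new Uint8Array(await blob.arrayBuffer());
  let binary = '';
  const chunkSize = 0x8000; // convert in chunks to avoid huge argument lists
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + chunkSize));
  }
  return btoa(binary);
};

// Hypothetical factory returning a function usable as `server.process`.
// `fetchImpl` is injectable so the sketch can be exercised without a network.
const makeBase64Process = (url, fetchImpl = fetch) =>
  (fieldName, file, metadata, load, error, progress, abort) => {
    const controller = new AbortController();

    blobToBase64(file)
      .then((data) =>
        fetchImpl(url, {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          // Payload shape is an assumption, not a FilePond convention.
          body: JSON.stringify({
            filename: file.name,
            mimetype: file.type,
            metadata,
            data,
          }),
          signal: controller.signal,
        })
      )
      .then((res) => {
        if (!res.ok) throw new Error(`Upload failed with status ${res.status}`);
        return res.text();
      })
      .then((fileId) => load(fileId)) // tell FilePond the upload finished
      .catch((err) => {
        if (err.name !== 'AbortError') error(err.message);
      });

    // FilePond calls this when the user cancels the upload.
    return {
      abort: () => {
        controller.abort();
        abort();
      },
    };
  };
```

It could then be wired up with something like `FilePond.setOptions({ server: { process: makeBase64Process('/.netlify/functions/upload') } })`, where the URL is a placeholder. Note that `fetch` does not report upload progress, so this sketch never calls `progress`.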

rikschennink commented 5 years ago

Yes, the file encode plugin is mainly there to allow for plain form submits. I think copy-pasting the code inside the file encode plugin into your custom process method would be the best way to go.

I’m not sure if this is an option but maybe the remote service accepts array buffers?

jeancochrane commented 5 years ago

Great, thanks! I'll check whether Lambda supports array buffers before copying the code from the file encode plugin.

Closing this since there was no underlying issue with the library. Thanks again for your support, I really appreciate it.