sveltejs / kit

web development, streamlined
https://svelte.dev/docs/kit
MIT License
18.73k stars 1.94k forks source link

Handle file uploads #70

Closed Rich-Harris closed 2 years ago

Rich-Harris commented 4 years ago

Currently, the body parser bails on multipart forms that include files

yanick commented 2 years ago

The secret lies in the docs of https://github.com/sveltejs/kit/tree/master/packages/adapter-node

I've updated my example. The key is the file src/server.mjs, which is the entry point for the prod server.

vonadz commented 2 years ago

The secret lies in the docs of https://github.com/sveltejs/kit/tree/master/packages/adapter-node

I've updated my example. The key is the file src/server.mjs, which is the entry point for the prod server.

Oh yeah the whole middleware section is a new update since I last tried implementing this, which substantially simplifies things.

jack-y commented 2 years ago

@yanick Great job! Thank you so much. I've updated my repo too.

FYI, I tried with fastify instead of polka, but without success.
As the doc says: "The adapter exports a middleware (req, res, next) => {} that's compatible with Express / Connect / Polka."

m4tr1k commented 2 years ago

Hi everyone!

So I came across this problem as well, since I needed a way to have access to the file on sveltekit endpoint, but I didn't want to use the above solutions using adapter-node because of my backend implementation. So I decided to rewrite the get-multipart function to support file uploading. I used a dependency (parse-multipart-data) to parse the multipart form data request, and then I just had to present that information in a different way.

That's my implementation:

import multipart from 'parse-multipart-data';

(...)

function parse_body(raw, headers) {
             (...)
        case 'multipart/form-data': {
            const boundary = directives.find((directive) => directive.startsWith('boundary='));
            if (!boundary) throw new Error('Missing boundary');
                         // I added the raw body request to use it in parse-multipart-data
            return get_multipart(text(), boundary.slice('boundary='.length), raw);
        }
}

(...)

function get_multipart(text, boundary, raw) {
    let data = {};
    const parts = text.split(`--${boundary}`);

    if (parts[0] !== '' || parts[parts.length - 1].trim() !== '--') {
        throw new Error('Malformed form data');
    }

    let names = [];
    parts.slice(1, -1).forEach((part) => {
        const match = /\s*([\s\S]+?)\r\n\r\n([\s\S]*)\s*/.exec(part);
        if (!match) {
            throw new Error('Malformed form data');
        }
        const raw_headers = match[1];
        raw_headers.split('\r\n').forEach((str) => {
            const [raw_header, ...raw_directives] = str.split('; ');
            let [name, value] = raw_header.split(': ');

            name = name.toLowerCase();

            /** @type {Record<string, string>} */
            const directives = {};
            raw_directives.forEach((raw_directive) => {
                const [name, value] = raw_directive.split('=');
                directives[name] = JSON.parse(value);
            });

            if (name === 'content-disposition') {
                if (value !== 'form-data') throw new Error('Malformed form data');

                if (directives.name) {
                    names = [...names, directives.name];
                }
            }
        });
    });

    const bodyParts = multipart.parse(raw, boundary);

    for(let i = 0; i < bodyParts.length; i++) {
        const bodyPart = bodyParts[i];
        if(!bodyPart.type){
            data[names[i]] = bodyPart.data.toString();
        } else {
            data[names[i]] = {
                data: bodyPart.data,
                type: bodyPart.type
            };
        }
    }

    return data;
}

Please keep in mind that I'm aware that this may not be the smartest and most effective way to do this (because you can see that I did not erase a part of the code and probably there are more elegant ways to do this), but as a small fix until the full implementation is fully done I think it is a pretty good solution. I would like to know your opinion on this as well.

Thanks 😄

KrustyC commented 2 years ago

I am running into the same issue and I am just wondering what's the latest on this? Is this going to be part of release 1.0? If not, is there any estimate about when this will be made available? I think it's an important feature and a very common use case (as proven from all the comments above). I would be happy to help if possible, although I have never worked with the svelte-kit codebase before

m4tr1k commented 2 years ago

@KrustyC hey, check out my solution right above your comment. I developed a way to be able to access the files on Svelte endpoints. Hope it helps!

Rich-Harris commented 2 years ago

RFC

Prompted by #3086 (thanks @pixelmund), I've spent some time thinking about how we should rethink request body handling to enable both file uploads and other large bodies. I'd be grateful for feedback on the following:

This would, of course, be a breaking change. We'd also need to figure out what to do about rawBody, which we currently expose for use cases like #831. I think the logical choice here would be to make it available for text/JSON bodies, but otherwise have it be null.

thgh commented 2 years ago

How about using the standard Request object?

// src/routes/multiply.js
- export async function post({ body }) {
- const data = await body.toFormData();
+ export async function post(req: Request) {
+ const data = await req.formData();
const a = +data.get('a');
const b = +data.get('b');

  return {
    status: 200,
    body: a * b
  };
}
maxicarlos08 commented 2 years ago

That doesn't exist in Nodejs as far as I know...

thgh commented 2 years ago

That doesn't exist in Nodejs as far as I know...

The node adapter depends on node-fetch. And node-fetch seems to have the formData method.

So this should work:

import { Request } from 'node-fetch'

const data: FormData = await new Request(url, { headers, body }).formData();
const file: File = data.get('file-input')
console.log(file.name)
console.log(file.size)
console.log(file.type)
Rich-Harris commented 2 years ago

@thgh I've thought about it. There is something nice about aligning with the platform. But it risks being a straitjacket, and I think we can provide better ergonomics when it comes to streaming. Glossing over the fact that params and locals need to be present in the argument signature, this is how you'd process an octet stream using Request...

export async function post(req) {
  const reader = req.body.getReader();

  let chunk;
  while (chunk = await reader.read()) {
    if (chunk.done) {
      return {
        status: 204
      };
    }

    await do_something_with(chunk.value);
  }
}

...versus the proposal:

export async function post({ body }) {
  for await (const chunk of body) {
    await do_something_with(chunk.value);
  }

  return {
    status: 204
  };
}

For forms with files (where we want to handle things in streaming fashion, i.e. formData() is a non-starter), it gets much more complicated; we'd need to provide a helper of some kind that would turn Request into an iterable-of-iterables as above (or a stream-of-streams, if we wanted to do things that way, but I think this is a case where language > platform).

Even for trivial cases, it's an extra hoop to jump through — if the content-type is application/json then calling await req.json() feels like busywork.

That doesn't exist in Nodejs as far as I know...

That's not a problem, it's provided in SvelteKit (either by the environment, if you're using Deno etc, or the adapter if you're using Node). But it is true that we'd be creating a Request object ourselves, which feels like unnecessary overhead.

Rich-Harris commented 2 years ago

So this should work:

import { Request } from 'node-fetch'

const data: FormData = await new Request(url, { headers, body }).formData();
const file: File = data.get('file-input')
console.log(file.name)
console.log(file.size)
console.log(file.type)

Unfortunately this assumes that the body is already buffered

darbymanning commented 2 years ago

Sorry if a tad off-topic - but whilst processing files in requests for endpoints isn’t available yet, is it possible to add a platform specific serverless function, that’s handled by the provider? eg. A platform specific endpoint not managed through SvelteKit (in my case a Vercel Serverless function). I haven’t checked but I assume they can support formdata being provided. I love the platform agnostic approach SK has, but as a temporary solution perhaps that could work until we have support in SK endpoints.

Alternative to this I guess is making a whole separate app/site in Vercel which is ‘vanilla’, purely to create the serverless function. None are great solutions, but I do kinda need the functionality!

yanick commented 2 years ago

Just to drop info to prevent some teeth-gnashing: the latest node-adapter versions renamed the produced middleware.js to handler.js. I'll try to update my example repo to reflect that soon. (@jack-y, that might be of interest to you too)

:heart:

jack-y commented 2 years ago

@yanick, thank you so much for letting me know! My example repo is also updated.

jack-y commented 2 years ago

I just created an example repo using FileReader and an endpoint. No FormData.
Beware: this IS NOT a solution for the initial issue. It's just a workaround that might help.

benmccann commented 2 years ago

@Rich-Harris your suggestions look pretty reasonable to me. The only question in my mind would be making sure we nail down the streaming API, which this seems to build on top of. There were some interesting use cases in that thread, so we might want to check if we can handle cancellation or jump around a video stream. The streaming portion is not my area of expertise, but the changes to the form API here look good to me.

quentincaffeino commented 2 years ago

@Rich-Harris

if the request's content-type is text/* or application/json, keep the existing behaviour (i.e. request.body contains text or a parsed JSON object) if the request's content-type is application/x-www-form-urlencoded or multipart/form-data, request.body is an async iterable, each of whose values represents one of the form values and is also an async iterable, whose values correspond to the actual data:

I don't really like having it hard-coded like that.

For example from personal experience with JSON I see two reasons why streaming might be useful:

  1. Processing large JSON inputs, this just isn't possible with regular JSON.parse or res.json. Yes having big jsons is very niche, but asynchronous parsing could speedup some apps.
  2. Semantic Web and JsonLD. This one is even more niche. One can parse JsonLD string into JS object with proxies of linked data. Problem is not every client sets the correct header. Since it's still a json many just use regular application/json instead of application/ld+json.

There most definitely other cases that I am missing. But generally speaking my point is that streaming any of the types of data could be useful.

My proposal would be and I believe I haven't seen this yet: is to leave it as is by default and export config object (or separate properties) from the endpoint, similar to how it's done on .svelte files (eg.: <svelte:options accessors={true}/>).

I imagine it looking something like this:

export const postConfig = {
    body: "default" | "raw" | "stream"
};

Reason I believe it is better to use an object is because this would allow other extensions to the config in the future (middlewares?).

Hope my idea isn't too crazy.

Edit:

Did a little research, turns out my proposal is similar to what Nextjs has. It allows to opt-out from body parsing: https://nextjs.org/docs/api-routes/api-middlewares#custom-config

Rich-Harris commented 2 years ago

@benmccann

There were some interesting use cases in that thread, so we might want to check if we can handle cancellation or jump around a video stream.

1563 is about responses rather than requests, so the issues are related but different. Reading through it, I am warming to the idea of supporting ReadableStream responses directly. Firstly, as the issue points out, it gives us an avenue for cancellation:

return {
  status: 200,
  body: new ReadableStream({
    start(controller) {
      // streaming logic goes here
    },
    cancel() {
      // cleanup goes here
    }
  })
};

Achieving the same thing with async iterables would involve some contortions.

Secondly, with a minor change — allowing headers to be a Headers object rather than a POJO — we would get support for this for free, which would make it incredibly easy to proxy to external sites and APIs:

export const get = ({ url }) => fetch('https://en.wikipedia.org' + url.pathname);

I think this fetch would be cancellable by the underlying platform (res.body.cancel()) even if we don't pass in an AbortSignal? (I'm not particularly well versed on this.)

Thirdly, it means less work, since we wouldn't need to convert the iterable to a ReadableStream in order to pass the response to platforms that expect Response objects, like Cloudflare Workers and Deno. (This might even save SvelteKit users money, in some cases.)

Of course, this involves a change to the contract between adapters and Kit. If app.render returns a Response, we can delete some code from adapter-cloudflare, but Node- and lambda-shaped adapters would need to do some more work to pipe a ReadableStream to a Node res or buffer it for a lambda return value (Kit can expose helpers for doing that), and adapter-node would need to listen for req.on('close') to cancel the body.

Given all the above it might seem odd to still favour async iterables for request bodies, but I do — I think the ergonomics for consuming versus producing streams are sufficiently different, and cancellation is easier to model (if the requester cancels for await (const chunk of body) can throw; the endpoint could proactively cancel the request by returning).


Re video streams — the endpoint will always need to handle Range headers, Kit can't do that for you.


@quentincaffeino

My proposal would be and I believe I haven't seen this yet: is to leave it as is by default and export config object (or separate properties) from the endpoint

I like this. I think that's probably a separate conversation as the configuration could take many forms, and it's something that could be implemented after we've nailed the basics of streamed requests/responses.

Rich-Harris commented 2 years ago

It's now possible to upload files, as long as they're not large: #3384.

Handling streams is still a TODO, but there's an issue for it. Closing this in favour of #3419

maxicarlos08 commented 2 years ago

Could we make use of the Request and Response globals that will soon be available in node https://github.com/nodejs/node/pull/41749?