reactphp / http

Event-driven, streaming HTTP client and server implementation for ReactPHP.
https://reactphp.org/http/
MIT License
740 stars 142 forks source link

how to save an upload? #428

Closed scottw-finao closed 2 years ago

scottw-finao commented 2 years ago

Scouring through the documents and videos, I can't seem to find a simple example of how to save a file upload. I've tried mulitple examples I see and none worked. I took some guesses and got one to save using getContents() but it doesn't seem to work on larger files.

I see other sources that say to save stream data you use stream_copy_to_stream but when i take UploadedFile()->getStream() as the input, it says it wants a resource, not an instance.

I just need to do a simple file upload as part of a multipart form data rest interface. I can the array of UploadedFile() instances with getUploadedFiles() but I can't seem to find any basic instructions for turning that into a file.

I tried yet another example that uses a promise, but then I'm not sure what the proper way is to get the UploadedFile() data into the sub-function. Can someone please give me a simple example on how I'm supposed to save the file data please?!

clue commented 2 years ago

@scottw-finao Thank you for this excellent question! This project does indeed support file uploads just fine (but there are a number of caveats, see below).

Here's the gist how to save uploaded files:

$http = new React\Http\HttpServer(function (Psr\Http\Message\ServerRequestInterface $request) {
    $files = $request->getUploadedFiles();
    if (!isset($files['avatar'])) {
        return React\Http\Message\Response::plaintext("No file given\n");
    }

    $avatar = $files['avatar'];
    assert($avatar instanceof Psr\Http\Message\UploadedFileInterface);

    if ($avatar->getError() !== UPLOAD_ERR_OK) {
        return React\Http\Message\Response::plaintext("Upload error\n");
    }

    // unsafe, for demonstration purpose only: no filename checks, may overwrite existing file
    $name = $avatar->getClientFilename();

    // blocking, for demonstration purpose only: filesystem may block, loads entire file into memory
    file_put_contents(__DIR__ . '/' . $name, (string) $avatar->getStream());

    return React\Http\Message\Response::plaintext("Uploaded $name\n");
});

This example should work just fine, but loads the entire file into memory before writing it to the filesystem, see also https://github.com/reactphp/http#request-body and https://github.com/reactphp/http/pull/252. As per the linked documentation, we can use the getStream() method, but not the moveTo() method for reasons given below.

In particular, note that the filesystem is inherently blocking, so you may wish to use a different approach if you have high concurrency. For example, you may instead send this off to some external service (think S3, FTP, SSH, etc.) or use our (experimental) filesystem adapter: https://github.com/reactphp/filesystem

On top of this, note that the HttpServer has a default request size limit of just 64 KiB. This means that if you're trying to upload larger files, you have to configure the server to support larger files. See https://github.com/reactphp/http#httpserver for details and also https://github.com/reactphp/http/issues/412 for motivation.

The problem we're seeing here is that the filesystem is inherently blocking, but ReactPHP is entirely non-blocking. This means we try hard to avoid hitting the filesystem and as a consequence have to store everything in memory. This works perfectly fine for smaller requests, but it's easy to see why you wouldn't want to store multiple gigabytes of data in memory for large uploads. To avoid this, ReactPHP provides the option to use streaming request bodies as per https://github.com/reactphp/http#streaming-incoming-request. While this would allow you to accept arbitrary file uploads, it's arguably also somewhat harder to use.

Together, this means handling file uploads is surprisingly hard.

I've looked into this a couple of times already and hope to be able to make this easier in the future, but the main problem that files have to be stored "somewhere" remains for now. I hope to provide some kind of configuration option for this in the future, but no promises at this point.

I believe this has been answered, so I'm closing this for now. Please come back with more details if this problem persists and we can always reopen and/or keep discussing this :+1:

bartvanhoutte commented 2 years ago

@clue I've also been looking into handling file uploads through streaming for the last couple of days but I must admit it is a very hard problem to solve.

Mainly the biggest issue is handling the stream flow while doing any asynchronous I/O. At a certain point while handling the request you might want to check the filesystem or database to see if the file already exists, which causes issues with the stream. I've tried buffering the data emitted by the stream and handling the buffer + remaining data emitted by the stream after doing non-blocking I/O but this is really hard and probably unreliable. The size of the buffer is correlated to the duration of any asynchronous I/O you'd like to do. Hence in the worst case scenario you'll end up with having the complete file in memory at which point streaming becomes completely useless.

The least worse solution I've come up with is to store the file on a temporary filesystem that supports ReadableStreamInterface before doing any processing. When you've done the necessary I/O you can either move (stream) the file from the temporary filesystem to the final destination or delete the file when you shouldn't have saved it in the first place.

Bear in mind that this solution:

clue commented 2 years ago

Agree with your analysis and the fact that this is both surprisingly complex and that it most likely requires some kind of filesystem integration for temporary files. I've started looking into this for some private projects a while ago, a more generic solution is definitely on the roadmap, but requires non-trivial effort.

bartvanhoutte commented 2 years ago

I have been looking into this a bit more, there's still some hurdles left but I think it should be doable. Just a couple of things to think about, YMMV.

The main issue is avoiding any I/O while receiving incoming data:

clue commented 2 years ago

The filesystem is inherently blocking while ReactPHP is entirely non-blocking, so any solution we're going to come up with is going to involve some non-trivial effort. Most commonly, blocking filesystem calls can be avoided by using thread pools (not available in PHP without custom extensions and a ZTS build which is rare) or process pools (available out of the box, but incurs significant overhead).

Technically, any filesystem call is blocking. But in practice, today's filesystem implementation and underlying hardware devices tend to be fast. Realistically, this means we may get away with using blocking filesystem calls. This can be especially true if we're only dealing with files in smaller chunks, for instance for our socket communication we usually read a few kilobytes at once only and most filesystems would outperform any network connection.

PHP's tmpfile() or tempnam() can be used to create temporary files with unique names, so no need to perform costly stat calls with file_exists() and friends. Filesystem calls may indeed be blocking, but from a practical perspective they might be fast enough to be considered non-blocking for many use cases.

bartvanhoutte commented 2 years ago

I should have added in my previous reply that for starters I'd got for an implementation using libuv for now 😅

The filesystem is inherently blocking while ReactPHP is entirely non-blocking, so any solution we're going to come up with is going to involve some non-trivial effort. Most commonly, blocking filesystem calls can be avoided by using thread pools (not available in PHP without custom extensions and a ZTS build which is rare) or process pools (available out of the box, but incurs significant overhead).

Does this include filesystem access using libuv from the perspective of the event loop? I'm not sure what to make of the following statement in the libuv documentation found at http://docs.libuv.org/en/v1.x/design.html#file-i-o. The prototyping I've done described in the previous comment was with libuv.

Am I correct in saying we either use threads/process pools in native PHP or use I/O functions with libuv? Like it's done in react/filesystem 0.2.x?

Unlike network I/O, there are no platform-specific file I/O primitives libuv could rely on, so the current approach is to run blocking file I/O operations in a thread pool.

... and most filesystems would outperform any network connection.

Makes sense but i guess YMMV when buffering requests on an IO throttled drive.

PHP's tmpfile() or tempnam() can be used to create temporary files with unique names, so no need to perform costly stat calls with file_exists() and friends. Filesystem calls may indeed be blocking, but from a practical perspective they might be fast enough to be considered non-blocking for many use cases.

Would definitely work with a thread/process pool and would also makes things less complicated 👍.

bartvanhoutte commented 1 year ago

The filesystem is inherently blocking while ReactPHP is entirely non-blocking, so any solution we're going to come up with is going to involve some non-trivial effort. Most commonly, blocking filesystem calls can be avoided by using thread pools (not available in PHP without custom extensions and a ZTS build which is rare) or process pools (available out of the box, but incurs significant overhead).

Technically, any filesystem call is blocking. But in practice, today's filesystem implementation and underlying hardware devices tend to be fast. Realistically, this means we may get away with using blocking filesystem calls. This can be especially true if we're only dealing with files in smaller chunks, for instance for our socket communication we usually read a few kilobytes at once only and most filesystems would outperform any network connection.

PHP's tmpfile() or tempnam() can be used to create temporary files with unique names, so no need to perform costly stat calls with file_exists() and friends. Filesystem calls may indeed be blocking, but from a practical perspective they might be fast enough to be considered non-blocking for many use cases.

As libuv is now able to use io_uring (https://github.com/libuv/libuv/commit/d2c31f429b87b476a7f1344d145dad4752a406d4) this may become a lot more viable.