reactive-python / reactpy

It's React, but in Python
https://reactpy.dev
MIT License
7.89k stars 318 forks source link

Serialize File Content #574

Open rmorshea opened 2 years ago

rmorshea commented 2 years ago

Right now we don't serialize file content. Just metadata. Along with this we'll want to allow users to configure the max message size to help users protect against large file upload attacks.

See this comment for implementation details.

Archmonger commented 2 years ago

Just had a thought.

We could possibly leave file serialization to HTTP, and handle things on the backend.

For example, adding a HTTP endpoint


# Uploads are given a unique UUID path to route to
path("_idom/upload/{uuid}")

Not really sure how we'd be able to tell the WS side of things when an upload is a complete though... Can't rely on boolean flags due to cross-process compatibility.

Archmonger commented 2 years ago

If we do go down this road of serializing via websockets, it's going to be extremely important to stream the file to disk rather than buffer it in memory. Otherwise, this would be a DDOS attack vector.

Ideally, "file stuff" should be handled by individual frameworks... but that design of that might not be technologically feasible.

Archmonger commented 1 year ago

PyZMQ might be a decent option to communicate between separate ASGI processes.

rmorshea commented 1 year ago

The initial implementation should probably use form elements to upload files over HTTP. We can overwrite the default onSubmit behavior for forms to submit without a redirect. Unfortunately though, I don't think we can supply a built-in solution that will allow for inter-process communication - that will have to be supplied on a case-by-case basis depending on the user's deployment.

With that said, I have some thoughts on how this could be accomplished via the websocket in a more natural way, but it would require some pretty deep changes to how we send messages between the client and the server. Right now, there is no real messaging protocol - the server always sends layout updates and the client always sends layout events. We'd need to change this in order to allow the client and server to send different types of messages to each other. For example, all messages between the client and server would probably take the form:

{
  "type": str,  # some unique name
  "version": str,  # allow client and server to sort out compatibility
  ...  # any other data
}
Archmonger commented 1 year ago

We are likely going to develop two file upload handlers.

For small files, we stream the file directly to memory. For large files, we stream the file to disk (buffered in memory).

  1. Client begins sending file contents to the server
  2. Server reassembles the file contents into a buffer
  3. In the background, the server empties that buffer into a temp file onto the disk, in order to not use up all the system RAM

In order to support this without having this become an attack vector, we should implement a few safeguards:

  1. MAX_BUFFER_SIZE: The maximum amount of bytes to buffer within memory before throttling responses
  2. MAX_FILE_SIZE: The maximum size of a file that can be saved to the disk
  3. UPLOAD_DIR: Directory where files get saved
  4. If the transfer ends pre-maturely, the buffer and disk file are deleted
  5. Look at Django's upload handlers and see if there are other security steps that need to be taken
    • django.core.files.uploadhandler.MemoryFileUploadHandler
    • django.core.files.uploadhandler.TemporaryFileUploadHandler
Archmonger commented 1 year ago

We might want to transfer this issue to the reactpy-router repo