reflex-dev / reflex

🕸️ Web apps in pure Python 🐍
https://reflex.dev
Apache License 2.0

File upload appears to hold everything in memory until the end #3517

Open bertbarabas opened 4 months ago

bertbarabas commented 4 months ago

Describe the bug

  1. For large files, the upload consumes significant memory, and usage grows even more when the file iobuffer is written out.
  2. Upload is very slow and consumes significant CPU when everything is on the same host (both reflex and the browser). Uploading files shouldn't really consume much CPU at all...

To Reproduce Steps to reproduce the behavior:

Expected behavior

Very little memory should be used because the data should be written to a file incrementally. To avoid rewriting the file after upload, I'd expect to be able to specify the target, or, if you choose to write to a temporary file, I'd expect to be able to rename the temporary file to its final home.
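
The expected behavior described above can be sketched roughly as follows. This is a minimal illustration, not how reflex implements uploads: `FakeUpload`, `save_upload`, and `CHUNK_SIZE` are hypothetical names, and the only assumption about the upload object is that it exposes an async `read(size)` method. The data is written chunk by chunk to a temporary file next to the target and then renamed into place, so it is never copied a second time.

```python
import asyncio
import os
import tempfile
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


class FakeUpload:
    """Hypothetical stand-in for an uploaded file exposing an async read(size)."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    async def read(self, size: int = -1) -> bytes:
        end = len(self._data) if size < 0 else self._pos + size
        chunk = self._data[self._pos:end]
        self._pos += len(chunk)
        return chunk


async def save_upload(upload, target: Path) -> int:
    """Stream `upload` to a temp file beside `target`, then rename it into place."""
    written = 0
    # A temp file in the target's own directory keeps the rename on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            # Only one chunk is read per iteration, never the whole upload.
            while chunk := await upload.read(CHUNK_SIZE):
                tmp.write(chunk)
                written += len(chunk)
        os.replace(tmp_name, target)  # rename; the data is not rewritten
    except BaseException:
        # Clean up the partial file if anything went wrong before the rename.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return written
```

Calling `asyncio.run(save_upload(FakeUpload(b"..."), Path("final.bin")))` would leave only `final.bin` behind. In a real handler the blocking `tmp.write` call could also be moved off the event loop, which this sketch does not attempt.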

Specifics (please complete the following information):

bertbarabas commented 4 months ago

Another odd behavior for upload is on_upload_progress, which claims the upload is 100% done, but then another 10% of the time is spent before rx.upload_files returns and passes the result on to the handle_upload function.

jaypatidar14 commented 3 months ago

Can I work on this issue?

picklelo commented 3 months ago

@jaypatidar14 just assigned you! Let us know if you need any help

garv901 commented 2 months ago

Hello! can I work on this issue?

picklelo commented 2 months ago

@garv901 assigned you!

bertbarabas commented 2 months ago

I should point out that the very large memory consumption only happens at the end of the upload just as the file write is starting.

Memory use seems fairly moderate during most of the upload, which I attribute to the fact that uvicorn is doing the actual upload and writing to a temp file behind the scenes.

To try to get around this issue, I started writing out the upload in chunks (see below), but it didn't help. If I upload a 4GB file, my memory jumps from 1GB to 5GB at the end of the upload.

# Copy the upload to disk in 1 MB chunks instead of reading it all at once.
with self.wip_file.open("wb") as file_object:
    while upload_data := await file.read(1_000_000):
        file_object.write(upload_data)
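
One way to check whether a copy loop like this is itself responsible for the memory growth is to run the same loop outside reflex and measure it. The sketch below is only a diagnostic under stated assumptions: `DiskBackedUpload`, `copy_in_chunks`, and `peak_bytes_during_copy` are hypothetical names, the stand-in upload simply reads from a file on disk, and `tracemalloc` only sees allocations made through Python's allocators, not the total memory of the process.

```python
import asyncio
import tracemalloc

CHUNK = 1_000_000  # same chunk size as in the snippet above


class DiskBackedUpload:
    """Hypothetical stand-in for the upload object: async read(size) over a file on disk."""

    def __init__(self, path: str) -> None:
        self._fh = open(path, "rb")

    async def read(self, size: int = -1) -> bytes:
        return self._fh.read(size)

    def close(self) -> None:
        self._fh.close()


async def copy_in_chunks(upload, dest_path: str) -> None:
    """Same loop as above, with the destination passed in as a plain path."""
    with open(dest_path, "wb") as file_object:
        while upload_data := await upload.read(CHUNK):
            file_object.write(upload_data)


def peak_bytes_during_copy(src_path: str, dest_path: str) -> int:
    """Return the peak Python-level allocation (in bytes) traced while copying."""
    upload = DiskBackedUpload(src_path)
    tracemalloc.start()
    try:
        asyncio.run(copy_in_chunks(upload, dest_path))
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
        upload.close()
    return peak
```

If the peak reported here stays within a few chunk sizes regardless of the file size, the loop is probably not what holds the data, and the growth would have to come from somewhere before the handler receives the file.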

bertbarabas commented 2 months ago

I have one more observation that seems to point to gunicorn as the root of the issue. I decided to see if uploading was any faster when running reflex in production mode, and I now see that instead of python consuming 100% CPU, it's the gunicorn process.
