codeSamuraii opened 1 month ago
To help Streamlit prioritize this feature, react with a 👍 (thumbs up emoji) to the initial post.
Your vote helps us identify which enhancements matter most to our users.
Hey @codeSamuraii! Thanks for the awesome investigation, this is great! We thought about uploading files to disk a few times in the past but it was never on the top of our priority list. But I believe at some point we'll do it, especially if this gets more upvotes!
Checklist
Summary
Hello,
Currently, Streamlit stores the entirety of uploaded files in memory. This limits the maximum upload size to the available RAM of the machine.
I managed to implement chunked multipart uploads straight to disk, allowing for arbitrary upload sizes. However, my knowledge of Streamlit's inner workings is quite limited and I can't find an elegant solution worthy of a PR. Hence I wanted to share parts of my solution with you, in the hope that someone picks it up and implements it properly.
I started by subclassing `UploadFileRequestHandler` (`streamlit/web/server/upload_file_request_handler.py`), which handles the file upload:

- With `@tornado.web.stream_request_body`, we can define a `data_received(self, chunk)` method to receive chunked data.
- With `python-multipart`, we can create a parser in the `prepare()` method and feed it data in the previously mentioned `data_received(self, chunk)`.
- `python-multipart` then gives us access to a `File` instance through an `on_file()` method that we define on the class.

This is where it got very hacky and custom for me, modifying `UploadedFile`, `UploadedFileRec`, etc., and where I believe an expert view would be welcomed. Here is the full code:
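As a minimal, self-contained sketch of the chunk-to-disk idea only: the class below is hypothetical and deliberately omits Tornado and `python-multipart` — in the real approach, `prepare()` and `data_received()` would live on a `@tornado.web.stream_request_body`-decorated subclass of `UploadFileRequestHandler`, with `python-multipart` parsing the multipart framing out of each chunk.

```python
import os
import tempfile


class ChunkedUploadToDisk:
    """Hypothetical illustration: append incoming body chunks to a temp
    file on disk instead of buffering the whole upload in memory.

    Only the current chunk is ever held in RAM, so the upload size is
    bounded by disk space rather than available memory.
    """

    def prepare(self):
        # Open a named temp file that survives after the request ends;
        # the caller is responsible for deleting it later.
        self._tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".upload")

    def data_received(self, chunk: bytes):
        # Called once per body chunk as it arrives over the wire.
        self._tmp.write(chunk)

    def finish(self) -> str:
        # Flush and close, then hand back the on-disk path.
        self._tmp.close()
        return self._tmp.name


# Usage: feed chunks as they arrive, then read the result back from disk.
handler = ChunkedUploadToDisk()
handler.prepare()
for chunk in (b"part1-", b"part2-", b"part3"):
    handler.data_received(chunk)
path = handler.finish()
with open(path, "rb") as f:
    assert f.read() == b"part1-part2-part3"
os.unlink(path)
```

The key property is that `data_received` never accumulates chunks in memory; the multipart parsing (headers, boundaries, `on_file()` callbacks) layers on top of this same write-through pattern.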
Adding a `to_disk` keyword argument to `st.file_uploader` that uploads files to a new endpoint would be ideal; however, I have very limited front-end experience and Streamlit's widget rendering is obscure to me.

I hope this can help someone implement this feature properly.
Why?
No response
How?
No response
Additional Context
No response