guzba / mummy

An HTTP and WebSocket server for Nim that returns to the ancient ways of threads.
MIT License

[Question/FR] separate code path/library for large file support? #69

Closed by ITwrx 1 year ago

ITwrx commented 1 year ago

I know you tried to address this in the readme, but is there no way for mummy to support large file uploads in a production-ready way? i.e. streaming to disk instead of holding the file in RAM, and chunking for extra large files. Bonus points for resumability.

Nim async has problems with this ([1], [2]). It is supposedly possible with microasynchttpserver, but there is no example code, and it is async rather than threaded.

Perhaps a separate router proc, or a complementary library?

I'm still relatively green in Nim and have not implemented something like this before, so I'm not expecting to be much help on code, but I'm open to pitching in on any desired funding.

thanks

guzba commented 1 year ago

It is not so much that it is impossible; I just have not focused Mummy on that use-case. It's a matter of time and effort and what I need for my own work.

It's not that big uploads absolutely will not work either. They work great; you just need to have the RAM to hold the files as they are received. This is easy for tens or hundreds of MB, but gets harder for multi-GB files.
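As a language-neutral illustration of the trade-off described here, this is a minimal Python sketch (standard library only) of streaming a body to disk in fixed-size chunks, so that peak memory is bounded by the chunk size rather than by the file size. It is not Mummy's API; `stream_to_file`, `CHUNK_SIZE`, and the in-memory stand-in for an upload body are all hypothetical names chosen for the example.

```python
import io
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size: 64 KiB held in memory at a time


def stream_to_file(source, dest_path, chunk_size=CHUNK_SIZE):
    """Copy a readable binary stream to disk one chunk at a time.

    Returns the total number of bytes written.
    """
    total = 0
    with open(dest_path, "wb") as out:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # empty bytes means end of stream
                break
            out.write(chunk)
            total += len(chunk)
    return total


# Usage: a 200,000-byte in-memory stand-in for an incoming upload body.
payload = io.BytesIO(b"x" * 200_000)
dest = os.path.join(tempfile.mkdtemp(), "upload.bin")
written = stream_to_file(payload, dest)
```

A server that buffers the whole body instead has to hold all of it in RAM until the request completes, which is the behavior discussed above.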

I have personally had such amazing success with object stores like Amazon's S3, Backblaze B2, Cloudflare R2, etc. that I would strongly suggest everyone just set up a bucket and use Depot to create presigned upload URLs, which can be used even directly from clients. Simple example: https://github.com/guzba/depot/blob/master/examples/file_upload.nim

These can be as private or public as you want, and as large or small as you want, and the API server that produces the signed URLs does not have to concern itself with the file upload at all.

The large-file use-case would be "I want very large file uploads, but also will not use object stores". There are reasons for this of course, but it's not something I see myself doing, so I have not worked on it; I have many other exciting things to work on and improve.

ITwrx commented 1 year ago

Fair enough. Thanks for the detailed response. I completely forgot that you and/or treeform were advocating for the usage of these cloud storage companies in the Nim forum, or I wouldn't have bothered you with this, as I could have guessed your position on the matter.

guzba commented 1 year ago

I could totally be wrong, but this reply seems to imply something wrong with using cloud storage. If that is the case, could you help me understand the perspective? I'm not going to attack back or whatever, just curious.

I assume you'll be running VMs if you are talking about production, so I don't know how a VM disk would be more private or whatever than an object store from the same provider. Perhaps the concern is for those hosting on their own hardware or in intranet-only settings?

For context, a public service I run has over 100TB of files stored and it's not even a huge service but does deal with files. This makes me skeptical of anything related to putting it on VM disk but I would have a better time thinking through an alternative if I knew the motivations and requirements.

ITwrx commented 1 year ago

> this reply seems to imply something wrong with using cloud storage

Not necessarily wrong, just usually, and sometimes emphatically, wrong for me. :)

> Perhaps the concern is for those hosting on their own hardware

Yes, I built my own colo server, pay for colo hosting, and would prefer to host the data myself most of the time, for various reasons, whether perceived or real, debatable, or objective. :)

thanks

guzba commented 1 year ago

Thanks for sharing. Totally get it. I really don't stand against this. Deciding not to use any cloud providers is a completely sensible conclusion.

I'm in the camp that believes once you're using even a tiny bit of a cloud provider (VMs), object stores are like VMs in that you can still avoid vendor lock-in, so using one is not so bad. So, while I do use the cloud, I would still be all for avoiding lock-in to a single cloud.

Not at all the same thing but we all draw the line somewhere haha.