tus / tusd

Reference server implementation in Go of tus: the open protocol for resumable file uploads
https://tus.github.io/tusd
MIT License
2.93k stars 465 forks source link

azurestore: Buffer upload data on disk instead of in-memory #1070

Closed Acconut closed 4 months ago

Acconut commented 5 months ago

Closes https://github.com/tus/tusd/issues/1031

Instructions for testing this out locally:

  1. npm install -g azurite
  2. brew install azure-cli or similar depending on OS
  3. Start Azurite: azurite --location ./azurite/ --debug ./azurite/debug.log
  4. Create container: az storage container create --name mycontainer --connection-string "DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://127.0.0.1:10000/devstoreaccount1;"
  5. Start tusd: AZURE_STORAGE_ACCOUNT=devstoreaccount1 AZURE_STORAGE_KEY=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw== go run cmd/tusd/main.go -azure-endpoint http://127.0.0.1:10000/devstoreaccount1 -azure-storage mycontainer
quality-leftovers commented 4 months ago

I'm wondering if writing to disk always makes sense since some azure hosting services might only allow mounting files from azure storage account (not sure, need to check - just got it for container apps env wrong).

Looking at the s3store_part_producer it seems to solve this already since it supports a _memory option for the parts producer. There also seems to be nothing s3 specific to the producer. Wouldn't it be possible to re-use the parts generator here?

Acconut commented 4 months ago

I'm wondering if writing to disk always makes sense since some azure hosting services might only allow mounting files from azure storage account (not sure, need to check - just got it for container apps env wrong).

It would seem odd to me that no temporary directory is provided for storing intermediate results. I would expect some temporary storage to always be available.

Looking at the s3store_part_producer it seems to solve this already since it supports a _memory option for the parts producer. There also seems to be nothing s3 specific to the producer. Wouldn't it be possible to re-use the parts generator here?

I am not sure about this. For S3, we split a PATCH request into multiple parts of a fixed size (by default 50MB), which are uploaded and then discarded as they come in. There is a limit on how many parts can be buffer concurrently (by default 20), so there is an upper limit on the memory consumption.

For the azurestore, there is not such chunking, but instead the entire PATCH request is read into a temporary storage before being uploaded. If we want to reuse the part generator from S3, the azurestore must be adjusted so that it also uploads in smaller parts or otherwise we will have the same unlimited memory, that this PR tries to fix.

Does that make sense?

quality-leftovers commented 4 months ago

True, it is unusual to have no local storage, but there are VMs Dasv5 and Easv5 without any temp disk. But it is possible to attach "remote" storage disks so it sould be ok.

Regarding the parts generator also true. I was just wondering why the S3 store has such sophisticated logic while the azurestore is a bit dumb :P.

Acconut commented 4 months ago

Regarding the parts generator also true. I was just wondering why the S3 store has such sophisticated logic while the azurestore is a bit dumb :P.

Well, the S3 API has a few restrictions (minimal size of upload parts and maximal number of total parts) that make this kind of logic necessary. I would prefer it if we didn't have to do much of this :)

Would you be interested in testing this PR? Maybe you can run it locally and see if it works well for you? I don't use Azure normally, so a second pair of eyes would help catch issues.

quality-leftovers commented 4 months ago

I did some testing running the container both locally and as Azure Container App and was unable to reproduce any high memory or out-of-memory problems. Existing upplication / test suite also worked "as normal".

Acconut commented 4 months ago

Ok, thank you very much! Then let's get this released!