Acconut closed this 4 months ago
I'm wondering if writing to disk always makes sense, since some Azure hosting services might only allow mounting files from an Azure storage account (not sure, I need to check; I just got this wrong for the Container Apps environment).
Looking at the s3store_part_producer, it seems to solve this already since it supports a memory option for the parts producer. There also seems to be nothing S3-specific to the producer. Wouldn't it be possible to re-use the parts generator here?
> I'm wondering if writing to disk always makes sense, since some Azure hosting services might only allow mounting files from an Azure storage account (not sure, I need to check; I just got this wrong for the Container Apps environment).
It would seem odd to me that no temporary directory is provided for storing intermediate results. I would expect some temporary storage to always be available.
> Looking at the s3store_part_producer, it seems to solve this already since it supports a memory option for the parts producer. There also seems to be nothing S3-specific to the producer. Wouldn't it be possible to re-use the parts generator here?
I am not sure about this. For S3, we split a PATCH request into multiple parts of a fixed size (by default 50MB), which are uploaded and then discarded as they come in. There is a limit on how many parts can be buffered concurrently (by default 20), so there is an upper limit on memory consumption.
For the azurestore, there is no such chunking; instead, the entire PATCH request is read into temporary storage before being uploaded. If we want to reuse the part generator from S3, the azurestore must be adjusted so that it also uploads in smaller parts, or otherwise we will have the same unbounded memory usage that this PR tries to fix.
Does that make sense?
True, it is unusual to have no local storage, but there are Azure VM sizes (Dasv5 and Easv5) without any temp disk. However, it is possible to attach "remote" storage disks, so it should be OK.
Regarding the parts generator, also true. I was just wondering why the S3 store has such sophisticated logic while the azurestore is a bit dumb :P.
> Regarding the parts generator, also true. I was just wondering why the S3 store has such sophisticated logic while the azurestore is a bit dumb :P.
Well, the S3 API has a few restrictions (a minimum size for upload parts and a maximum total number of parts) that make this kind of logic necessary. I would prefer it if we didn't have to do much of this :)
Would you be interested in testing this PR? Maybe you can run it locally and see if it works well for you? I don't use Azure normally, so a second pair of eyes would help catch issues.
I did some testing, running the container both locally and as an Azure Container App, and was unable to reproduce any high-memory or out-of-memory problems. The existing application / test suite also worked "as normal".
Ok, thank you very much! Then let's get this released!
Closes https://github.com/tus/tusd/issues/1031
Instructions for testing this out locally:
```shell
# Install the Azurite emulator and the Azure CLI (or similar, depending on OS)
npm install -g azurite
brew install azure-cli

# Start Azurite
azurite --location ./azurite/ --debug ./azurite/debug.log

# Create a container using the well-known Azurite development credentials
az storage container create --name mycontainer --connection-string "DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://127.0.0.1:10000/devstoreaccount1;"

# Run tusd against the emulator
AZURE_STORAGE_ACCOUNT=devstoreaccount1 AZURE_STORAGE_KEY=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw== go run cmd/tusd/main.go -azure-endpoint http://127.0.0.1:10000/devstoreaccount1 -azure-storage mycontainer
```
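Once tusd is running against Azurite, a quick smoke test can be done with plain curl using the tus protocol. This is a hedged sketch: it assumes the default tusd listen address and base path (`http://localhost:1080/files/`); adjust the URL if your instance is configured differently. It requires the tusd instance from the previous step to be running, so it is not self-contained.

```shell
# Create an upload of 5 bytes (tus creation extension); the Location
# response header contains the upload URL.
curl -i -X POST http://localhost:1080/files/ \
  -H "Tus-Resumable: 1.0.0" \
  -H "Upload-Length: 5"

# Send the data to the upload URL returned above (replace <upload-url>).
curl -i -X PATCH <upload-url> \
  -H "Tus-Resumable: 1.0.0" \
  -H "Upload-Offset: 0" \
  -H "Content-Type: application/offset+octet-stream" \
  --data-binary "hello"
```

A successful PATCH should return `204 No Content` with `Upload-Offset: 5`, and the blob should appear in `mycontainer`.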