tus / tusd

Reference server implementation in Go of tus: the open protocol for resumable file uploads
https://tus.github.io/tusd
MIT License
2.95k stars 467 forks source link

High memory usage when uploading to Azure Storage #1031

Closed jspath-ankored closed 4 months ago

jspath-ankored commented 8 months ago

Describe the bug I see very high memory usage when uploading large files to Azure Storage. For example, a 221MB file upload results in using over 221MB of memory. I see this behavior when uploading to actual Azure Storage as well as uploading to Azurite locally.

It seems like something in the tusd Azure Storage code may be retaining all of the file info in memory, instead of streaming it to Azure Storage or an Azure Storage emulator, like Azurite.

To Reproduce Steps to reproduce the behavior:

  1. Setup tusd to upload to Azure Storage
  2. Ensure you have a way to monitor memory on the tusd server
  3. Upload a large file >= 221MB
  4. Look at the memory usage after

Expected behavior I would expect the memory usage to stay relatively low.

Setup details Please provide following details, if applicable to your situation:

jspath-ankored commented 8 months ago

here's a heap dump of me running it locally:

heap.txt

Acconut commented 7 months ago

From https://community.transloadit.com/t/tusd-failing-on-large-file-upload/16883/11

Acconut commented 7 months ago

From the heap dump, it is apparent that there is a 128MB and a 64MB allocated at

#   0x1007dd07b bytes.growSlice+0x8b                                    /usr/local/go/src/bytes/buffer.go:249
#   0x1007dca9b bytes.(*Buffer).grow+0x12b                              /usr/local/go/src/bytes/buffer.go:151
#   0x1007dcea3 bytes.(*Buffer).ReadFrom+0x43                               /usr/local/go/src/bytes/buffer.go:209
#   0x100912e57 bufio.(*Reader).WriteTo+0xb7                                /usr/local/go/src/bufio/bufio.go:533
#   0x100aa2dbf github.com/tus/tusd/v2/pkg/azurestore.(*AzUpload).WriteChunk+0x13f          /Users/jspath/git/tusd/pkg/azurestore/azurestore.go:145

It does indeed seems like that azurestore is copying the entire upload data into a memory buffer before uploading it to Azure: https://github.com/tus/tusd/blob/79709611ce7c32e784f2c609e993b5484a09f5a0/pkg/azurestore/azurestore.go#L143-L148

A new buffer is created and the entire request body is written into the buffer before the final size is examined and the data is uploaded. No data is streamed. I am not sure if that is possible with Azure. @omBratteng, do you know if there is a better way to avoid these allocations? For the s3store we save chunks of data on disk to avoid in-memory buffers, for example.

@jspath-ankored Does the memory get properly released once the upload is completed? Or is the memory kept alive after the upload, meaning we would have a memory leak.

omBratteng commented 7 months ago

It's been a while since I worked with this and I have since gotten a new job.

But if I recall correctly, it only keeps the chunks in memory before it uploads it to blob storage. We have uploaded files of ~10TiB without any issues (though we didn't really have any memory constraints in our cluster).

Though, there shouldn't be any issues with save chunks of data on disk.

jspath-ankored commented 7 months ago

It looks like it's not a memory leak, and the memory does eventually get released, although it takes around five minutes:

Screenshot 2023-11-21 at 10 28 54 AM

It would still be preferable if the memory usage didn't grow like this ... I uploaded a 221MB file, and after the 2nd upload, we were using 922.8MB of memory. This makes me concerned about using this on production.

If it matters, I am measuring the memory using the Live Charts extension for Docker Desktop. So maybe I'm just measuring the memory that the Docker container is using.

I know you can stream to Azure Blob Storage in .NET ... not sure about Go.

Acconut commented 7 months ago

the memory does eventually get released, although it takes around five minutes

I am no expert at Go's garbage collector, but Go might not release memory back to the OS immediately. The memory might be flagged internally as unused, but releasing it to the OS is another step that might only be taken when the memory is actually needed my another process. But please don't cite me on this :)

If it matters, I am measuring the memory using the Live Charts extension for Docker Desktop. So maybe I'm just measuring the memory that the Docker container is using.

If you want more insights into the memory usage as seen from the Go internals, you can use the metrics exposed by tusd (e.g. http://tusd.tusdemo.net/metrics). They contain many different memory-related metrics and might help you understand how much memory Go thinks is in use right now.

However, I agree that it would be ideal to avoid this in-memory buffering altogether. Would you be interested in trying out to stream the data directly to Azure (or stream it in chunks)?

jspath-ankored commented 7 months ago

Would you be interested in trying out to stream the data directly to Azure (or stream it in chunks)?

Yes. We are not using tusd in production yet, and won't be a for a while yet, so I am open to trying out streaming.

Are you asking me to implement, or just try out some changes?

Acconut commented 7 months ago

Are you asking me to implement

Yes, improving the memory usage of the Azure store would require some implementation changes. I personally do not use Azure on my own and don't have access to it, making me not the best person to tackle this.

quality-leftovers commented 5 months ago

We noticed that the body is ready into memory when reviewing the code before settling and tusd and decided to use 5 MiB Patch Chunks in the clients for now. I'd like to have tried taking a stab at improving it but didn't have time for it :(

Acconut commented 5 months ago

Does this also occur if you use Azurite? If so, I can try to reproduce it locally and debug it.

jspath-ankored commented 5 months ago

Does this also occur if you use Azurite? If so, I can try to reproduce it locally and debug it.

Yes, the same issue happens using Azurite locally: https://community.transloadit.com/t/tusd-failing-on-large-file-upload/16883/4

Acconut commented 5 months ago

I implemented the buffer in a temporary file on disk instead of in-memory. Please have a look at https://github.com/tus/tusd/pull/1070, test it out, and let me know if it behaves as expected, especially in terms of memory usage. In my local tests, no memory growth was visible over time.