Acconut opened this issue 9 months ago
Hi @Acconut, this is an interesting find. I was studying the tusd implementation and also identified this as an issue. Do you have any updates on this experiment?
There has been no further investigation since I opened this issue. I think it's still interesting, but I am more concerned about the memory usage here. In the comment above I wrote:
> However, this would mean that we have to keep some data in memory, so we can fall back to storing the temporary data in a `.part` object.
If the part size is 50 MB, then 100 concurrent uploads would already consume 5 GB of memory. When we want to scale tusd to support many parallel uploads, this might lead to issues.
Right now, an incoming PATCH request is split into multiple parts. Each part is saved on disk (or optionally in memory) and then queued to be uploaded to S3. These uploads can be performed in parallel. However, it seems as if this process can be very CPU-intensive (see #924 and many other reports about performance concerns of the s3store).
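For illustration, here is a rough sketch of that buffer-then-upload flow. It is not tusd's actual code; `splitIntoParts` and `uploadPartFromFile` are hypothetical names, and the latter stands in for the S3 call shown in the next sketch.

```go
package sketch

import (
	"io"
	"os"
	"sync"
)

// uploadPartFromFile is a hypothetical stand-in for the real S3 call; see the
// next sketch for what it could look like.
func uploadPartFromFile(path string, partNumber, size int64) error { return nil }

// splitIntoParts buffers partSize-sized chunks of the PATCH request body to
// temporary files and uploads them concurrently.
func splitIntoParts(body io.Reader, partSize int64) error {
	var wg sync.WaitGroup
	partNumber := int64(1)
	for {
		tmp, err := os.CreateTemp("", "tusd-part-")
		if err != nil {
			return err
		}
		// The buffering step: the part must be fully on disk (or in
		// memory) before its size, and thus Content-Length, is known.
		n, err := io.Copy(tmp, io.LimitReader(body, partSize))
		tmp.Close()
		if err != nil {
			os.Remove(tmp.Name())
			return err
		}
		if n == 0 {
			os.Remove(tmp.Name())
			break
		}
		wg.Add(1)
		go func(path string, num, size int64) {
			defer wg.Done()
			defer os.Remove(path)
			// Error handling is omitted in this sketch.
			_ = uploadPartFromFile(path, num, size)
		}(tmp.Name(), partNumber, n)
		partNumber++
		if n < partSize {
			break // body is exhausted
		}
	}
	wg.Wait()
	return nil
}
```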
I wonder whether we can skip reading the parts into a temporary buffer (on disk or in memory) altogether. The reason for first reading into a buffer is that we want to know the size of the part before uploading it to S3. The UploadPart and PutObject calls in S3 require the Content-Length header to be present, so we must know the size before starting an object/part upload (this is my understanding from their docs and my past experiments; maybe it is no longer true). Thus, we have to supply some part size when starting the upload to S3.
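To make the size requirement concrete, here is a hedged sketch of what the part upload could look like with the v1 AWS SDK for Go: `UploadPartInput` takes an `io.ReadSeeker` body plus an explicit `ContentLength`, which is why the part has to be buffered before the call is made. All parameter names are placeholders.

```go
package sketch

import (
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
)

// uploadPartFromFile uploads one fully buffered part. The size must already
// be known so that ContentLength can be set on the request.
func uploadPartFromFile(svc *s3.S3, path, bucket, key, uploadID string, partNumber, size int64) error {
	file, err := os.Open(path)
	if err != nil {
		return err
	}
	defer file.Close()

	_, err = svc.UploadPart(&s3.UploadPartInput{
		Bucket:        aws.String(bucket),
		Key:           aws.String(key),
		UploadId:      aws.String(uploadID),
		PartNumber:    aws.Int64(partNumber),
		Body:          file,
		ContentLength: aws.Int64(size),
	})
	return err
}
```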
An alternative approach is to estimate the part size when we start receiving data from the PATCH request. If we know the total tus upload size and the request's size, we can estimate the size of the next part. If not, we just assume that a full part will be filled. Then, we open a new UploadPart request to S3 with our estimated Content-Length and start piping the PATCH request's content directly into the request to S3. If the amount of data read from the incoming request is smaller than the Content-Length that we provided to S3, the upload will fail and we have to fall back to uploading the data into a `.part` object using PutObject, so that it can be used for resuming the upload in the next PATCH request. However, this would mean that we have to keep some data in memory, so we can fall back to storing the temporary data in a `.part` object. Maybe in the end, this is not much different than just buffering the parts in memory.
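To illustrate the idea, here is a speculative sketch, again assuming the v1 AWS SDK for Go. `streamPart` and `estimatedSize` are hypothetical names; `aws.ReadSeekCloser` wraps a plain `io.Reader` so it can be passed as the body when the length is supplied explicitly. The in-memory copy kept by the `io.TeeReader` is exactly the memory cost discussed above.

```go
package sketch

import (
	"bytes"
	"io"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
)

// streamPart pipes the PATCH body directly into UploadPart using an estimated
// size. A copy of everything sent is kept in memory so that, if fewer bytes
// arrive than estimated and the part upload fails, the data can still be
// persisted as a .part object for the next PATCH request.
func streamPart(svc *s3.S3, body io.Reader, bucket, key, uploadID string, partNumber, estimatedSize int64) error {
	var backup bytes.Buffer
	tee := io.TeeReader(io.LimitReader(body, estimatedSize), &backup)

	_, err := svc.UploadPart(&s3.UploadPartInput{
		Bucket:        aws.String(bucket),
		Key:           aws.String(key),
		UploadId:      aws.String(uploadID),
		PartNumber:    aws.Int64(partNumber),
		Body:          aws.ReadSeekCloser(tee),
		ContentLength: aws.Int64(estimatedSize),
	})
	if err == nil {
		return nil
	}

	// The request body ended before reaching the promised Content-Length
	// (or the upload failed for another reason): fall back to storing the
	// buffered bytes as a .part object.
	_, err = svc.PutObject(&s3.PutObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key + ".part"),
		Body:   bytes.NewReader(backup.Bytes()),
	})
	return err
}
```

Note that the tee'd copy grows up to the full estimated part size, so this sketch essentially trades the disk buffer for a memory buffer, which is the concern raised above.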