tus / tus-resumable-upload-protocol

Open Protocol for Resumable File Uploads
https://tus.io
MIT License

Clarifications for Concat Mode #198

Closed helloimalastair closed 2 months ago

helloimalastair commented 2 months ago

I'm thinking about building a TUS-Adapter for an S3-compatible backend. I have two questions that I don't think are answered on the protocol docs, and I would very much appreciate your feedback.

  1. Can Concat Mode have a Tus-Max-Size that is larger than the default?

The storage backend supports Multipart Uploads, with a maximum of 10k equally-sized parts (other than the last one, which can be smaller). This means, however, that each part may need to be stored in memory in case we don't reach the full part size required by the backend (the platform I am building on limits usable memory to ~100 MB). Thus, any upload processed this way has a max size of 1 TB.

But if we use concatenation, we can push the part size to the maximum allowed by the backend (~5 GiB), since the data doesn't need to be cached in memory, and thus maybe reach the platform-mandated max size of ~5 TiB per object (assuming that you aren't concatenating objects smaller than ~100 MB anyway). Would there be a way to tell clients that they can upload a ~5 TiB object only if they use concatenation, or do I need to limit the Max Size for the entire platform to the lower 1 TB limit?
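The two limits follow from simple arithmetic on the figures quoted above. A minimal sketch, assuming the approximate numbers from this question (10k parts, ~100 MB buffered parts, ~5 GiB maximum part size, ~5 TiB object cap); the constant names are made up for illustration:

```python
# Back-of-the-envelope limits from the approximate figures in the question.
MB = 10**6
TB = 10**12
GiB = 2**30
TiB = 2**40

MAX_PARTS = 10_000          # multipart part-count limit of the backend
BUFFERED_PART = 100 * MB    # part size when a part must fit in memory
MAX_PART = 5 * GiB          # largest part the backend accepts
OBJECT_CAP = 5 * TiB        # platform-mandated maximum object size

# Regular uploads: every part is bounded by the memory budget.
buffered_limit = MAX_PARTS * BUFFERED_PART

# Concatenated uploads: parts can be as large as the backend allows,
# so the object cap becomes the binding constraint.
concat_limit = min(MAX_PARTS * MAX_PART, OBJECT_CAP)

print(buffered_limit / TB, "TB for regular uploads")
print(concat_limit / TiB, "TiB for concatenated uploads")
```

With maximum-size parts, the part count alone would allow far more than 5 TiB, so the object cap is what limits the concatenated case.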

  2. Is there a way to signify that a Final Concat may take a while, and thus may need to be rechecked later on?

Depending on the size, it may take a little while for the backend to zip the given files into a single unified file. Given that TUS is made for handling unstable network connections, which may not allow you to stay connected to a Request for a longer period of time, is there a way to tell the client to send the final concat, then retry a few times until it succeeds (or will it do that by default anyway)?

Acconut commented 2 months ago

I'm thinking about building a TUS-Adapter for an S3-compatible backend.

Using S3 as a backend can be a bit of a pain, as its limits make it hard to implement the flexibility that tus has been designed for. It's a nice challenge but is also annoying at the same time. I would recommend that you check out the source code for the s3store in tusd, which already implements the core protocol and concatenation extension based on S3. If possible, you might want to use tusd directly, then you can save yourself the work :)

  1. Can Concat Mode have a Tus-Max-Size that is larger than the default?

The intention behind Tus-Max-Size is that it also applies to concatenated uploads, yes. In the end, it's a means to let the client know what the maximum file size is that the server will handle, regardless of the method used for uploads (concatenated or not).

However, if you prefer to have separate limits for concatenated and non-concatenated uploads, you are free to do so as this is an implementation choice. But tus does not have a means to communicate different limits for concatenated uploads.
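Separate limits could be enforced purely on the server side. A minimal sketch, assuming the server advertises the lower limit in Tus-Max-Size and applies a higher, unadvertised one to final concatenated uploads; `check_creation`, both constants, and the way partial lengths are passed in are hypothetical and not part of tus or tusd:

```python
from typing import Optional, Sequence

MAX_SIZE = 10**12             # assumed limit for regular uploads (Tus-Max-Size)
MAX_CONCAT_SIZE = 5 * 2**40   # assumed, unadvertised limit for final uploads


def check_creation(upload_concat: Optional[str],
                   upload_length: Optional[int],
                   partial_lengths: Sequence[int] = ()) -> int:
    """Return the status a creation request would get: 201 or 413."""
    if upload_concat is not None and upload_concat.startswith("final"):
        # A final upload carries no Upload-Length of its own; its size is
        # the sum of the partial uploads it references.
        total = sum(partial_lengths)
        limit = MAX_CONCAT_SIZE
    else:
        # Regular and partial uploads. A deferred length (None) cannot be
        # checked here and would have to be checked once it is known.
        total = upload_length or 0
        limit = MAX_SIZE
    return 413 if total > limit else 201


print(check_creation(None, 2 * 10**12))                           # 413
print(check_creation("final;/files/a /files/b", None, [10**12] * 2))  # 201
```

Clients would still only see the lower value, so the higher limit would have to be documented out of band.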

This means, however, that each part may need to be stored in memory, in case we don't actually reach the full part size required by the backend

tusd avoids buffering upload data in memory and instead flushes it to a temporary object on S3 that is unrelated to the S3 multipart upload. This way it can save data smaller than the part size without utilizing memory.
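The idea can be sketched roughly as follows. This is not tusd's actual code: `FakeStore` is an in-memory stand-in for the object store, the `.part` key name and the tiny `PART_SIZE` are assumptions for illustration, and a real implementation would stream data rather than hold it in a bytes object.

```python
PART_SIZE = 5  # tiny for illustration; real part sizes are several MiB or more


class FakeStore:
    """In-memory stand-in for an object store (not a real S3 client)."""

    def __init__(self) -> None:
        self.parts = []     # parts committed to the multipart upload
        self.objects = {}   # ordinary objects, keyed by name


def write_chunk(store: FakeStore, upload_id: str, chunk: bytes) -> None:
    """Commit every full part and park the leftover in a temporary object."""
    temp_key = upload_id + ".part"  # hypothetical naming scheme
    # Prepend whatever was left over from the previous request.
    data = store.objects.pop(temp_key, b"") + chunk
    full = len(data) - len(data) % PART_SIZE
    for start in range(0, full, PART_SIZE):
        store.parts.append(data[start:start + PART_SIZE])
    if data[full:]:
        # Too small to be a part: store it as a regular object until
        # more data arrives.
        store.objects[temp_key] = data[full:]


store = FakeStore()
write_chunk(store, "abc", b"0123456")  # one full part, b"56" is parked
write_chunk(store, "abc", b"789")      # b"56" + b"789" completes a second part
print(store.parts, store.objects)
```

When the upload finishes, any remaining temporary object would be uploaded as the final (smaller) part.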

2. Is there a way to signify that a Final Concat may take a while, and thus may need to be rechecked later on?

tus does not (yet) have a way to signal post-processing to the client. We have thought about utilizing Content-Location for this, where the server could quickly return a response with Content-Location pointing to another resource. While the server is concatenating the upload, it will regularly update the other resource on its progress and the client can poll the resource to see when the concatenation is finished. At least in theory. In practice, this has not been fully tested yet.
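On the client side, such polling might look like the sketch below. Since tus does not define this mechanism, everything here is an assumption: `fetch_status` is a hypothetical callable that GETs the URL from Content-Location, and the `{"done": ...}` response shape is invented for illustration.

```python
import time
from typing import Callable


def wait_for_concat(fetch_status: Callable[[], dict],
                    interval: float = 1.0,
                    max_polls: int = 10) -> bool:
    """Poll a status resource until it reports that concatenation is done.

    Returns True once the (hypothetical) "done" flag is set, or False if
    the server has not finished after max_polls attempts.
    """
    for _ in range(max_polls):
        if fetch_status().get("done"):
            return True
        time.sleep(interval)
    return False


# Usage with canned responses standing in for real HTTP requests.
responses = iter([{"done": False}, {"done": False}, {"done": True}])
print(wait_for_concat(lambda: next(responses), interval=0))
```

Because each poll is a short, independent request, a dropped connection only costs one attempt instead of the whole concatenation.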

Depending on your use case, you might not need the client to know when the concatenation is finished. When a client wants to concatenate large uploads, the server could quickly return a successful response after doing basic request validation and then perform the concatenation in the background. Once the concat is done, the further file processing happens behind the scenes without client interaction.

helloimalastair commented 2 months ago

Thank you for your response!

If possible, you might want to use tusd directly, then you can save yourself the work :)

Unfortunately, the platform I am deploying to wouldn't support running tusd, but thanks for the advice on checking the source.

tusd avoids buffering upload data in memory and instead flushes it to a temporary object on S3 that is unrelated to the S3 multipart upload.

Interesting, I will have to take a look.

We have thought about utilizing Content-Location for this, where the server could quickly return a response with Content-Location pointing to another resource.

I will follow this one, thanks!