tus / tus-resumable-upload-protocol

Open Protocol for Resumable File Uploads
https://tus.io
MIT License

Ambiguous Tus-Concatenation specifications #191

Closed: Newex closed this issue 1 year ago

Newex commented 1 year ago

I'm trying to implement a library that follows the spec.

But I am confused by the following two seemingly contradictory statements in the tus Concatenation extension:

The length of the final upload MUST be the sum of the length of all partial uploads.

The Server MAY delete partial uploads after concatenation. They MAY however be used multiple times to form a final resource.
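To make the first statement concrete, here is a minimal Python sketch of what a final upload contains. It assumes the partial upload bodies are available in memory and, as the extension's name suggests, are joined in the order they are listed. `concatenate`, `part_a`, and `part_b` are hypothetical names for illustration, not part of any tus library. Because the final content is the partial bodies placed end to end, its length is necessarily the sum of their lengths.

```python
def concatenate(partials: list[bytes]) -> bytes:
    """Join partial upload bodies in the order given (hypothetical helper)."""
    return b"".join(partials)

# Two 50-byte partial uploads, as in the example that follows.
part_a = b"a" * 50
part_b = b"b" * 50

final = concatenate([part_a, part_b])
print(len(final))  # 100, i.e. len(part_a) + len(part_b)
# Order changes the content, but not the length.
print(concatenate([part_b, part_a]) == final)  # False
```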

As I understand the first statement, I can, for example, upload two partial uploads of 50 bytes each. Then the final upload must be 100 bytes long.

As I understand the second statement, the server can reuse uploads multiple times, for example upload A is 50 bytes and upload B is 25 bytes then the server can create a final resource using: 1*A + 2*B = 100 bytes. But this contradicts the first statement, because A + B is only 75 bytes.
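The two statements stop conflicting if "the sum of the length of all partial uploads" is read as a sum over every reference listed in the final upload's `Upload-Concat: final;...` header, where the same partial upload URL may appear more than once, rather than as a sum over the set of distinct uploads. Below is a minimal Python sketch under that reading. It assumes a hypothetical in-memory map from partial upload URL to length; the URLs `/files/a` and `/files/b` and the helper names are illustrative only.

```python
# Hypothetical server-side record of finished partial uploads: URL -> length in bytes.
partial_lengths = {"/files/a": 50, "/files/b": 25}

def parse_final_concat(value: str) -> list[str]:
    """Split an Upload-Concat value 'final;<url> <url> ...' into its URL list."""
    prefix = "final;"
    if not value.startswith(prefix):
        raise ValueError("not a final upload: " + value)
    return value[len(prefix):].split()

def final_length(value: str, lengths: dict[str, int]) -> int:
    # Every occurrence counts: a URL listed twice contributes its length twice.
    return sum(lengths[url] for url in parse_final_concat(value))

header = "final;/files/a /files/b /files/b"  # A + B + B, i.e. 1*A + 2*B
print(final_length(header, partial_lengths))  # 100
# Summing only the distinct uploads gives 75, which is the misreading.
print(sum(partial_lengths[u] for u in set(parse_final_concat(header))))  # 75
```

Counting occurrences rather than distinct uploads is what lets "sum of all partial uploads" and "used multiple times" hold at once.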

I'm sure I misunderstood something. Could someone clarify?

Newex commented 1 year ago

I got this answer from ChatGPT 3.5:

I understand the confusion you're facing, but these two statements are not necessarily contradictory when you consider the specific context and use cases of the Tus protocol.

  1. "The length of the final upload MUST be the sum of the length of all partial uploads."
    This statement is a general guideline that suggests the total length of the final resource should equal the sum of the lengths of all the partial uploads. In your example, if you have two partial uploads, A and B, with lengths 50 bytes and 25 bytes respectively, the total length of the final resource should indeed be 75 bytes based on this guideline.
  2. "The Server MAY delete partial uploads after concatenation. They MAY however be used multiple times to form a final resource." This statement provides flexibility to servers. It means that servers are allowed to delete the partial uploads after they have been concatenated into the final resource. However, it also acknowledges that servers might choose to keep these partial uploads for various reasons.

    In your example, the server could indeed reuse uploads A and B to create the final resource by concatenating them. Still, it's not required to keep them once the final resource has been assembled. The key point is that while they are retained, the total length of the final resource should still be based on the sum of the lengths of the partial uploads.

So, these statements do not necessarily contradict each other. The first statement sets the expectation for the length of the final resource, and the second statement allows servers some flexibility in managing the partial uploads but does not change the expectation set by the first statement regarding the final resource's length.


Q: So the second statement is about what happens after concatenation, meaning another upload may later reuse those partial uploads?


Yes, that's correct. The second statement is primarily about what the server can do with the partial uploads after they have been concatenated to form the final resource.

After concatenation, the server may choose to retain or delete the individual partial uploads. If it retains them, they may be reused in subsequent uploads to form new final resources. In other words, those partial uploads can potentially be used multiple times to create different final resources. This provides a level of flexibility in how servers handle the storage and reuse of partial uploads, but it doesn't change the fact that the length of the final resource should still be based on the sum of the lengths of all partial uploads that contributed to it.

So, you have the correct understanding: the second statement is about the server's post-concatenation actions and the possibility of reusing partial uploads in future uploads.


For some reason I thought the spec was talking about "the current" final upload. So it actually refers to a potential future, different final upload.
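The reading reached in this exchange, a retained partial upload being referenced again by a later, separate final upload, can be sketched in Python as follows. Assumptions: partial uploads live in a hypothetical in-memory dictionary, and this server simply keeps them after concatenation (the spec permits deleting them but does not require it). `create_final` and the `/files/...` URLs are illustrative names, not a real tus API.

```python
# Hypothetical store of finished partial uploads: URL -> body bytes.
partials = {
    "/files/a": b"A" * 50,
    "/files/b": b"B" * 25,
    "/files/c": b"C" * 10,
}

def create_final(urls: list[str], store: dict[str, bytes]) -> bytes:
    """Concatenate the listed partials in order; the partials stay in the store."""
    missing = [u for u in urls if u not in store]
    if missing:
        # A server that has already deleted a partial cannot reuse it.
        raise KeyError("partial upload(s) no longer available: " + ", ".join(missing))
    return b"".join(store[u] for u in urls)

first = create_final(["/files/a", "/files/b"], partials)   # 50 + 25 = 75 bytes
second = create_final(["/files/a", "/files/c"], partials)  # A reused: 50 + 10 = 60 bytes
print(len(first), len(second))  # 75 60
```

Each final upload's length is still the sum over its own list of partials; reuse across final uploads only depends on the server not having deleted the partial yet.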

Acconut commented 1 year ago

As I understand the second statement, the server can reuse uploads multiple times, for example upload A is 50 bytes and upload B is 25 bytes then the server can create a final resource using: 1*A + 2*B = 100 bytes.

That is correct.

The quoted text from the spec allows two ways of using a partial upload multiple times:

When we wrote this text, we had the second use case in mind, but the description is a bit ambiguous. We should add text that explains both use cases and makes this clearer. However, I am not sure what alternative to "sum" we could use here. Do you have a suggestion?

Newex commented 1 year ago

OK, now I see.

My initial instinct was probably right: I felt that there were two use cases.

My mistaken assumption was that summing up the parts would require unique elements, but, as in the second example, the sum can be rewritten as A + B + B.

All in all, I don't have a suggestion for an alternative to "sum", but I do have a better idea of how to interpret the spec.

Acconut commented 1 year ago

Thank you! I added some text that should hopefully prevent such confusion in the future.