tus / tus-resumable-upload-protocol

Open Protocol for Resumable File Uploads
https://tus.io
MIT License
1.48k stars 103 forks source link

Upload Tag extension #163

Closed felix-schwarz closed 3 years ago

felix-schwarz commented 3 years ago

This PR is a continuation of #157, with the following key differences:


The Upload Tag Extension to the tus protocol proposed here allows clients to resume an upload even if they did not receive the response to the creation or creation-with-upload request with which they initiated the upload.

The mechanism for this is relatively simple:

Mobile client apps benefit of this extension in various ways:

In my proposal, I tried to match the language and style of the existing specification as closely as possible.

Feedback welcome! 🙂

smatsson commented 3 years ago

I put together a POC/draft implementation of Client-Tag here: https://github.com/tusdotnet/tusdotnet/tree/POC/Client-Tag

One can test it live at https://tustemp.azurewebsites.net/files using Postman or similar. Disclaimer: There might be bugs since it's a POC.

Restrictions in place:

Here is a list of things I found doing this implementation and possible solutions (in no particular order):

  1. I found it a bit confusing that the extension is called client-tag but the header is called upload-tag. Should we rename the header? Or is it just something to get used to?

  2. Returning 409 conflic on create leaks data on what tags are already used. Might not be an issue as you cannot access the tag anyway?

  3. I took the liberty of also implementing this for concatenation when multiple files are merged. It might be a small use case but I think it makes sense for consistency. We should change the text below

    This allows Clients to resume an upload initiated through the Creation With Upload or Creation extensions for which they did not receive the response with the newly created resource's URL (for example due to a broken connection).

into something like:

This allows Clients to resume a created upload for which they did not receive the response with the newly created resource's URL (for example due to a broken connection).

  1. Should we define what should happen if you try to do a HEAD request to the upload url without the Upload-Tag? In my reference implementation I chose to return 404 Not Found.
Acconut commented 3 years ago

Thanks for moving this into a new PR, @felix-schwarz!

Acconut commented 3 years ago

@felix-schwarz Would it be OK if I add some commits to this PR to show what I mean with some of my comments?

felix-schwarz commented 3 years ago

Wow, thanks for your feedback @Acconut & @smatsson! 😍

I'm now working through your feedback from top to bottom and will commit a revised version taking your feedback into account once done.

@felix-schwarz Would it be OK if I add some commits to this PR to show what I mean with some of my comments?

@Acconut Sure, that'd be fine! Preferably after I pushed my changes, though, to avoid version conflicts.

felix-schwarz commented 3 years ago

I put together a POC/draft implementation of Client-Tag here: https://github.com/tusdotnet/tusdotnet/tree/POC/Client-Tag

Awesome, thanks a lot!

Here is a list of things I found doing this implementation and possible solutions (in no particular order):

  1. I found it a bit confusing that the extension is called client-tag but the header is called upload-tag. Should we rename the header? Or is it just something to get used to?

Agreed. I'm changing the name to Upload Tag.

  1. Returning 409 conflic on create leaks data on what tags are already used. Might not be an issue as you cannot access the tag anyway?

As long as Upload Tags are tracked on a per-user basis, this shouldn't be a concern:

  1. I took the liberty of also implementing this for concatenation when multiple files are merged. It might be a small use case but I think it makes sense for consistency. We should change the text below

This allows Clients to resume an upload initiated through the Creation With Upload or Creation extensions for which they did not receive the response with the newly created resource's URL (for example due to a broken connection).

into something like:

This allows Clients to resume a created upload for which they did not receive the response with the newly created resource's URL (for example due to a broken connection).

Good catch! I'll update that to:

This allows Clients to resume an upload they initiated, but for which they did not receive the response with the newly created resource's URL (for example due to a broken connection).

  1. Should we define what should happen if you try to do a HEAD request to the upload url without the Upload-Tag? In my reference implementation I chose to return 404 Not Found.

Right now the proposal doesn't require the inclusion of Upload-Tag headers with all follow-up requests - and I'm hesitant to include one. Some implementations might choose to use the Upload-Tag solely to recover the Upload URL with a single HEAD request - and then continue to use the Upload URL instead of the Upload Tag.

felix-schwarz commented 3 years ago

@felix-schwarz Would it be OK if I add some commits to this PR to show what I mean with some of my comments?

@Acconut Sure, that'd be fine! Preferably after I pushed my changes, though, to avoid version conflicts.

I've just pushed all of my existing changes, so please go ahead.

smatsson commented 3 years ago
  1. Should we define what should happen if you try to do a HEAD request to the upload url without the Upload-Tag? In my reference implementation I chose to return 404 Not Found.

Right now the proposal doesn't require the inclusion of Upload-Tag headers with all follow-up requests - and I'm hesitant to include one. Some implementations might choose to use the Upload-Tag solely to recover the Upload URL with a single HEAD request - and then continue to use the Upload URL instead of the Upload Tag.

I think I was sloppy in my question here. For a HEAD request for e.g. /files (note no file id), should we have a predefined response if the Upload-Tag is missing? If an upload tag is provided during creation the Client can do one of the two following requests to get the file info:

HEAD /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Tus-Resumable: 1.0.0

OR 

HEAD /files/ HTTP/1.1
Host: tus.example.org
Tus-Resumable: 1.0.0
Upload-Tag: myTag

If a request for the latter is received without a Upload-Tag header, what's the best way to handle it? 404? 400?

Acconut commented 3 years ago

I made a few changes to the PR:

Let me explain reasons for the last change: In its current form, the tus protocol does not require authentication at any point, which was a decision we made intentionally. There are applications were authentication is not needed and I would like tus server in these circumstances to also be compatible with the standard. For example: The demo server at https://master.tus.io does not require authentication because it is a public demo server. Any data uploaded there should be considered publicly available, so authorization is not needed. In fact, authorization would probably make it harder for users to try the demo server with various tus clients.

I would consider this PR to be ready for merge in its current form, unless you have any feedback that should be incorporated. What do you think, @smatsson and @felix-schwarz?

smatsson commented 3 years ago

@Acconut Some minor things:

Clients MAY then use the returned upload URL in the Location header to resume the upload.

To me, this reads as there are other ways to continue the upload. Clients SHOULD use the returned upload URL in the Location header to resume the upload. would be a better fit IMO.

If the Upload-Tag header is invalid in a POST or HEAD request, the Server MUST respond with the 400 Bad Request status.

Would this imply being non-ascii and/or larger than 256 characters? Just want to double check.

I think the security considerations part is spot on and would be easily expanded to work with the challenge extension.

In the examples, should we remove Authorization? It's not used anywhere else in the spec.

Should we update the spec version? I know we are still discussing how to do this but this seems like a good opportunity if we want to increase the minor version?

felix-schwarz commented 3 years ago

Sorry it took me a while to get back to you!

@Acconut Thanks for the feedback and making these changes. I agree with all of them.

In the "Security Considerations" part, I changed "to fulfill this requirement" to "to fulfill this recommendation" to match the rest of the segment.

@smatsson

Clients MAY then use the returned upload URL in the Location header to resume the upload.

To me, this reads as there are other ways to continue the upload. Clients SHOULD use the returned upload URL in the Location header to resume the upload. would be a better fit IMO.

Good point. I changed that accordingly.

If the Upload-Tag header is invalid in a POST or HEAD request, the Server MUST respond with the 400 Bad Request status.

Would this imply being non-ascii and/or larger than 256 characters? Just want to double check.

I think this would include any case where an Upload-Tag is not valid.

In the examples, should we remove Authorization? It's not used anywhere else in the spec.

Even though authorization is only recommended, not required, I'd prefer to leave these in so that the examples show best practice. But removing them would also be ok with me.


@Acconut From my side, the PR is ready to merge in its current form.

smatsson commented 3 years ago

+1 from me on the above. Ready to merge :)

smatsson commented 3 years ago

Do we wish to update the protocol version as per the discussion in #165 before merging this?

Acconut commented 3 years ago

Thank you for the update, @felix-schwarz! Looking good now!

Only one thing came to my mind: We should restrict the Upload-Tag to printable ASCII characters to exclude control sequences such as the NUL character (see https://theasciicode.com.ar/ascii-control-characters/delete-ascii-code-127.html). I think these are codes 32 to 126. What do you think?

Do we wish to update the protocol version as per the discussion in #165 before merging this?

Not sure, I would like to add a few more things to 1.1 so I am not sure if it's better to bump the version in a separate PR.

felix-schwarz commented 3 years ago

@Acconut Sorry for taking so long to follow up!

I think that clarification regarding printable ASCII codes is a good idea and have changed it accordingly - referencing Wikipedia (I couldn't find an RFC defining the "printable ASCII codes" term) and the 32-126 range as quick reference.

Acconut commented 3 years ago

Thank you very much for the final touches! I added the list of allowed ASCII characters, so it's easier to understand what we are talking about. Also, added an exclusion for spaces, to remove edge cases and potential failure.

All in all, this can be merged in the 1.1 branch :tada:

felix-schwarz commented 3 years ago

@Acconut Thanks for the final touch improvements! Awesome to see it in the 1.1 branch! 🎉