tus / tus-resumable-upload-protocol

Open Protocol for Resumable File Uploads
https://tus.io
MIT License

Support streams with unknown length? #16

Closed felixge closed 9 years ago

felixge commented 11 years ago

see discussion here: http://www.tus.io/protocols/resumable-upload.html#comment-865108402

vayam commented 11 years ago

I am redesigning the Vimeo Upload API and am thinking of adopting the tus protocol. I have a couple of questions and suggestions; I will probably file separate GitHub issues for them.

We have had requests from users wanting to upload content of unknown length, not necessarily to pipe it into our transcode pipeline. I totally agree with you: it should be an elegant solution and should not complicate the 99% of use cases.

Here is what I am thinking. Let me know:

How about making the "Content-Range" header optional?

POST /files HTTP/1.1
Host: tus.example.org
Content-Length: 0
Content-Type: image/jpeg
Content-Disposition: attachment; filename="cat.jpg"

PUT /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Content-Length: 100
Content-Range: bytes 0-99/*    # "*" indicates unknown total length

PUT /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Content-Length: 100
Content-Range: bytes 100-199/*    # "*" indicates unknown total length

To indicate completion of the upload:

PUT /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Content-Length: 0

felixge commented 11 years ago

@vayam what's your timeline for this? Having Vimeo adopt the tus protocol would be a huge honor, but I think it will take us a few more weeks to create the right solution here. I've spent a lot of time discussing the right headers and HTTP features with various people, including the httpbis (HTTP 2.0) working group, and I'm currently gravitating towards:

Anyway, creating a solution for your use case should be very doable, and I hope to have a new 0.2 proposal up by the end of this week.

vayam commented 11 years ago

@felixge I like the way this is shaping up. Our current upload servers already support resumable uploads and parallel chunked uploads, so I can wait a few more weeks until your spec is finalized.

I was not very thrilled with the incorrect usage of the Range header in responses and of Content-Range in requests.

For the most basic case, "start with a PUT and fix with a PATCH" seems like a good approach, although I am skeptical about using the PATCH method. To resume a failed upload, I like your approach of appending "offset" to Content-Type:

Content-Type: image/jpeg; offset=-1 would mean append
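
As a sketch only (assuming offset=N means "continue writing at byte N"; this convention is not part of any published draft), resuming a failed upload could then look like:

PUT /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Content-Length: 100
Content-Type: image/jpeg; offset=100    # resume by appending at byte 100

[100 bytes of data]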

Parallel chunks are more of a special case. They should be optional or an add-on like you said. There should be a way for the client to detect whether the server supports parallel chunks. It could be as simple as a custom request header such as x-tus-chunk; the server can respond with a 412 Precondition Failed if it is not supported.
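
A minimal sketch of such a probe, reusing the x-tus-chunk name from above (both the header and its value are placeholders):

PUT /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Content-Length: 100
Content-Range: bytes 100-199/200
x-tus-chunk: 1    # placeholder: client wants to send this range as a parallel chunk

HTTP/1.1 412 Precondition Failed    # server does not support parallel chunks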

Would it be more useful to send a 206 for HEAD/GET requests while the file is still being uploaded?

HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Range: bytes 0-1000/9999
Content-Type: video/quicktime
Content-Length: 1001

Do you plan to handle pause and resume? A lot of our users like that feature; a typical use case is starting an upload at a coffee shop, pausing it, and then resuming when you get home. It can easily be handled if we keep incomplete uploads around for a longer duration before they expire. It is probably not necessary to explicitly indicate a pause to the server.

I have some thoughts about supporting checksums/MD5 and metadata for the file being uploaded (during the upload and after). Another topic is signing the requests for security. These should not be part of the core spec, though; I would like to see how they can be built on top as extensions.

Probably that is already being discussed. I will go through all open discussions and get back to you on those.

kvz commented 11 years ago

Parallel chunks are more of a special case. They should be optional or an add-on like you said. There should be a way for the client to detect whether the server supports parallel chunks. It could be as simple as a custom request header such as x-tus-chunk; the server can respond with a 412 Precondition Failed if it is not supported.

I'm thinking that if the server replies with a tus-chunk supported header on the first POST (or maybe PUT, as you mention), the client can then follow up using parallel chunks if it wants. This way we do not need to query the server separately.
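
A rough sketch of that variant, with a purely hypothetical capability header in the creation response:

POST /files HTTP/1.1
Host: tus.example.org
Content-Length: 0
Content-Type: image/jpeg

HTTP/1.1 201 Created
Location: http://tus.example.org/files/24e533e02ec3bc40c387f1a0e460e216
x-tus-chunk: supported    # hypothetical: server advertises parallel chunk support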

Do you plan to handle pause and resume? A lot of our users like that feature; a typical use case is starting an upload at a coffee shop, pausing it, and then resuming when you get home.

I don't think the protocol would need anything extra to support pause/resume. In fact, the jQuery plugin demo that I wrote with @tim-kos based on the initial draft of the protocol already features a stop button. Resume would be easy to implement client-side, without any additional changes or specification in the protocol.

felixge commented 11 years ago

Although I am skeptical about using the PATCH method.

Yeah. I think in order to create something that is usable today, we have to give up on the idea of making this entirely RESTful. So I'm actually thinking about using POST instead of PATCH/PUT. PUT is not appropriate since it's meant to replace a resource entirely, and PATCH is risky when it comes to support from proxies, libraries, etc.
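
Just to illustrate that direction (this is not from any published draft), an append via POST might look like:

POST /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Content-Length: 100
Offset: 100    # hypothetical: append the body starting at byte 100

[100 bytes of data]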

They should be optional or an add-on like you said. There should be a way for the client to detect whether the server supports parallel chunks.

I think feature detection would mean additional network/resource overhead in many cases, so IMO enabling the optional extensions should be done via configuration options in the clients, which of course will only work when uploading to servers that support them. This does require clients to know about the capabilities of the servers they target, but that is really no different from the client knowing that the server speaks the tus protocol to begin with.

Do you plan to handle pause and resume?

As you and @kvz said, this feature is kind of implicitly supported by the protocol, and it's up to the application to decide how long to keep partial uploads for resumability. That being said, I'll add a section in the protocol FAQ about this.

I will go through all open discussions and get back to you on those.

Sure, there are a lot of great discussions on all of these topics, but not much in the way of working solutions yet.

IMO it will make the most sense to wait for the 0.2 proposal, which I'll push out ASAP, as this will give us a better base for discussing individual features.

MarkMurphy commented 10 years ago

It's been over a year; where do things stand on this?

Acconut commented 9 years ago

Since Range and Content-Range aren't used any more, in favor of Offset, I would propose the following very simple solution:

If the client wants to upload a file of unknown length, it should leave the Content-Length header out of the request. The server can then decide whether to accept or reject it. In order to know in advance whether the server accepts this behaviour, the client should check using the discovery mechanism (see #29).
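
As a side note, an HTTP/1.1 request that carries a body but omits Content-Length has to use Transfer-Encoding: chunked instead, so a data-carrying request of unknown length would presumably look something like this sketch:

PATCH /files/5f4dcc3b5aa765d61d8327deb882cf99 HTTP/1.1
Host: master.tus.io
Offset: 0
Transfer-Encoding: chunked
# No Content-Length

[data streamed in chunks until the client is done]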

Acconut commented 9 years ago

~~In order to tell the server when the upload is finished (since it is not aware of that when the file is created) the client should send one request with Content-Length. Here's an example:~~

POST /files
Host: master.tus.io
# No Content-Length

201 Created
Location: master.tus.io/files/5f4dcc3b5aa765d61d8327deb882cf99

PATCH /files/5f4dcc3b5aa765d61d8327deb882cf99
Host: master.tus.io
Offset: 0
# No Content-Length

[100 bytes of data]

PATCH /files/5f4dcc3b5aa765d61d8327deb882cf99
Host: master.tus.io
Offset: 200
Content-Length: 100

[100 bytes of data]

Now the server knows that the file will be 300 bytes (Offset + Content-Length of the last request). It currently only has bytes 0-99 and 200-299, though.

Acconut commented 9 years ago

To correct myself: using a single Content-Length header is not the best way. In order to tell the server the total length of the upload, the client should add the Entity-Length header, as seen in the following example:

POST /files
Host: master.tus.io
# No Content-Length

201 Created
Location: master.tus.io/files/5f4dcc3b5aa765d61d8327deb882cf99

PATCH /files/5f4dcc3b5aa765d61d8327deb882cf99
Host: master.tus.io
Offset: 0
Content-Length: 100

[100 bytes of data]

PATCH /files/5f4dcc3b5aa765d61d8327deb882cf99
Host: master.tus.io
Offset: 200
Content-Length: 100
Entity-Length: 300 # Total length of uploaded file when finished

[100 bytes of data]

Now the server knows that the file will be 300 bytes (the value of the Entity-Length header). It currently only has bytes 0-99 and 200-299, though.

Acconut commented 9 years ago

See #49 for the proposed changes.

qsorix commented 9 years ago

A new round of questions :)

Let's say a client started to upload an infinite stream, e.g. live video recording.

What should happen when at some point the server decides it cannot accept more data? It could close the reading half of the connection and send back an error code, but I don't know if that fits within the HTTP framework, and some client libraries may have difficulties handling it. The other option is to simply close both ends of the connection. Then what should happen on subsequent HEAD requests for that resource? Should the server kill (i.e. remove) the upload? Can the client decide that since it cannot send anything more, it is OK to finish here and send an empty PATCH with Entity-Length equal to the current Offset?

Another one:

Can a streaming upload be used for stream-oriented processing, or should this be beyond the scope? In other words, can the server start processing the incoming data while the upload is still unfinished? This could potentially save a lot of the server's disk space and spread resource utilization over time. Should implementations be allowed to do this, or can you think of any problems? I think it prevents verifying the checksum, but an implementation may not care.

Acconut commented 9 years ago

What should happen when at some point the server decides it cannot accept more data? It could close the reading half of the connection and send back an error code, but I don't know if that fits within the HTTP framework, and some client libraries may have difficulties handling it. The other option is to simply close both ends of the connection. Then what should happen on subsequent HEAD requests for that resource? Should the server kill (i.e. remove) the upload?

As long as the server hasn't sent a final status code yet (an interim 100 Continue doesn't count), it can send a 503 Service Unavailable; otherwise the only option is to close the connection and wait until the client retries. This behaviour is defined in the retries extension. If the server removed the upload (recommended if it won't accept more input in the future), it should respond with a 404 Not Found to HEAD and PATCH requests.
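
A sketch of the first case, assuming the client uses Expect: 100-continue and the server refuses before reading the body and then removes the upload:

PATCH /files/5f4dcc3b5aa765d61d8327deb882cf99 HTTP/1.1
Host: master.tus.io
Offset: 1000
Content-Length: 1048576
Expect: 100-continue

HTTP/1.1 503 Service Unavailable    # refused before the body was read

HEAD /files/5f4dcc3b5aa765d61d8327deb882cf99 HTTP/1.1
Host: master.tus.io

HTTP/1.1 404 Not Found    # the server removed the upload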

Can the client decide that since it cannot send anything more, it is OK to finish here and send an empty PATCH with Entity-Length equal to the current Offset?

It depends on the client's usage. If the resource is still available but can't be completed the client may do so.
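
A sketch of finishing early that way, assuming the current offset is 300:

PATCH /files/5f4dcc3b5aa765d61d8327deb882cf99 HTTP/1.1
Host: master.tus.io
Offset: 300
Content-Length: 0
Entity-Length: 300    # equal to the current offset, so the upload is now complete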

Can a streaming upload be used for stream-oriented processing, or should this be beyond the scope? In other words, can the server start processing the incoming data while the upload is still unfinished? This could potentially save a lot of the server's disk space and spread resource utilization over time. Should implementations be allowed to do this, or can you think of any problems? I think it prevents verifying the checksum, but an implementation may not care.

In theory yes, but the protocol also allows offsets different from the current amount of uploaded bytes (see non-contiguous chunks and parallel uploads). So you can't rely on this currently, since the protocol says that the offset may be smaller (I disagree with this, by the way). That's also a problem when looking at streaming downloads (see #28).

Acconut commented 9 years ago

Implemented in #49.