Define behavior for GET on incomplete uploads

biasedbit commented 11 years ago

Edge case. I don't really have a strong opinion on whether it should simply return 404 Not Found or give some sort of indication that an upload is under way, like 416 Requested Range Not Satisfiable. For the latter, the RFC says servers "SHOULD" respond with 416 if a Range header was sent in the request and it cannot be satisfied, but it doesn't say whether sending 416 is allowed or recommended otherwise.

Probably best to just leave a small note in the protocol draft stating that the default behavior is to report 404 until the upload has been completed, but that server implementations are free to roll their own behavior as they see fit.

sandfox commented 11 years ago

404 seems inappropriate because there really is a resource there.

E.g. if I do a POST to /files/my and get back a link to the resource /files/12345, then that resource does exist and you can make HEAD requests etc.; it's just that the server (probably) can't return a suitable representation if a GET request is made to that resource.

Off the top of my head, 406 - Not Acceptable could, in some situations, be suitable, but it seems a little far-fetched for the general use case (I need to read the spec some more).

405 - Method Not Allowed could be sensible if we go down the simple route of saying a resource is not GET-able until completely uploaded, but this feels a little blunt and inelegant.

409 - Conflict is my current favourite, as long as we all agree that making the request represents a client-side failure of some kind and not an issue with the server.

biasedbit commented 11 years ago

I see 409 Conflict as misleading in this case. To me it conveys the idea that you're trying to put something on the server that is somehow not acceptable given the current content. By the RFC it's perfectly legal to return that on a GET; it just feels wrong to me (stress on feel and me).

Going through the list of 4xx error responses, even if we consider some additions like the WebDAV 4xx codes, I can't really identify one that strikes me as being perfect for this case.

404 seems inappropriate because there really is a resource there.

Given it's not accessible, would a 403 work?

I'm just trying to go for a sensible default, hence the 404; it's probably the simplest. For example: query the database for the file with id xkcd and the complete flag set to true, and return 404 if the result is nil. Each server will always be able to implement its own behavior.
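That default can be sketched in a few lines. This is a minimal illustration, not code from any tus server: an in-memory dict stands in for the database, and the `Upload` record, its `complete` property and the `find_complete`/`handle_get` helpers are hypothetical names.

```python
from dataclasses import dataclass
from http import HTTPStatus
from typing import Optional


@dataclass
class Upload:
    """Hypothetical upload record: declared final length plus bytes received."""
    length: int
    data: bytes = b""

    @property
    def complete(self) -> bool:
        # The "complete flag", derived here from the byte count.
        return len(self.data) == self.length


# Stand-in for the database table, keyed by upload id.
UPLOADS: dict[str, Upload] = {}


def find_complete(upload_id: str) -> Optional[Upload]:
    """Like 'WHERE id = ? AND complete = true': None when nothing matches."""
    upload = UPLOADS.get(upload_id)
    return upload if upload is not None and upload.complete else None


def handle_get(upload_id: str) -> tuple[int, bytes]:
    """Return (status, body) for a GET; readers never see partial files."""
    upload = find_complete(upload_id)
    if upload is None:
        # Unknown ids and unfinished uploads look the same to a reader: 404.
        return HTTPStatus.NOT_FOUND, b""
    return HTTPStatus.OK, upload.data
```

With `UPLOADS["xkcd"] = Upload(length=5, data=b"hel")`, `handle_get("xkcd")` keeps answering 404 until the last two bytes arrive.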

sandfox commented 11 years ago

As you mentioned, 403 - Forbidden seems the most appropriate response here if we only allow GETs for the entire file to be successful for completed uploads. It allows for a response body where we can delegate to individual implementations how much information/reasoning they give.

Something that needs consideration is the situation where a GET request is made for a subsection of the entire file (using a Range header, for example), which is a very common scenario (streaming videos, resuming large file downloads).

404ing on requests for the entire file would imply that range requests would also fail, when in actual fact some of them could succeed. If the requested range has not been completely uploaded, we can return a 416 - Requested Range Not Satisfiable.

I suppose this could be left as an option for implementers: if you want to allow partial downloads, support 403 or whatever; if you don't, then just supporting a 404 is fine (although it seems lame).
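The policy from the last few paragraphs fits in one function. This is a hedged sketch under assumptions that are not in the thread: uploads arrive strictly in order (so the available bytes are always a prefix), only a single `bytes=first-last` or `bytes=first-` range is understood, and `get_status` and `allow_partial` are hypothetical names. An unparseable Range header is ignored, as HTTP permits, and treated like a request for the whole file.

```python
import re
from http import HTTPStatus
from typing import Optional

# A single "bytes=first-last" or "bytes=first-" range. Suffix ranges
# ("bytes=-N") and multi-range requests are left out of this sketch.
RANGE_RE = re.compile(r"bytes=(\d+)-(\d*)")


def get_status(received: int, length: int, range_header: Optional[str],
               allow_partial: bool) -> tuple[int, Optional[str]]:
    """Choose (status, Content-Range) for a GET on an upload.

    received      -- bytes uploaded so far, always a prefix of the file
    length        -- declared final length of the upload
    range_header  -- raw Range request header, or None if absent
    allow_partial -- whether this server serves ranges of unfinished uploads
    """
    if received == length:
        # Finished upload: normal GET handling (Range support omitted here).
        return HTTPStatus.OK, None
    if not allow_partial:
        # Simplest policy: for readers, an unfinished upload does not exist.
        return HTTPStatus.NOT_FOUND, None
    match = RANGE_RE.fullmatch(range_header or "")
    if match is not None:
        first = int(match.group(1))
        # An open-ended range runs to the end of the *final* file.
        last = int(match.group(2)) if match.group(2) else length - 1
        if match.group(2) and last < first:
            match = None  # syntactically invalid range: ignore the header
    if match is None:
        # No usable Range: the whole file was requested and is not there yet.
        return HTTPStatus.FORBIDDEN, None
    if last >= received:
        # Some of the requested bytes have not been uploaded yet.
        return HTTPStatus.REQUESTED_RANGE_NOT_SATISFIABLE, None
    return HTTPStatus.PARTIAL_CONTENT, f"bytes {first}-{last}/{length}"
```

Setting `allow_partial=False` gives the "just support 404" option; `True` gives 403 for whole-file requests and 206/416 for ranges.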

j4james commented 11 years ago

As I mentioned on Hacker News, I like 416 Requested Range Not Satisfiable for both HEAD and GET, because technically the request could be satisfied if a range was specified that only included segments of the file that had already been uploaded.

While returning 416 in response to a request without a Range isn't defined, it's not a huge stretch to think of such a request as including an implied Range of everything (i.e. something like bytes=0-).

The only problem is how to let the client know what ranges are already available. Content-Range would work if you only allowed partial uploads to occur sequentially, but if you want to support parallel uploads of different parts of the file at the same time, you're going to need a new header.
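Working out which ranges are available is an interval-merging problem. The sketch below assumes each received chunk is recorded as an inclusive (first, last) byte pair; `available_ranges` and `single_content_range` are hypothetical helpers. It illustrates the point above: one Content-Range-style value is enough only while the received bytes form a single run from byte 0.

```python
from typing import Optional


def available_ranges(chunks: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge received chunks, given as inclusive (first, last) byte pairs,
    into the sorted list of disjoint ranges that are actually available."""
    merged: list[tuple[int, int]] = []
    for first, last in sorted(chunks):
        if merged and first <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend that range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged


def single_content_range(chunks: list[tuple[int, int]],
                         length: int) -> Optional[str]:
    """Return a Content-Range-style value if the received bytes form one run
    starting at byte 0 (the sequential case); None if there are gaps."""
    ranges = available_ranges(chunks)
    if len(ranges) != 1 or ranges[0][0] != 0:
        return None
    first, last = ranges[0]
    return f"bytes {first}-{last}/{length}"
```

With parallel uploads, `available_ranges([(0, 9), (50, 59)])` yields two ranges, so `single_content_range` gives up; that is exactly the case a new header would have to describe.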

sandfox commented 11 years ago

This rabbit hole keeps getting deeper and deeper...

To support returning multiple non-sequential byte ranges, the server would need to return something like a multipart message, but after reading some of http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.16 it appears that's only valid if the client explicitly asks for multiple byte ranges (a multipart/byteranges response), and I can't see another mechanism to return multiple byte ranges without the client asking for them.

I'm almost tempted to say 404 and stuff it, but that's a cop-out. 400 feels semantically most correct, because whilst making a HEAD request is valid, making a GET request is not until the full contents of the file have been uploaded; but the spec states it should only be sent in response to 'malformed syntax', which is not the case here. This leaves 403 as my current winner, unless the server can't be bothered to give a useful message, in which case 404 is fine.

416 feels like it's stretching the semantics too much and is somewhat non-obvious behaviour.

I'd really like another new status code for this sort of situation, i.e. resource not ready or something similar.

kevinswiber commented 11 years ago

I think a 206 Partial Content is appropriate here, potentially with a multipart/byteranges Content-Type.

Reference: http://tools.ietf.org/html/draft-ietf-httpbis-p5-range-22#section-4.1
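For reference, a multipart/byteranges body following the example layout in the linked draft (later published as RFC 7233) can be assembled as below. This sketch assumes the requested ranges have already been checked against the bytes received so far; the function name and the default boundary string are illustrative.

```python
def multipart_byteranges(data: bytes, ranges: list[tuple[int, int]],
                         length: int, content_type: str,
                         boundary: str = "THIS_STRING_SEPARATES"
                         ) -> tuple[str, bytes]:
    """Build (Content-Type header value, body) for a 206 multipart/byteranges
    response. `ranges` are inclusive (first, last) pairs that the caller has
    already checked lie within `data`, the bytes received so far."""
    parts = []
    for first, last in ranges:
        # Each part repeats the media type and states its own Content-Range.
        header = (
            f"--{boundary}\r\n"
            f"Content-Type: {content_type}\r\n"
            f"Content-Range: bytes {first}-{last}/{length}\r\n"
            "\r\n"
        )
        parts.append(header.encode("ascii") + data[first:last + 1] + b"\r\n")
    # The closing delimiter ends the multipart body.
    parts.append(f"--{boundary}--\r\n".encode("ascii"))
    return f"multipart/byteranges; boundary={boundary}", b"".join(parts)
```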

kevinswiber commented 11 years ago

In the case of a file that has been created with no bytes yet transferred, a 204 No Content might be appropriate, assuming the file upload HTTP calls might complete at a later point in time.

biasedbit commented 11 years ago

400 is misleading. It conveys the idea that something is wrong with the client's request, which is not true.

Consider a regular non-resuming upload: 404 is perfectly valid; if the upload completed, it'll be there, and if it didn't, it won't. In that light, I don't feel like 404 is lame or a cop-out. GET will typically be used by a reader role (HEAD, POST and PUT by a creator role); for a reader, the file does not exist until it is ready for consumption, i.e. complete.

I stand by the 404. It allows by far the simplest server implementation, which was the whole point of this discussion. 403 is also good.

I'd really like another new status code for this sort of situation, i.e. resource not ready or something similar.

There's always the option of using one of the above with a different (non-standard) status message...

@kevinswiber on the 206, from the RFC:

The request MUST have included a Range header field (section 14.35) indicating the desired range, and MAY have included an If-Range header field (section 14.27) to make the request conditional.

Going against a SHOULD may be acceptable but we definitely shouldn't go against MUST directives.

kevinswiber commented 11 years ago

@brunodecarvalho Ah, you're absolutely right. For GET requests with a Range header on a file that's only partially uploaded, a 206 is appropriate for a valid range.

With an invalid Range header on a GET, the right response is 416 Range Not Satisfiable.

I think the rules of 403 Forbidden apply here when there is no Range header on a GET request and a file is only partially uploaded. The rule states that servers can show a 403 if they want the client to know the request is being actively refused; otherwise show a 404.

sandfox commented 11 years ago

@kevinswiber @brunodecarvalho

I agree that 400 is too brute force and implying the wrong thing.

I'm quite happy with 403, or falling back to 404 depending on implementation, but I'd like to just check that we aren't chucking out 405 - Method Not Allowed without good consideration.

One of the cons with 405 is that it's uncertain how permanent that response is treated as being, i.e. should you expect a 405 response for a resource to always be 405, or is it like 404 or 5xx, whereby at some point in the future the resource will be retrievable? As I write this, it feels like it could be interchangeable with 403, as there may be times when the server doesn't want to return any other acceptable methods.

Thoughts?

kevinswiber commented 11 years ago

@sandfox

Re: 405 - I think the method is allowed, but the correct server state doesn't exist to serve the response.

Reading the spec again, it seems 403 is not the right way to go. RFC 2616 explicitly states that a client making a request which returns a 403 response SHOULD NOT attempt that request again.

Unfortunately, there's no 2xx Pending status code. (This would be great for APIs, too.)

In the Web API world, we would likely return a 303 See Other that points to a representation communicating the pending status. Alternatively, API authors might respond with a 200 and explain a "pending" status in the response body.
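Both Web API styles can be sketched as follows. The status URL layout (`/files/<id>/status`), the JSON field names and `pending_response` itself are assumptions made for illustration; none of them is part of tus or any framework.

```python
import json
from http import HTTPStatus


def pending_response(upload_id: str, received: int, length: int,
                     redirect: bool = True) -> tuple[int, dict[str, str], bytes]:
    """Answer a GET for an upload that is still in progress.

    redirect=True  -> 303 See Other pointing at a (hypothetical) status resource
    redirect=False -> 200 OK whose JSON body reports the pending state
    """
    if redirect:
        return HTTPStatus.SEE_OTHER, {"Location": f"/files/{upload_id}/status"}, b""
    body = json.dumps({"status": "pending", "received": received, "length": length})
    return HTTPStatus.OK, {"Content-Type": "application/json"}, body.encode("utf-8")
```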

Thinking deeper, I believe my latest recommendation would be a 503 Service Unavailable. With 503, a Retry-After header can be included. I think this is exactly what we want. "The server can't serve this to you at the moment, but try again soon."
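Retry-After accepts either a number of seconds or an HTTP-date. A minimal sketch of a 503 carrying either form follows; `service_unavailable` is a hypothetical helper, and the delay it is given is a guess the server has to make for itself.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from http import HTTPStatus
from typing import Optional


def service_unavailable(delay: timedelta, as_date: bool = False,
                        now: Optional[datetime] = None
                        ) -> tuple[int, dict[str, str]]:
    """503 Service Unavailable plus a Retry-After header.

    as_date=False -> delay-seconds form, e.g. "120"
    as_date=True  -> HTTP-date form, e.g. "Fri, 01 Jan 2021 00:00:30 GMT"
    """
    if as_date:
        # format_datetime(usegmt=True) requires an aware UTC datetime.
        now = now or datetime.now(timezone.utc)
        value = format_datetime(now + delay, usegmt=True)
    else:
        value = str(int(delay.total_seconds()))
    return HTTPStatus.SERVICE_UNAVAILABLE, {"Retry-After": value}
```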

sandfox commented 11 years ago

@kevinswiber the only downside to 503 is that it somehow suggests the server is at fault, which isn't really true.

The closest thing to a Pending status is 202 - Accepted. I think you could argue it would be valid, but it would be a complete abuse: it would cause no end of confusion for people not experienced with the protocol, for end users (Joe Bloggs on his PC), and for many software clients that would think it meant everything had really worked when it really hasn't.

I'm swinging behind a 404 for generic GET requests, while allowing servers to optionally support GET requests with Range headers and whatnot, along with the appropriate responses (as mentioned above). My argument is that if you have knowledge of the protocol, you know you can make a HEAD request without needing to be told this by the server. If you don't know (because you're a user trying to view a picture or whatever), then in the absence of a response saying come back later (which can be put in a 404 anyway), you only care about knowing it's there in a complete state; if it's not, it may as well be completely non-existent to you.

TL;DR

something like this

MUST return a 404 for resources that aren't fully uploaded; OPTIONAL/SHOULD: may return partial sections (or 416 if appropriate) for clients making requests with Range headers.

thoughts?

kevinswiber commented 11 years ago

@sandfox

I agree with your comments on 503.

The only issue is using the protocol to communicate the resource is expected to be available later. A Retry-After header is considered acceptable on a 503 response and is the only way to hack in that protocol-level communication (with the current spec, of course).

Using a 404, the protocol does not communicate that the resource will be available soon, so that information should be included in the response body. That might be the best compromise.

Some other thoughts...

WebDAV (in which I am no expert) seems to have a Status-URI header. That seems like it would be pretty convenient for clients to check the status of the file before requesting it again. I believe it's often used with a 102 Processing response, which seems an awful lot like 202 Accepted, but I believe the semantics limit it to long-running state-change requests (e.g., MOVE, COPY).

I've looked for a 3xx status code that temporarily redirects to a status of the resource, but I'm not sure any of the existing status codes really fit. This would be ideal, in my opinion, and would communicate, "This resource is not yet available. You are being redirected to a URI that communicates more information regarding this resource's availability. When the resource is ready to be retrieved, subsequent requests to this URI will return the actual resource." I've needed this myself more than once.

It's almost unbelievable that this is such a hard nut to crack. Ah, well. I still :heart: HTTP. ;)

sandfox commented 11 years ago

@kevinswiber The only downer with Retry-After is that the server has no way to know when to expect the resource to be ready, because it's relying on external agents that may take an indeterminate amount of time to finish uploading the resource.

With WebDAV, I feel a lot of clients aren't going to understand its semantics very well (either in terms of software or humans). (Also, WebDAV feels like an attempt to hack/smash OS filesystem calls into HTTP.)

I agree, another 3xx like you describe would be super useful.

HTTP's vagueness is its downfall sometimes...

sandfox commented 11 years ago

I'm doing a complete about-turn on this.

In the interest of simplicity, can we just leave this up to the implementation to do whatever they feel like? It's not really core to the problem of file uploading and isn't going to affect the ability to upload files at all. The more opinionated we get about things that aren't essential, the harder/less likely we make it for others to implement the protocol.

If people really want this sort of thing, it could go into an extension of the protocol; in fact, there could be many extensions depending on desired behaviour and problem domain.

Following on from this, is it worth sticking something in the protocol to explicitly state that the behaviour is left undefined? (I'm probably going to smash up a quick PR right now thinking about it)

Acconut commented 9 years ago

In the interest of simplicity, can we just leave this up to the implementation to do whatever they feel like? It's not really core to the problem of file uploading and isn't going to affect the ability to upload files at all.

I agree with @sandfox. tus is a protocol for uploading files, not downloading them, so this behaviour should be defined by each implementation.

tus / tus-resumable-upload-protocol, issue #13