tus / tus-resumable-upload-protocol

Open Protocol for Resumable File Uploads
https://tus.io
MIT License

Content-Type / Content-Encoding / Content-Language (HTTP Representation Metadata) #158

Open nigoroll opened 4 years ago

nigoroll commented 4 years ago

For HTTP responses, Content-Type is de facto required so that user agents do not have to guess MIME types (the RFC says it SHOULD be generated). Content-Encoding MUST be generated if and only if an encoding (such as gzip or Brotli) is used, and Content-Language is optional.

So the question arises of how TUS handles this standardized metadata. My personal expectation would be that a tus server implementation should, by default, store the metadata for use by the component delivering it, but...

Acconut commented 4 years ago

> Content-Type, unfortunately, MUST be set to application/offset+octet-stream as per the protocol definition

For good reason, in my mind. If I send a PATCH request with the second half of a PNG image, the content type of that request is not image/png, since the body is not a valid PNG image. The only case where the content type could be image/png would be if the client intends to upload the entire file in a single PATCH request. In all other cases, we would have to use a different content type. And since I don't think it's a good idea to have rules with such exceptions, we settled on using upload metadata for transferring the file type. Does that make sense?
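To make the split concrete, here is a minimal sketch of the headers a client might send, assuming the tus 1.0.0 rules for Upload-Metadata (comma-separated pairs of a key and a Base64-encoded value). The function names are hypothetical, and the `filetype` key is only the convention mentioned in this thread, not something the specification reserves.

```python
import base64
from typing import Dict

TUS_VERSION = "1.0.0"


def encode_upload_metadata(metadata: Dict[str, str]) -> str:
    """Encode key/value pairs as an Upload-Metadata header value."""
    pairs = []
    for key, value in metadata.items():
        # Keys must be non-empty and must not contain spaces or commas.
        if not key or " " in key or "," in key:
            raise ValueError(f"invalid metadata key: {key!r}")
        encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
        pairs.append(f"{key} {encoded}")
    return ",".join(pairs)


def creation_headers(upload_length: int, metadata: Dict[str, str]) -> Dict[str, str]:
    """Headers for the POST that creates the upload; the file type travels here."""
    return {
        "Tus-Resumable": TUS_VERSION,
        "Upload-Length": str(upload_length),
        "Upload-Metadata": encode_upload_metadata(metadata),
    }


def patch_headers(offset: int) -> Dict[str, str]:
    """Headers for a PATCH chunk; Content-Type describes the chunk, not the file."""
    return {
        "Tus-Resumable": TUS_VERSION,
        "Upload-Offset": str(offset),
        "Content-Type": "application/offset+octet-stream",
    }
```

With this layout every PATCH request looks the same regardless of which part of the file it carries, and the media type is stated once when the upload is created.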

> Alternatively, I think that TUS should at least define the Upload-Metadata key content-type to specify the actual content-type.

Good idea. tusd and Uppy.js already use the filename and filetype metadata values for filling in the Content-Type header for GET responses. It would be good to add them as a recommendation to the specification.
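As a rough illustration of that behaviour (not tusd's actual code), a server could decode the stored Upload-Metadata value and fall back to a generic type when no `filetype` was provided. The function names and the `application/octet-stream` fallback are assumptions made for this sketch.

```python
import base64
from typing import Dict


def parse_upload_metadata(header: str) -> Dict[str, str]:
    """Decode an Upload-Metadata header value into a plain dict."""
    metadata: Dict[str, str] = {}
    for pair in header.split(","):
        pair = pair.strip()
        if not pair:
            continue
        # A key may appear without a value; partition then yields "".
        key, _, encoded = pair.partition(" ")
        metadata[key] = base64.b64decode(encoded).decode("utf-8")
    return metadata


def download_content_type(
    metadata: Dict[str, str],
    default: str = "application/octet-stream",  # assumed fallback
) -> str:
    """Pick the Content-Type for a GET response from the filetype metadata."""
    return metadata.get("filetype") or default
```

A production server would probably also validate the client-supplied value before echoing it in a response header, since it is untrusted input.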

nigoroll commented 4 years ago

re @Acconut

HTTP does not use that strict a definition: https://httpwg.org/specs/rfc7231.html#header.content-type

> The "Content-Type" header field indicates the media type of the associated representation: either the representation enclosed in the message payload or the selected representation

and then in https://httpwg.org/specs/rfc7231.html#representations

> For the purposes of HTTP, a "representation" is information that is intended to reflect a past, current, or desired state of a given resource

TUS PATCH requests can be seen as analogous to 206 responses to Range requests, where the Content-Type still refers to the entirety of the requested object and not just the particular response body.
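For comparison, this is roughly what the headers of a single-range 206 response look like. It is a sketch with a hypothetical helper name, and it ignores multi-range responses, which use multipart/byteranges instead.

```python
from typing import Dict


def range_response_headers(
    media_type: str, first_byte: int, last_byte: int, total_length: int
) -> Dict[str, str]:
    """Headers for a single-range 206 Partial Content response."""
    if not 0 <= first_byte <= last_byte < total_length:
        raise ValueError("byte range lies outside the representation")
    return {
        # Media type of the whole representation, not of the fragment.
        "Content-Type": media_type,
        # Position of the fragment, inclusive on both ends.
        "Content-Range": f"bytes {first_byte}-{last_byte}/{total_length}",
        "Content-Length": str(last_byte - first_byte + 1),
    }
```

Here the second half of a 1000-byte PNG would still be labelled image/png, with Content-Range carrying the position. In a tus PATCH request, Upload-Offset plays the positional role.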

So while I understand how this came about, I still think that changing the use of Content-Type would be more in line with HTTP.

That said, even if you decided to not change it for a future protocol version, I agree that some metadata values should be reserved for these semantics.

Acconut commented 4 years ago

> HTTP does not use that strict a definition

That's interesting, good to know. Thanks for bringing it up!

> even if you decided to not change it for a future protocol version

Yes, I would like to do that. tus clients already have to specify custom headers, so adding the upload metadata header is not a problem in my mind.

> I agree that some metadata values should be reserved for these semantics.

Absolutely, we should definitely do that!