zotonic / webzmachine

Zotonic's fork of Basho's webmachine
http://zotonic.com/
Apache License 2.0

Disentangle Content-Encoding, Transfer-Encoding, character set conversion and chunked transfers. #14

Open mworrell opened 10 years ago

mworrell commented 10 years ago

In Webmachine there is some confusion between Content-Encoding and Transfer-Encoding.

Content-Encoding is applied by encoder functions, but those functions are also applied to the individual chunks. This doesn't work well with gzip, which should be applied to the whole entity and not to each of its chunks.

The idea behind Content-Encoding is that the server has multiple versions of an entity and can select which version to serve. That means the server already has a prepared compressed version of the data. A Range is also applied to this (compressed) version of the data.
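As a rough illustration of that model (sketched in Python rather than Erlang, and not taken from Webzmachine), the snippet below keeps prepared variants of one entity, picks one by content coding, and only then slices the byte range. The names `VARIANTS`, `select_variant` and `apply_range` are hypothetical, and q-values in Accept-Encoding are ignored for brevity.

```python
import gzip

# Hypothetical prepared variants of one entity, keyed by content coding.
BODY = b"hello world, " * 50
VARIANTS = {
    "identity": BODY,
    "gzip": gzip.compress(BODY),
}


def select_variant(accept_encoding, variants):
    """Pick a stored representation; returns (coding, bytes).

    Simplification: q-values are ignored, the first listed coding
    that the server has a variant for wins.
    """
    accepted = [item.split(";")[0].strip().lower()
                for item in accept_encoding.split(",") if item.strip()]
    for coding in accepted:
        if coding in variants:
            return coding, variants[coding]
    return "identity", variants["identity"]


def apply_range(data, first, last):
    """Return bytes first..last inclusive, as in 'Range: bytes=first-last'."""
    return data[first:last + 1]


coding, representation = select_variant("gzip, deflate", VARIANTS)
# The range is taken from the compressed bytes, not from the original body.
part = apply_range(representation, 0, 9)
```

The point of the sketch is the order of operations: the range offsets refer to the selected (here gzip-compressed) representation, so the client has to reassemble all ranges before it can decompress.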

Transfer-Encoding is applied after fetching the requested ranges and can consist of chunking or (further) compression. There can be multiple transfer encodings; they are listed in the header in the order in which they were applied. HTTP/1.1 requires chunked to be the last one applied, so first gzip and then chunking gives:

Transfer-Encoding: gzip, chunked

This plays well with different content encodings.
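A minimal sketch of layering transfer codings on top of whatever entity the controller returns, again in Python for illustration only. It compresses the whole payload first and applies chunking last, since HTTP/1.1 requires chunked to be the final transfer coding. The helpers `chunk_encode`, `chunk_decode` and `apply_transfer_encodings` are hypothetical names, and chunk extensions and trailers are left out.

```python
import gzip


def chunk_encode(data, chunk_size=16):
    """Frame data with the HTTP/1.1 chunked transfer coding."""
    out = bytearray()
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        # Each chunk: hex size, CRLF, data, CRLF.
        out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # last-chunk, no trailers
    return bytes(out)


def chunk_decode(wire):
    """Inverse of chunk_encode (no chunk extensions or trailers)."""
    out = bytearray()
    pos = 0
    while True:
        eol = wire.index(b"\r\n", pos)
        size = int(wire[pos:eol], 16)
        if size == 0:
            return bytes(out)
        start = eol + 2
        out += wire[start:start + size]
        pos = start + size + 2  # skip the CRLF after the chunk data


def apply_transfer_encodings(entity):
    """gzip the whole payload, then chunk it; returns (header, bytes)."""
    wire = chunk_encode(gzip.compress(entity))
    return "Transfer-Encoding: gzip, chunked", wire


entity = b"The entity as provided by the controller. " * 10
header, wire = apply_transfer_encodings(entity)
```

A receiver undoes the codings in reverse order: `gzip.decompress(chunk_decode(wire))` yields the original entity, whatever Content-Encoding that entity already carries.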

In this way there will be a clear distinction: the controller provides the content and Webzmachine might add transfer encodings.

On a similar note, the character set is almost always UTF-8 and should be supplied by the content, i.e. not changed by Webzmachine.

I propose to add two new callbacks, remove one, and change another:

New:

Remove:

Change return format of:

All three will return a list of encodings/charsets, instead of tuples with the encoding and re-code functions.

The content_types_provided/2 function should then take the selected charset and encoding to select the correct content function (or the content function can handle that).

The selected encodings are available from:

mworrell commented 10 years ago

/cc @mmzeeman @arjan @kaos

mmzeeman commented 10 years ago

Nice one :-) Some parts of webmachine desperately need a cleanup. This will make things clearer in this area.

Side note: There is a separate header for negotiating which transfer encoding to use, the TE header. Browsers generally don't send this header. This means that in the normal case only identity and chunked are allowed, not gzip, even if gzip is listed in the Accept-Encoding header. The Accept-Encoding header is restricted to specifying content encodings only. Proxies may add a TE header of course, but I don't think that is done in practice.
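The rule in this side note could be sketched roughly as follows (Python for illustration; `allowed_transfer_codings` is a hypothetical helper, not a Webzmachine function). It looks only at the TE header, deliberately ignores Accept-Encoding, and always allows chunked, which every HTTP/1.1 recipient must accept.

```python
def allowed_transfer_codings(te_header=None):
    """Return the transfer codings a response may use besides identity.

    Only the TE request header is consulted; Accept-Encoding is about
    content codings and plays no role here.
    """
    codings = []
    for item in (te_header or "").split(","):
        parts = [p.strip().lower() for p in item.split(";")]
        name = parts[0]
        if not name or name == "trailers":
            # "trailers" is a keyword in TE, not a transfer coding.
            continue
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        if q > 0 and name not in codings:
            codings.append(name)
    if "chunked" not in codings:
        codings.append("chunked")  # always acceptable in HTTP/1.1
    return codings
```

With no TE header (the usual browser case) this returns only `["chunked"]`, so a gzip transfer coding would not be used even when the request carries `Accept-Encoding: gzip`.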

kaos commented 10 years ago

:+1:

mworrell commented 10 years ago

The selected content-encoding/charset can be fetched with:

wrq:resp_content_encoding(ReqData).
wrq:resp_chosen_charset(ReqData).

There are some more new wrq functions:

resp_transfer_encoding/1, set_resp_transfer_encoding/2,
resp_content_encoding/1, set_resp_content_encoding/2,
resp_content_type/1, set_resp_content_type/2,
resp_chosen_charset/1, set_resp_chosen_charset/2

See wrq.erl for more details.

Also added support for file:sendfile and some new content return values. The complete list is now:

iolist()
{device, IO}
{device, Length, IO}
{file, Filename}
{file, Length, Filename}
{stream, StreamFun}
{stream, Size, StreamFun}
{writer, WriterFun}

mmzeeman commented 10 years ago

That looks nice. I will place the same controller workflow inside the new elli + machine solution.