Open xich opened 11 years ago
Are all HTTP headers encode-able as Text?
Is this the same as asking if it's possible to construct a valid HTTP header from any two Text values representing the name and the value? The answer appears to be no, because header names must be ASCII (0..127). The value can contain any data, but only when encoded in accordance with RFC 2047, and I don't know if HTTP clients can be expected to support it.
Here's the grammar for an HTTP header, from http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2:
message-header = field-name ":" [ field-value ]
field-name = token
field-value = *( field-content | LWS )
field-content = <the OCTETs making up the field-value
and consisting of either *TEXT or combinations
of token, separators, and quoted-string>
And relevant definitions (all from http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.1):
OCTET = <any 8-bit sequence of data>
CHAR = <any US-ASCII character (octets 0 - 127)>
CR = <US-ASCII CR, carriage return (13)>
LF = <US-ASCII LF, linefeed (10)>
SP = <US-ASCII SP, space (32)>
HT = <US-ASCII HT, horizontal-tab (9)>
CRLF = CR LF
LWS = [CRLF] 1*( SP | HT )
token = 1*<any CHAR except CTLs or separators>
separators = "(" | ")" | "<" | ">" | "@"
| "," | ";" | ":" | "\" | <">
| "/" | "[" | "]" | "?" | "="
| "{" | "}" | SP | HT
quoted-string = ( <"> *(qdtext | quoted-pair ) <"> )
qdtext = <any TEXT except <">>
quoted-pair = "\" CHAR
The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 [22] only when encoded according to the rules of RFC 2047 [14].
TEXT = <any OCTET except CTLs,
but including LWS>
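To make the field-name side of this concrete, here is a minimal sketch of the token rule as a predicate over Text. The names isCTL, isSeparator, isTokenChar and isValidFieldName are made up for illustration and are not part of any library; the CTL range (octets 0-31 plus DEL) comes from the RFC 2616 basic rules. It assumes the text package.

```haskell
import           Data.Char (ord)
import           Data.Text (Text)
import qualified Data.Text as T

-- CTL = US-ASCII control characters (octets 0 - 31) and DEL (127).
isCTL :: Char -> Bool
isCTL c = ord c < 32 || ord c == 127

-- The separators production quoted above, including SP and HT.
isSeparator :: Char -> Bool
isSeparator c = c `elem` "()<>@,;:\\\"/[]?={} \t"

-- token = 1*<any CHAR except CTLs or separators>
isTokenChar :: Char -> Bool
isTokenChar c = ord c < 128 && not (isCTL c) && not (isSeparator c)

-- Hypothetical helper: is this Text usable as a field-name?
-- A token needs at least one character, all of them token characters.
isValidFieldName :: Text -> Bool
isValidFieldName t = not (T.null t) && T.all isTokenChar t
```

So a name like "Content-Type" passes, while anything containing a space, a colon, or a non-ASCII character does not, which is why an arbitrary Text cannot be a header name.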
A very abridged summary of RFC 2047: encoded text looks like this:
encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
where 'encoding' is either 'Q' (Quoted-Printable) or 'B' (base64). Examples:
The following are examples of message headers containing 'encoded-word's:
From: =?US-ASCII?Q?Keith_Moore?= <moore@cs.utk.edu>
To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>
CC: =?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD@vm1.ulg.ac.be>
Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
=?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=
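For reference, a rough sketch of what producing such an encoded-word from a Text value could look like, using the 'Q' encoding over UTF-8 bytes. The names qEncodeByte, hexDigit and encodedWordQ are hypothetical. The sketch is deliberately conservative (only ASCII letters and digits pass through unescaped) and it ignores the 75-character limit RFC 2047 places on a single encoded-word. It assumes the text and bytestring packages.

```haskell
import           Data.Bits (shiftR, (.&.))
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import           Data.Char (isAsciiLower, isAsciiUpper, isDigit)
import           Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import           Data.Word (Word8)

-- Uppercase hex digit for a value in the range 0..15.
hexDigit :: Word8 -> Char
hexDigit n = "0123456789ABCDEF" !! fromIntegral n

-- Encode one octet: SP becomes "_", ASCII letters and digits are kept,
-- everything else is written as "=" followed by two hex digits.
qEncodeByte :: Word8 -> String
qEncodeByte w
  | w == 32   = "_"
  | isSafe c  = [c]
  | otherwise = ['=', hexDigit (w `shiftR` 4), hexDigit (w .&. 15)]
  where
    c = toEnum (fromIntegral w) :: Char
    isSafe x = isAsciiLower x || isAsciiUpper x || isDigit x

-- Hypothetical helper: wrap a Text value as one UTF-8 'Q' encoded-word.
-- Does not split long input into multiple encoded-words.
encodedWordQ :: Text -> B.ByteString
encodedWordQ t =
  B8.pack ("=?UTF-8?Q?" ++ concatMap qEncodeByte (B.unpack (TE.encodeUtf8 t)) ++ "?=")
```

The result is pure ASCII, so it fits in a field-value regardless of what the Text contained. Whether the receiving HTTP client decodes it is a separate question.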
None of this necessarily means that setHeader should take ByteStrings. Maybe the solution should just be to call T.encodeLatin1 on the arguments to setHeader instead of T.encodeUtf8 (which currently seems to be happening?).
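A sketch of what that Latin-1 step might look like. I'm not certain every version of Data.Text.Encoding exports an encodeLatin1 (decodeLatin1 is there), so this defines its own hypothetical helper instead of assuming one. It returns Nothing for any character above U+00FF rather than truncating silently, and it does not check for CTLs such as CR or LF.

```haskell
import qualified Data.ByteString as B
import           Data.Char (ord)
import           Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical helper: encode Text as ISO-8859-1, one byte per character.
-- Fails with Nothing if any character lies outside U+0000..U+00FF.
encodeLatin1 :: Text -> Maybe B.ByteString
encodeLatin1 t
  | T.all ((< 256) . ord) t = Just (B.pack (map (fromIntegral . ord) (T.unpack t)))
  | otherwise               = Nothing
```

The Maybe makes the underlying problem visible: some Text values have no Latin-1 representation at all, so a setHeader built on this would need to either reject them or fall back to something like an RFC 2047 encoded-word.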
Need to carefully examine where it is appropriate to use ByteString and where it is appropriate to use Text. For instance, headers currently return Text values... but are all HTTP headers encode-able as Text?