murataka closed this 5 years ago.
The spec allows that whitespace character to exist; there is nothing wrong with it. Your last note is specifically about whitespace around the "=" character, but your PR does not modify any whitespace next to an "=" character. Rather, it removes the whitespace character that follows the ";" character, which, as you also quoted, is valid.
Both are OK, with or without whitespace.
The problem is that the RFC enforces neither the whitespace nor a particular letter case.
When parsing headers, I am sure that settling on one standard form removes the headache in many cases. All of these are equivalent:
text/html;charset=utf-8
text/html;charset=UTF-8
Text/HTML;Charset="utf-8"
text/html; charset="utf-8"
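To make the equivalence of these four variants concrete, here is a small sketch that runs them through Python's standard email.message.Message. Note that this is a MIME header parser, not an HTTP one, and it is not what the module under discussion uses; it is chosen only because it ships with Python and, for these inputs, applies the same rules: case-insensitive type, subtype, and parameter name, quoted and unquoted values treated alike, and (as RFC 7231, Section 3.1.1.2 specifies for charset) a case-insensitive charset value. The helper name canonical is hypothetical.

```python
from email.message import Message


def canonical(value):
    """Hypothetical helper: reduce a Content-Type value to (type/subtype, charset)."""
    msg = Message()  # uses the legacy compat32 policy by default
    msg["Content-Type"] = value
    # get_content_type() lowercases "type/subtype"; get_content_charset()
    # matches the parameter name case-insensitively, unquotes the value,
    # and lowercases it. It returns None when there is no charset.
    return msg.get_content_type(), msg.get_content_charset()


variants = [
    'text/html;charset=utf-8',
    'text/html;charset=UTF-8',
    'Text/HTML;Charset="utf-8"',
    'text/html; charset="utf-8"',
]

for v in variants:
    print(v, "->", canonical(v))
```

Each variant reduces to ('text/html', 'utf-8'), which is why a consumer that normalizes in this way does not care which of the four forms it receives.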
Right, and this module does follow the standard. The space you removed is specified in the ABNF you pasted above:
media-type = type "/" subtype *( OWS ";" OWS parameter )
The OWS token comes right after the literal ";" in the specification. The OWS token is defined as follows (https://tools.ietf.org/html/rfc7230#section-3.2.3):
OWS = *( SP / HTAB )
No parser that follows the standard will have any trouble parsing this, since it is part of the standard. The SP token is defined as the character 0x20, which is the standard whitespace character used in this module.
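The grammar above is small enough to follow literally. Below is a minimal sketch in Python (for illustration only; it is not this module's code) of a parser for media-type = type "/" subtype *( OWS ";" OWS parameter ), with OWS = *( SP / HTAB ) and the token and quoted-string rules from RFC 7230, Section 3.2.6. It accepts optional spaces or tabs around ";" and, like the grammar, rejects whitespace around "=". The function name parse_media_type is hypothetical, and the sketch assumes leading and trailing whitespace has already been trimmed from the field value.

```python
import re

# tchar from RFC 7230, Section 3.2.6.
TOKEN = r"[!#$%&'*+.^_`|~0-9A-Za-z-]+"
# quoted-string: DQUOTE *( qdtext / quoted-pair ) DQUOTE, including obs-text.
QUOTED = r'"(?:[\t \x21\x23-\x5b\x5d-\x7e\x80-\xff]|\\[\t \x21-\x7e\x80-\xff])*"'
# OWS = *( SP / HTAB )
OWS = r"[ \t]*"

TYPE_RE = re.compile(rf"({TOKEN})/({TOKEN})")
PARAM_RE = re.compile(rf"{OWS};{OWS}({TOKEN})=({TOKEN}|{QUOTED})")


def parse_media_type(value):
    """Parse a media type per the ABNF; raise ValueError if it does not match."""
    m = TYPE_RE.match(value)
    if not m:
        raise ValueError(f"invalid media type: {value!r}")
    # type, subtype, and parameter names are case-insensitive: normalize them.
    media_type = f"{m.group(1)}/{m.group(2)}".lower()
    params = {}
    pos = m.end()
    while pos < len(value):
        p = PARAM_RE.match(value, pos)
        if not p:
            raise ValueError(f"invalid parameter at index {pos}: {value!r}")
        name, raw = p.group(1).lower(), p.group(2)
        if raw.startswith('"'):
            # Drop the surrounding quotes and undo quoted-pair escapes.
            raw = re.sub(r"\\(.)", r"\1", raw[1:-1])
        params[name] = raw  # in this sketch, a repeated name overwrites the earlier one
        pos = p.end()
    return media_type, params
```

For example, parse_media_type("text/html; charset=UTF-8") and parse_media_type("text/html;charset=UTF-8") both return ("text/html", {"charset": "UTF-8"}), while "text/html; charset = utf-8" raises ValueError. Parameter values are left as sent, because whether they are case-sensitive depends on the parameter.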
If you are wondering why this module emits a space that is optional, it is actually for improved compatibility. Basically any website you can think of that sends a parameter in its Content-Type header includes that optional single space after the ";" character. Here, for example, is the header from the site I linked to for the RFC:
$ curl -sI https://tools.ietf.org | grep -i content-type:
Content-Type: text/html; charset=UTF-8
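On the serializing side, this compatibility choice amounts to joining parameters with "; " rather than ";". Here is a minimal sketch in Python, assuming a hypothetical format_media_type(media_type, params) helper (not this module's actual API) and assuming the media type and parameter names are already valid tokens. Values that are valid tokens are written bare; anything else becomes a quoted-string with '"' and '\' escaped.

```python
import re

# tchar from RFC 7230, Section 3.2.6.
TOKEN_RE = re.compile(r"[!#$%&'*+.^_`|~0-9A-Za-z-]+")


def format_media_type(media_type, params):
    """Hypothetical serializer that writes the optional single SP after each ';'."""
    parts = [media_type]
    # Sorting the names is a choice of this sketch, for deterministic output.
    for name in sorted(params):
        value = params[name]
        if not TOKEN_RE.fullmatch(value):
            # Not a token (e.g. empty, or contains spaces): emit a quoted-string.
            value = '"' + re.sub(r'(["\\])', r"\\\1", value) + '"'
        parts.append(f"{name}={value}")
    return "; ".join(parts)
```

For instance, format_media_type("text/html", {"charset": "UTF-8"}) produces text/html; charset=UTF-8, matching the header shown above. Writing ";".join(parts) instead would produce the equally valid compact form.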
I think this is because people are used to tokenizing on spaces in C, with strtok.
Some old browsers, for instance, used strtok to parse headers.
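The failure mode described here can be illustrated with a hypothetical naive parser that, like strtok(s, " ") in C, splits on spaces and drops empty tokens, then treats the first token (minus a trailing ";") as the media type. This is a sketch of that hypothesis in Python, not code taken from any actual browser: it happens to work on the spaced form and silently breaks on the form without the space.

```python
def strtok_like(s, delim=" "):
    """Split like C's strtok with a single delimiter: no empty tokens are produced."""
    return [tok for tok in s.split(delim) if tok]


def naive_media_type(header):
    """Hypothetical naive parser: first space-separated token, minus a trailing ';'."""
    tokens = strtok_like(header)
    return tokens[0].rstrip(";") if tokens else ""
```

With the space, naive_media_type("text/html; charset=UTF-8") yields "text/html"; without it, naive_media_type("text/html;charset=UTF-8") yields the whole string, so the media type is never recognized. A parser that follows the ABNF has no such dependency on the space.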
This may not seem important, but when you are dealing with slow networks and low-power devices, standards must be strict; otherwise you have to handle many cases with very little CPU and memory.
In that situation you end up making your own standard: "data should not come in that format".
Standards must be strict for modular development to be possible. Otherwise, time, power, and resources are wasted.
No disagreement there at a high level, but the standards are what they are, and this module follows them as laid out. It sounds like you may need to redirect your efforts to the standards bodies themselves to alter the standards. For example, even something as simple as getting the standard to use strong wording for a particular serialization could help in that regard. But I am not a member of any standards body, so talk about changing them is not going to go anywhere on this forum, at least.
RFC 7231 (https://tools.ietf.org/html/rfc7231) is the most up-to-date standard regarding the Content-Type header. The HTTP working group in fact has a repository on GitHub at https://github.com/httpwg/http-core if you are more comfortable using GitHub to make contact.
3.1.1.1. Media Type
HTTP uses Internet media types [RFC2046] in the Content-Type (Section 3.1.1.5) and Accept (Section 5.3.2) header fields in order to provide open and extensible data typing and type negotiation. Media types define both a data format and various processing models: how to process that data in accordance with each context in which it is received.
The type/subtype MAY be followed by parameters in the form of name=value pairs.
Fielding & Reschke Standards Track [Page 8]
RFC 7231 HTTP/1.1 Semantics and Content June 2014
The type, subtype, and parameter name tokens are case-insensitive. Parameter values might or might not be case-sensitive, depending on the semantics of the parameter name. The presence or absence of a parameter might be significant to the processing of a media-type, depending on its definition within the media type registry.
A parameter value that matches the token production can be transmitted either as a token or within a quoted-string. The quoted and unquoted values are equivalent. For example, the following examples are all equivalent, but the first is preferred for consistency:
text/html;charset=utf-8
text/html;charset=UTF-8
Text/HTML;Charset="utf-8"
text/html; charset="utf-8"
Internet media types ought to be registered with IANA according to the procedures defined in [BCP13].