jshttp / content-type

Create and parse HTTP Content-Type header
MIT License

remove "bad" whitespace ! #14

murataka closed 5 years ago

murataka commented 5 years ago

3.1.1.1. Media Type

HTTP uses Internet media types [RFC2046] in the Content-Type (Section 3.1.1.5) and Accept (Section 5.3.2) header fields in order to provide open and extensible data typing and type negotiation. Media types define both a data format and various processing models: how to process that data in accordance with each context in which it is received.

 media-type = type "/" subtype *( OWS ";" OWS parameter )
 type       = token
 subtype    = token

The type/subtype MAY be followed by parameters in the form of name=value pairs.

 parameter      = token "=" ( token / quoted-string )

Fielding & Reschke Standards Track [Page 8]

RFC 7231 HTTP/1.1 Semantics and Content June 2014

The type, subtype, and parameter name tokens are case-insensitive. Parameter values might or might not be case-sensitive, depending on the semantics of the parameter name. The presence or absence of a parameter might be significant to the processing of a media-type, depending on its definition within the media type registry.

A parameter value that matches the token production can be transmitted either as a token or within a quoted-string. The quoted and unquoted values are equivalent. For example, the following examples are all equivalent, but the first is preferred for consistency:

 text/html;charset=utf-8
 text/html;charset=UTF-8
 Text/HTML;Charset="utf-8"
 text/html; charset="utf-8"

Internet media types ought to be registered with IANA according to the procedures defined in [BCP13].

  Note: Unlike some similar constructs in other header fields, media
  type parameters do not allow whitespace (even "bad" whitespace)
  around the "=" character.

dougwilson commented 5 years ago

The spec allows that whitespace character, and there is nothing bad about it. Your last note is specifically about whitespace around the "=" character, but your PR does not modify any whitespace next to a "=" character. Instead, it removes the whitespace character that follows the ";" character, which the ABNF you quoted shows is valid.

murataka commented 5 years ago

Both are OK, with or without whitespace.

The problem is that the RFC does not enforce one form for the whitespace or the letter case.

When parsing headers, I am sure that settling on one standard form removes headaches in many cases.

 text/html;charset=utf-8
 text/html;charset=UTF-8
 Text/HTML;Charset="utf-8"
 text/html; charset="utf-8"
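To illustrate that these four forms carry the same information, here is a small Python sketch. It leans on the standard library's `email.message.Message`, which parses MIME-style headers and does not implement RFC 7231 itself, so treat it as an approximation. The helper name `canonical_content_type` is made up for this example, and lowercasing the charset value is an assumption that is reasonable for charset names but not for every parameter.

```python
from email.message import Message

HEADERS = [
    "text/html;charset=utf-8",
    "text/html;charset=UTF-8",
    'Text/HTML;Charset="utf-8"',
    'text/html; charset="utf-8"',
]


def canonical_content_type(value):
    """Reduce a Content-Type value to (media type, charset), both lowercased."""
    msg = Message()
    msg["Content-Type"] = value
    media_type = msg.get_content_type()   # lowercased "type/subtype"
    charset = msg.get_param("charset")    # surrounding quotes removed; None if absent
    return media_type, (charset.lower() if charset else None)


# A set collapses identical results, so equivalent headers leave one entry.
forms = {canonical_content_type(h) for h in HEADERS}
print(forms)  # {('text/html', 'utf-8')}
```

A receiver that compares the parsed result instead of the raw string does not need senders to agree on one spelling.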

dougwilson commented 5 years ago

Right, and this module does follow the standard. The space you removed is specified in the ABNF you pasted above:

 media-type = type "/" subtype *( OWS ";" OWS parameter )

The OWS token is right after the literal ";" in the specification. The OWS token is defined as the following (https://tools.ietf.org/html/rfc7230#section-3.2.3):

OWS            = *( SP / HTAB )

No parser following the standard will have any trouble parsing this, as it is in the standard. The SP token is defined as the character 0x20, which is the whitespace character used in this module.
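As a rough illustration of what following that ABNF means for a parser, here is a minimal Python sketch. It is not this module's code: the function name `parse_media_type` and the regular expressions are assumptions written for this example, with the token character set taken from RFC 7230 and a simplified quoted-string pattern. Optional SP/HTAB is accepted on either side of ";" but not around "=".

```python
import re

# token characters (tchar) per RFC 7230
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# simplified quoted-string: anything except quote/backslash, or a backslash escape
QUOTED = r'"(?:[^"\\]|\\.)*"'

MEDIA_TYPE = re.compile(rf"({TOKEN})/({TOKEN})")
# *( OWS ";" OWS parameter ), with OWS = *( SP / HTAB )
PARAM = re.compile(rf"[ \t]*;[ \t]*({TOKEN})=({TOKEN}|{QUOTED})")


def parse_media_type(value):
    """Return (type/subtype, {param: value}) or raise ValueError."""
    m = MEDIA_TYPE.match(value)
    if m is None:
        raise ValueError("invalid media type")
    media_type = f"{m.group(1)}/{m.group(2)}".lower()
    params = {}
    pos = m.end()
    while pos < len(value):
        p = PARAM.match(value, pos)
        if p is None:
            raise ValueError("invalid parameter format")
        name, raw = p.group(1).lower(), p.group(2)
        if raw.startswith('"'):
            # strip the surrounding quotes and undo backslash escapes
            raw = re.sub(r"\\(.)", r"\1", raw[1:-1])
        params[name] = raw
        pos = p.end()
    return media_type, params


print(parse_media_type("text/html;charset=utf-8"))     # ('text/html', {'charset': 'utf-8'})
print(parse_media_type('text/html; charset="utf-8"'))  # ('text/html', {'charset': 'utf-8'})
```

In this sketch a value such as `text/html; charset = utf-8` raises `ValueError`, which matches the note about whitespace around the "=" character.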

dougwilson commented 5 years ago

If you are wondering why this module emits a space that is optional, it is for improved compatibility. Basically any website you can think of that sends a parameter in its Content-Type header will include that optional single space after the ";" character. Here is, for example, the header from the site I linked to for the RFC:

$ curl -sI https://tools.ietf.org | grep -i content-type:
Content-Type: text/html; charset=UTF-8
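For comparison, a serializer that produces this common "; " form can be sketched in a few lines of Python. This is only an illustration under assumptions, not this module's implementation: the names `format_media_type` and `quote_if_needed` are hypothetical, and values are quoted only when they do not match the token production.

```python
import re

# token characters (tchar) per RFC 7230
TOKEN_RE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def quote_if_needed(value):
    """Return value unchanged if it is a token, else as a quoted-string."""
    if TOKEN_RE.fullmatch(value):
        return value
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def format_media_type(media_type, params):
    """Join type/subtype and parameters with ';' followed by a single SP."""
    parts = [media_type.lower()]
    for name, value in params.items():
        parts.append(f"{name.lower()}={quote_if_needed(value)}")
    return "; ".join(parts)


print(format_media_type("text/html", {"charset": "UTF-8"}))
# text/html; charset=UTF-8
```

The single space after ";" falls within the optional whitespace the grammar permits, so a parser that follows the ABNF accepts this output.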

murataka commented 5 years ago

I think this comes from being used to tokenizing on spaces in C with strtok.

In this case, some old browsers used strtok for parsing headers.

This may not seem important, but when you are dealing with slow networks and low-power devices, standards must be strict. Otherwise you have to handle many variations with very little CPU and memory.

In that case, you have to make your own standards: "data should not come in that format".

Standards must be strict for modular development to be possible. Otherwise time, power, and resources are lost.

dougwilson commented 5 years ago

No disagreement there at a high level, but the standards are what they are and this module is following them as laid out. It sounds like you may need to redirect your efforts to the actual standards bodies to alter the standards. For example, even something as simple as getting the standard to add strong wording for a particular serialization could help in that regard. But I am not a member of any standards body, so talk about changing them is not going to go anywhere on this forum, at least.

RFC 7231 (https://tools.ietf.org/html/rfc7231) is the most up-to-date standard regarding the Content-Type header. They have a forum on GitHub in fact at https://github.com/httpwg/http-core if you are more comfortable using GitHub to make contact.