Open declanvk opened 7 months ago
It's true that the is_token method differs from the spec; that difference was motivated by real-world traffic. However, the parts you're showing, header values, are not defined to be tokens even in the spec. Header values are defined as:
field-value = *field-content
field-content = field-vchar [ 1*( SP / HTAB / field-vchar ) field-vchar ]
field-vchar = VCHAR / obs-text
obs-text = %x80-FF
This allows [ and plenty more in header values.
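To make the grammar above concrete, here is a minimal sketch of a field-value check written directly from those rules. The function names are hypothetical and are not part of httparse; the point is only to show that `[` passes as a field-vchar.

```rust
/// field-vchar = VCHAR / obs-text
/// VCHAR is %x21-7E (visible ASCII); obs-text is %x80-FF.
fn is_field_vchar(b: u8) -> bool {
    matches!(b, 0x21..=0x7E | 0x80..=0xFF)
}

/// field-value = *field-content
/// An empty value is valid. Otherwise the first and last bytes must be
/// field-vchars, and every byte in between must be a field-vchar, SP, or HTAB.
fn is_field_value(value: &[u8]) -> bool {
    if value.is_empty() {
        return true;
    }
    let first = value[0];
    let last = value[value.len() - 1];
    is_field_vchar(first)
        && is_field_vchar(last)
        && value
            .iter()
            .all(|&b| is_field_vchar(b) || b == b' ' || b == b'\t')
}
```

With this sketch, `is_field_value(b"a[b")` is true, while a value with leading whitespace or a control byte is rejected.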
Thanks for clarifying! On re-reading, I think only field-names are defined as tokens: https://www.rfc-editor.org/rfc/rfc9110#section-5.1-2? I also see token used in a couple of other places, like methods, parameter names, connection options, etc.
However, looking in https://github.com/seanmonstar/httparse/blob/0f5e6fb0aa3a060146c6b4e9c9f33eec552297c0/src/lib.rs#L58, I only see is_token used to parse tokens and in something related to parsing URIs.
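RFC 9110 Section 5.6.2 defines the grammar for field value tokens as:
token = 1*tchar
tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA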
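For comparison, a strict check that follows the RFC 9110 Section 5.6.2 token grammar (token = 1*tchar) could look roughly like the sketch below. The names `is_tchar` and `is_strict_token` are hypothetical; this is not how httparse implements is_token, which is intentionally more lenient.

```rust
/// tchar per RFC 9110 Section 5.6.2: any VCHAR except delimiters.
fn is_tchar(b: u8) -> bool {
    matches!(
        b,
        b'!' | b'#' | b'$' | b'%' | b'&' | b'\'' | b'*' | b'+' | b'-' | b'.'
            | b'^' | b'_' | b'`' | b'|' | b'~'
    ) || b.is_ascii_alphanumeric()
}

/// token = 1*tchar, so at least one byte and every byte must be a tchar.
fn is_strict_token(s: &[u8]) -> bool {
    !s.is_empty() && s.iter().all(|&b| is_tchar(b))
}
```

Under this strict reading, `is_strict_token(b"Content-Type")` holds, while anything containing a delimiter such as `[` is rejected.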
However, in practice this grammar seems to be relaxed to a larger set of allowed characters. I made a small test which checks this behavior:
I also checked how Firefox handles this header with a small program.
Then I visited localhost:4040 in Firefox. The recorded execution also showed the delimiter in the token value.
Based on the test assertions and the Firefox screenshot, it seems consistent that both programs allow characters outside the grammar specified in RFC 9110.
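A browser check like the one described above can be reproduced with a sketch along these lines. It is not the original program: the header name `X-Example` and value `a[b` in the usage note are hypothetical stand-ins, and only the port 4040 comes from the description above.

```rust
use std::io::{Read, Write};
use std::net::TcpListener;

/// Builds a minimal HTTP/1.1 response carrying one extra header.
/// The caller chooses the header name and value (hypothetical here).
fn build_response(name: &str, value: &str) -> Vec<u8> {
    let body = "ok";
    format!(
        "HTTP/1.1 200 OK\r\n{name}: {value}\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}",
        body.len()
    )
    .into_bytes()
}

/// Accepts a single connection on `addr` and answers it with `response`.
fn serve_once(addr: &str, response: &[u8]) -> std::io::Result<()> {
    let listener = TcpListener::bind(addr)?;
    let (mut stream, _peer) = listener.accept()?;
    let mut buf = [0u8; 4096];
    // Read (and ignore) whatever part of the request arrives first.
    let _ = stream.read(&mut buf)?;
    stream.write_all(response)?;
    stream.flush()
}
```

Calling `serve_once("127.0.0.1:4040", &build_response("X-Example", "a[b"))` and then opening localhost:4040 in a browser lets you inspect how the header is displayed in the network tools.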