Splitting on comma to parse the Accept header is wrong

nox commented 3 years ago

The "Accept" header field can be used by user agents to specify response media types that are acceptable. Accept header fields can be used to indicate that the request is specifically limited to a small set of desired types, as in the case of a request for an in-line image.
Accept = #( media-range [ accept-params ] )

media-range    = ( "*/*"
                 / ( type "/" "*" )
                 / ( type "/" subtype )
                 ) *( OWS ";" OWS parameter )
accept-params  = weight *( accept-ext )
accept-ext = OWS ";" OWS token [ "=" ( token / quoted-string ) ]

Note that accept-ext accepts a quoted-string after the equal sign, which means the values can include commas.

The parsing code for Accept splits the Accept header values on comma and then parses each segment with MediaTypeProposal::from_str.

That means that Accept: foo/bar; parisien="baguette, jambon, fromage" will parse incorrectly.

jbr commented 3 years ago

I'm not sure this is a good first issue, but there are some parsing utilities in https://github.com/http-rs/http-types/blob/main/src/parse_utils.rs for whomever takes this up. Any header that includes quoted-strings or tokens should be implemented with a parser instead of splitting

red15 commented 3 years ago

Would introducing another dependency be an option, in that case quoted-string seems like an ideal candidate to avoid duplicate work. No nested dependencies present and works both ways it seems.

jbr commented 3 years ago

What would the advantage of quoted-string be over the http-types parse-utils, which we wrote specifically for this?

red15 commented 3 years ago

What would the advantage of quoted-string be over the http-types parse-utils, which we wrote specifically for this?

Point taken, missed that implementation. There is still a bit of duplicated work being done in that code though, using the dependency would mean you can "outsource" that part. Outsourcing has its own benefits and downsides of course.

I've been looking at this issue but am a bit confounded with how the Accept / Header types are working, it's surprising to me the Header is just a Trait and not a real Struct. Guess it's easier to initialize a tuple instead of having to use an exact/imported Header struct type? It does also mean we're sidestepping some of the good practices that come from the Rust strict type system? It feels a bit of duck-typing to me?

That means that Accept: foo/bar; parisien="baguette, jambon, fromage" will parse incorrectly.

@nox is there another more real-life example you can provide for values of Accept ? I've not seen such header styles and can't really think of cases that aren't just documentation. Regardless this is actually quite a dangerous thing as we're handling/parsing possibly rogue/bad actor input.

jbr commented 3 years ago

@red15 I'm not sure how Header would work as a "real struct" given that each header, once parsed, contains different structured data. It's not duck typing at all, but the trait system at work. There's a library of structs that all conform to a uniform trait that provides affordances for them to parse data out of headers (the definition of which includes concrete structs like Headers, HeaderName, HeaderValues, and HeaderValue) and/or insert structured data into Headers. Importantly, Accept is itself a plain old rust struct: https://github.com/http-rs/http-types/blob/b3a9b9615936adc6727027a00c4b107c0a9ba875/src/content/accept.rs#L51-L54, which could contain any data that is appropriate for the specific header. Where do you see a tuple?

As far as using external libraries, I agree we could have used an external parsing library, but the decision was made to keep parsing logic internal to http-rs crates. I can't fully explain the motivations behind that decision, but it was why I wrote the parse_utils parsers from the http spec for the Forwarded header implementation, which I believed was especially important to do correctly for similar security-related reasons to those you cite.

red15 commented 3 years ago

I'm mostly referring to the tests that use the Trait to extract header_name and header_value. https://github.com/http-rs/http-types/blob/85409417e57e9f61b0d6b40d081bdbaa8f385d07/src/headers/header.rs#L49-L51

Isn't this where the problem with quoted-string parsing would occur?

glasser commented 2 years ago

I asked nicely and now this library has a parser for this you can use: https://github.com/picoHz/mediatype/issues/4 https://docs.rs/mediatype/latest/mediatype/struct.MediaTypeList.html

red15 commented 2 years ago

Found an example of a simple test that should be added to demonstrate this problem:

https://github.com/http-rs/http-types/blob/1fe07df28652502696957e03f7377888c6ddb65a/src/mime/parse.rs#L467-L469

Simply adding the line

   // Handle multiple mime with comma and qualifiers
   assert_parse("text/plain; q=0.5, text/html, text/x-dvi; q=0.8, text/x-c", "text/plain; q=0.5, text/html, text/x-dvi; q=0.8, text/x-c");

would then fail the test with:

---- mime::parse::whatwag_tests stdout ----
thread 'mime::parse::whatwag_tests' panicked at 'assertion failed: `(left == right)`
  left: `"text/plain;q=\"0.5, text/html, text/x-dvi\""`,
 right: `"text/plain; q=0.5, text/html, text/x-dvi; q=0.8, text/x-c"`', src/mime/parse.rs:315:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

That should demonstrate the parsing of the mime type is not properly done, unless my test needs tweaking but the "left" value of this Mime type really doesn't look like what we've inputted to me?

As a side node, perhaps it's worth taking off the tag good first issue as well :)

http-rs / http-types

Splitting on comma to parse the Accept header is wrong #349