willdurand / Negotiation

Content Negotiation tools for PHP.
https://williamdurand.fr/Negotiation/
MIT License
1.41k stars 62 forks source link

Content type application/json selected incorrectly using Microsoft Edge headers #98

Open emrysal opened 6 years ago

emrysal commented 6 years ago

Hi all,

This was a interesting case encountered using the Microsoft Edge (41.16299.15.0) browser, which sends the following headers by default.

Accept: text/html, application/xhtml+xml, image/jxr, */*

When the priorities are prioritised with application/json, text/html; then the Content Negotiation will prefer the application/json response type; this should clearly prefer text/html based on the headers send by Edge.

I worked around the issue by changing the priority from application/json, text/html => text/html, application/json. But I thought it a good idea to flag this issue.

I've written some tests which expose this behaviour, the following is the (invalid) 'passing' test.

array(
    array(new Accept('text/html'), new Accept('application/xhtml+xml'), new Accept('image/jxr'), new Accept('*/*')),
    array(new Accept('application/json'), new Accept('text/html')),
    array(
        new Match(1.0, 0, 0),   // application/json
        new Match(1.0, 110, 1), // text/html
        new Match(1.0, 0, 1),   // */*
    )
)

This seems to be caused by the index, but I am not exactly sure what the correct fix would be, hense the lack of a PR. My best guess is that application/xhtml+xml should not match application/json in any way, as they are radically different.

willdurand commented 6 years ago

You have */* in the header and all types have the same priority, so the algorithm picks the first in the list of priorities because that's the most preferred one.

This does not seem invalid to me.

lord2800 commented 6 years ago

IMO the more specific exact match to text/html should be preferred over the generic catch-all match of */*.

willdurand commented 6 years ago

Indeed, could be and now I remember this: https://github.com/willdurand/Negotiation/issues/83#issuecomment-272903893

emrysal commented 6 years ago

Ah I see now that it is indeed the */* part that is taking precedence over the text/html. Similar issue to the comment you mentioned @willdurand. It seems like Internet Explorer and Edge both do not implement ?q parameters, resulting in a preferred match the higher priority of application/json on the */*, instead of the direct match of lower indexed text/html on text/html.

I'm now actually confused what content should be negotiated, as */* matches the highest priority, but there is a direct match on a lesser priority. I agree this is not invalid, but as it is behaviour that is debatable, let's debate. Microsoft should really just implement */*?q=0.8 in the first place, but as they don't..

DISCLAIMER: The following is my opinion to get the ball rolling, feedback highly appreciated:

First off, automatically lowering the */* when it is the last index in the Accept header is a fix for this specific issue, but it leaves cases where the Accept header is for example text/html application/* */*?q=0.8;. I believe a modification in the algorithm taking into account the Accept priority of the requester over the priority order of the application - In the case there is uncertainty indicated by the wildcard - could be the appropriate solution here.

E.g.

Accept: text/html application/* */* Priority: application/json text/html Result: text/html

Accept: text/html application/json */* Priority: application/json text/html Result: application/json

Accept: application/* text/html Priority: application/json text/html Result: application/json

Accept: application/* text/html Priority: text/html application/json Result: text/html

Zegnat commented 6 years ago

I ran into this too and have been thinking about ways to solve it. I see no way of solving this that still sticks to RFC 7231, which feels iffy to me. My gut feeling is that Microsoft is just doing it wrong. But I thought I’d still share my musings.

I am of the opinion that the contents of the Accept header should be treated as an unordered set, so no value should be implied by the order of its elements. There is no “accept priority” possible in an unordered set.

According to the RFC, “the most specific reference has precedence” when assigning quality values. So clearly specificity is a concept that is still in effect even in our unordered set. If we can accept specificity to have bearing on preference, it becomes possible to apply bonus quality to the more specific media ranges. For those who know about CSS this is a familiar topic.

Quality values are in the range 0–1 with at most 3 decimal places used. For my tweaking of the algorithm I can use a fourth decimal place. This will allow for a media range to be preferred over another without accidentally making it compete with one that was specified with a higher value.

  1. Add 0.0000 to */* media ranges.
  2. Add 0.0001 to x/* media ranges.
  3. Add 0.0002 to x/x media ranges.

The IE Accept header will be interpreted as:

Accept: text/html;q=1.0002, application/xhtml+xml;q=1.0002, image/jxr;q=1.0002, */*;q=1.0000

This means text/html has a higher quality value and can be preferred during Negotiator::getBest().

Of course this will come with its own set of corner cases. Here is the outcome of following these steps on @emrysal’s examples:

Header Priorities Best
text/html, application/*, */* ['application/json', 'text/html'] text/html
text/html, application/json, */* ['application/json', 'text/html'] application/json
application/* text/html ['application/json', 'text/html'] text/html
application/* text/html ['text/html', 'application/json'] text/html
soyuka commented 3 years ago

@willdurand would you accept a patch with @Zegnat proposition?

willdurand commented 2 years ago

@willdurand would you accept a patch with @Zegnat proposition?

I guess? As long as there are new test cases and that existing ones don't break (OR it is very clear why they need to be updated), I am fine with pretty much everything.