jshttp / negotiator

An HTTP content negotiator for Node.js
MIT License
309 stars 33 forks

Accept-Language not interpreted correctly #46

Status: Closed (marr closed 8 years ago)

marr commented 8 years ago

I just created a PR that illustrates what I see when trying to find the language from the Accept-Language header. According to the docs, languages should return the best fit, and I am seeing unexpected behavior when I send in headers with values such as en,en-US;q=0.8, which is the standard header value Chrome sends for a US browser.

If I call language(['en-US', 'en-GB']) I would expect the result to be en-US as that is an acceptable language that the server supports. The result is instead en-GB, leading me to believe there is a bug.

dougwilson commented 8 years ago

I agree it looks like a bug :) I'll check whether this matches the specs and then try to get a fix out very soon :)

dougwilson commented 8 years ago

Basically, either that header is just nonsense (i.e. it seems like you are expecting that both en-US and en-GB are treated as quality 1), or we are correct in ranking en-US below en-GB (since en-US is explicitly set to a quality of 0.8, while everything under en has a quality of 1). I will try to find out whether the specifications for this header define what to do in such situations, and/or look at existing implementations in things like Apache/nginx to see how they react to this if there is no specification for it.
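To make the quality values in that header concrete, here is a minimal sketch of how en,en-US;q=0.8 breaks down into ranges and weights. parseAcceptLanguage is a hypothetical helper written for illustration, not a function from this module, and it assumes a well-formed header where a range without a q parameter counts as quality 1.

```javascript
// Hypothetical helper for illustration; not part of the negotiator API.
// Splits an Accept-Language value into { range, q } entries.
function parseAcceptLanguage(header) {
  return header
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => {
      const [range, ...params] = part.split(";").map((s) => s.trim());
      let q = 1; // assumption: no explicit q means quality 1
      for (const param of params) {
        const [key, value] = param.split("=").map((s) => s.trim());
        if (key.toLowerCase() === "q") {
          const parsed = Number.parseFloat(value);
          if (!Number.isNaN(parsed)) q = parsed;
        }
      }
      return { range, q };
    });
}

console.log(parseAcceptLanguage("en,en-US;q=0.8"));
// en carries the default quality of 1, en-US carries the explicit 0.8
```

Read this way, the header says "any English at 1, US English at 0.8", which is why the more specific en-US entry ends up with the lower weight.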

dougwilson commented 8 years ago

I'm very sorry, but this is not a bug in this module. The specification does describe the matching to use, and we do follow it in this case (RFC 7231, which links to the algorithms in RFC 4647). Here is how it breaks down in your example:

Given by the user agent: en,en-US;q=0.8
Match against: en-US, en-GB

The algorithm describes matching against the ranges in order, from the most specific to the least specific (which means en-US is more specific than en, in this case). This means the example will run as follows:

  1. en-US does not exactly match en, move to next.
  2. en-US does specifically match en-US, so en-US is rated at the desired quality of 0.8.
  3. en-GB does not exactly match en, move to next.
  4. en-GB does not exactly match en-US, move to next.
  5. List exhausted for en-GB, so run a fallback up to the less specific en.
  6. en does specifically match en, so en-GB is rated at the desired quality of 1.
  7. No more languages to match, so return your list sorted by quality, descending: en-GB at 1 and then en-US at 0.8.
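The steps above can be sketched roughly as follows. This is an illustration of the walk-through only, not the module's actual source: rate and rankLanguages are hypothetical names, the ranges are hard-coded from the example header, and wildcard (*) ranges are ignored for brevity.

```javascript
// Hypothetical sketch of the steps above; not the negotiator implementation.

// Rate one supported tag: try an exact match first, then fall back by
// dropping trailing subtags ("en-GB" -> "en") until something matches.
function rate(supported, ranges) {
  let tag = supported.toLowerCase();
  while (tag.length > 0) {
    const hit = ranges.find((r) => r.range.toLowerCase() === tag);
    if (hit) return hit.q;
    const cut = tag.lastIndexOf("-");
    if (cut === -1) break; // no less specific form left to try
    tag = tag.slice(0, cut);
  }
  return 0; // nothing in the header matched this tag
}

// Rate every supported tag, drop unacceptable ones, sort by quality descending.
function rankLanguages(supported, ranges) {
  return supported
    .map((tag) => ({ tag, q: rate(tag, ranges) }))
    .filter((entry) => entry.q > 0)
    .sort((a, b) => b.q - a.q)
    .map((entry) => entry.tag);
}

// Ranges from the header en,en-US;q=0.8
const ranges = [
  { range: "en", q: 1 },
  { range: "en-US", q: 0.8 },
];

console.log(rankLanguages(["en-US", "en-GB"], ranges));
// → [ 'en-GB', 'en-US' ]
```

en-US matches its own range exactly and gets 0.8, while en-GB has no exact match, falls back to en, and picks up that range's quality of 1, so it sorts first.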

The Apache mod_negotiation module also matches our behavior, which is the behavior described in the standards.

marr commented 8 years ago

@dougwilson thank you for looking into it. Can you explain that:

en does specifically match en, so en-GB is rated at the desired quality of 1.

I don't see why a match to en should rate en-GB at quality 1

dougwilson commented 8 years ago

I'm not sure how I can explain it differently from what I wrote above. Is the confusion about what step 5 does?

marr commented 8 years ago

I'd expect the "supported" languages on the server [en-US, en-GB] to be matched against the Accept-Language header, and given that en-US is in the header, I'd expect it to be preferred.

dougwilson commented 8 years ago

I understand that your expectation is different from what the specifications set out for performing the matching in RFC 4647. Unfortunately you have stumbled into a module that was created to match the standards, rather than cater to everyone's different expectations. I suggest perhaps using a different module that more closely aligns with your expectations and deviates from the standards.

marr commented 8 years ago

Thanks for your quick answers and investigation @dougwilson