jshttp / negotiator

An HTTP content negotiator for Node.js
MIT License
309 stars 33 forks source link

Accept-Language comparing standard currently differs between `getLanguagePriority` and `compareSpecs` #54

Open tangye1234 opened 5 years ago

tangye1234 commented 5 years ago

Let's make the request

Accept-Language: zh, zh-CN;q=0.9

and We provided backlist: ['zh-CN', 'zh-TW']

Coding:

const n = new require('negotiator')({ headers: { 'accept-language': 'zh, zh-CN;q=0.9' } })
n.language(['zh-CN', 'zh-TW'])

Actual: 'zh-TW'

Expect: 'zh-CN' for sure

Analysis: In the code, we get two comparing code snippets:

function getLanguagePriority(language, accepted, index) {
    var priority = {
        o: -1,
        q: 0,
        s: 0
    };

    for (var i = 0; i < accepted.length; i++) {
        var spec = specify(language, accepted[i], index);

        // this means `s`(the matching level) > `q`(the quality) > `o`(the index of accepted)
        if (spec && (priority.s - spec.s || priority.q - spec.q || priority.o - spec.o) < 0) {
            priority = spec;
        }
    }

    return priority;
}

But when codes come here

function compareSpecs(a, b) {
    // `q`(the quality) > `s`(the matching level) > `o`(the accepted index) > `i`(comparing index)
    return (b.q - a.q) || (b.s - a.s) || (a.o - b.o) || (a.i - b.i) || 0;
}

WHICH LEADS: when comparing zh-CN to accepted zh, zh-CN;q=0.9, we get a high matching level zh-CN with a lower quality weight 0.9

and then comparing zh-TW to accepted zh, zh-CN;q=0.9, we get the the only half-matched zh with a high quality weight 1

finally, we use compareSpecs to select a higher quality language zh-TW rather than the lower but 100% percent matched one zh-CN

Read the rfc4647, it is not defined whether this expectation should be reasonable. It only reads:

If the user cannot be sure which scheme is being used (or if more than one might be applied to a given request), the user SHOULD specify the most specific (largest number of subtags) range first and then supply shorter prefixes later in the list to ensure that filtering returns a complete set of tags.

So the accepted language with zh, zh-CN;q=0.9 is not comformed to this user decision. But I think, the comparing logic should be the same ( q > s > o), such that

n.languages(['zh-CN', 'zh-HK']) // returns ['zh-CN', 'zh-HK']
n.languages(['zh-HK', 'zh-CN']) // returns ['zh-HK', 'zh-CN']
julianlam commented 5 years ago

How coincidental, I just ran into this issue as well.

var Negotiator = require('negotiator');
var req = { headers: { 'accept-language': 'en-CA,en;q=0.9,en-GB;q=0.8,en-US;q=0.7,fr;q=0.6,pt;q=0.5,th;q=0.4' } }
var negotiator = new Negotiator(req);
negotiator.languages(['en-GB', 'en-US', 'en-x-pirate']);
[ 'en-x-pirate', 'en-GB', 'en-US' ]   // output

In my scenario the en should match en-GB first at priority of 0.8. en substring match with en-x-pirate should be considered only after all exact matches have been exhausted.

julianlam commented 5 years ago

@tangye1234 #46 provides a bit more context, and @dougwilson has shown that this module adheres to RFC 4647, so this unfortunately is a wontfix

tangye1234 commented 5 years ago

@julianlam, I think @dougwilson is trying to explain the case Apache mod_negotiation and jshttp/negotiator behave the same, but that is not what rfc4647 describes, the RFC only describes that in this case language cannot be selected out, but only be filtered with a fallback list, while users would decide for their own.

Maybe he is right because the negotiator cannot make decisions for the user, unless user send another accept-language list: en-GB, en;q=0.9 that follows the the suggestion from RFC:

the user SHOULD specify the most specific (largest number of subtags) range first and then supply shorter prefixes later in the list

dougwilson commented 5 years ago

Hi everyone, sorry I have been away with other things.

For Accept-Language header, this module specifically only implements the algo linked to from the HTTP RFC (https://tools.ietf.org/html/rfc7231#section-5.3.5). To quote from that section of RFC 7231:

For matching, Section 3 of [RFC4647] defines several matching schemes. Implementations can offer the most appropriate matching scheme for their requirements. The "Basic Filtering" scheme ([RFC4647], Section 3.3.1) is identical to the matching scheme that was previously defined for HTTP in Section 14.4 of [RFC2616].

The "Basic Filtering" algo is what this module currently implements for Accept-Language header, as it is the algo web servers have always traditionally implemented for selecting a language from a set list for HTTP applications. The full test of this algo is in RFC 4647 here: https://tools.ietf.org/html/rfc4647#section-3.3.1

How the basic filter algo works is also illustrated in a previous comment I made here https://github.com/jshttp/negotiator/pull/46#issuecomment-228209271 which applies to the original post as well.

I'm not really familiar with the ago in RFC 4647 that would produce the expected results in the original post. @tangye1234 would you be able to link directly to the section in RFC 4647 that defines the algo you're expected and than use your example to walk though it? I'm 100% open to adding additional defined algos to the list such that you can choose which one to use when matching language tags đź‘Ť the PR above was changing the existing standards-defined algo which is why it was marked as wontfix, though adding additional algos from the RFC as options to alter behavior is perfectly acceptable.

I hope this makes sense.

julianlam commented 5 years ago

Assuming we want to define additional algos to the call to negotiator, how would this be achieved, as a second argument for .languages?