jshttp / negotiator

An HTTP content negotiator for Node.js
MIT License
315 stars 36 forks source link

Parameter matching too strict #35

Open gr0uch opened 9 years ago

gr0uch commented 9 years ago

Suppose I have an Accept header that looks like:

Accept: application/vnd.api+json; ext=bulk

The output of new Negotiator(request).mediaType() would be application/vnd.api+json as expected. But when I specify an array of media types:

var negotiator = new Negotiator(request);
negotiator.mediaType(['application/vnd.api+json']); // returns `undefined`
negotiator.mediaType(['application/vnd.api+json; ext=bulk,patch']); // returns `undefined`
negotiator.mediaType(['application/vnd.api+json; ext=patch,bulk']); // returns `undefined`
negotiator.mediaType(['application/vnd.api+json; ext=bulk; supported-ext=bulk']); // returns `undefined`
negotiator.mediaType(['application/vnd.api+json; ext=bulk']); // returns `application/vnd.api+json; ext=bulk`

Which is kind of WTF because it only matches the exact parameters and I get the parameters in the media type as well. Is this expected behavior? I can see that it may be useful in extracting parameters but I expect that it should match the media type without the parameters, otherwise permutations of parameters may get huge. Possible dupe of #30.

dougwilson commented 9 years ago

Nah, this isn't a dup of that. I wasn't aware of this problem and it makes me sad :( I need to think on this a little bit on what the actual correct behavior should be, because whatever it it is, I don't believe the current behavior seems correct.

ethanresnick commented 9 years ago

So, I've been looking into this, and the specs are pretty ambiguous, but my current reading is that negotiator is largely doing the right thing here. (Or, at least, it's largely doing the most correct thing that it can do without considering the particular media type.)

My primary evidence is this example from RFC 7231:

  Accept: text/*;q=0.3, text/html;q=0.7, text/html;level=1,
             text/html;level=2;q=0.4, */*;q=0.5

   would cause the following [quality] values to be associated:

   +-------------------+---------------+
   | Media Type        | Quality Value |
   +-------------------+---------------+
   | text/html;level=1 | 1             |
   | text/html         | 0.7           |
   | text/plain        | 0.3           |
   | image/jpeg        | 0.5           |
   | text/html;level=2 | 0.4           |
   | text/html;level=3 | 0.7           |
   +-------------------+---------------+

In that example, the server knows how to produce a representation in text/html;level=3, and the client's preference for that format is assumed to be "inherited from" its preference for text/html; that's why they both have a quality value of 0.7, despite the client never explicitly declaring its degree of preference for text/html;level=3.

Combined with the spec's specificity-based ordering scheme, I take the above to mean that the spec sees the difference between "type/*" and "type/subtype" as analogous to the difference between "type/subtype" and "type/subtype;parameter=value". That is, I think that the generic way to think about adding a parameter is that it picks out a more specific format that's compatible with (i.e. a subclass of) the same media range without the parameter.

Now, as a caveat, that there's also this line in RFC 7231: “The presence or absence of a parameter might be significant to the processing of a media-type, depending on its definition within the media type registry.” (From RFC 7231)

That could be read as invalidating the above reasoning and delegating the meaning of parameters to each media type. But it's also incredibly vague and, combined with the above example, I'm not sure it has to be read that way. Moreover, I think there's a good argument that to be made that, even if that line ultimately gives each media type control over parameter semantics, a good default for a library like this would be to read each parameter as further narrowing down the format.

If this line of argument is accepted, then, for the header Accept: application/vnd.api+json; ext=bulk, the expected results would be:

  1. Undefined for:
    • negotiator.mediaType(['application/vnd.api+json']);
    • negotiator.mediaType(['application/vnd.api+json; ext=bulk,patch']);
    • negotiator.mediaType(['application/vnd.api+json; ext=patch,bulk']);
  2. And "application/vnd.api+json; ext=bulk" for:
    • negotiator.mediaType(['application/vnd.api+json; ext=bulk; supported-ext=bulk']);
    • negotiator.mediaType(['application/vnd.api+json; ext=bulk'])

This is the same as current behavior, except for the input "application/vnd.api+json; ext=bulk; supported-ext=bulk", which currently returns undefined.

ethanresnick commented 9 years ago

Also, maybe @reschke or @dret have insight into what the correct behavior should be here, as the above interpretation is just my relatively-uninformed exegesis of the spec.

(For clarity, the negotiator.mediaType(availableTypes) function is just meant to return the best matching media type from availableTypes, using the client's Accept header.)

royfielding commented 9 years ago

The spec isn't ambiguous, aside from the fact that all content negotiation is optional. If the matching algorithm is implemented, then most-specific match wins. It is assumed that a server capable of meaningful negotiation based on a media type parameter is also going to know what the parameter means (or provide a ranked configuration interface for the resource owner to specify how to handle them). Naturally, this might be media-type specific and is certainly server implementation-specific.

In any case, commas are not allowed inside of parameter values unless the value is enclosed in DQUOTEs, so several of those examples are syntactically invalid. Also, it is a waste of time to negotiate on the basis of obscure parameters while using a meaningless media type like vnd.api+json. Define separate media types (preferably with meaning) if you think the differences will matter to a client. You will note that the examples in the spec are artificial, since there isn't any real-world need to negotiate on parameter names.

ethanresnick commented 9 years ago

Thanks @royfielding for jumping in, both here and on the other issue! It's much appreciated.

Just to clarify one thing: when you say "If the matching algorithm is implemented, then most-specific match wins", the matching algorithm that you're talking about is, at least where parameters are concerned, not constrained at all by the HTTP spec? That's certainly what the rest of your comment seems to imply, but I just want to double check. Because the algorithm for type/subtype matching is, of course, entirely prescribed by the spec (though, like you said, actually honoring a match is optional), and my comment earlier was trying to read some implicit logic/constraints into what the spec authors might have imagined for parameter matching. But it sounds like your saying that those constraints just aren't there.

If that's the case, then a simple one word/sentence reply is all I need. Again, I appreciate the time you've given thus far, and don't want to take much more of it!

dantman commented 7 years ago

Parameters are going to be largely type specific, what about including an alternate callback form that returns the full media type instead of the input string; allowing the callback to restrict valid parameters and the caller to make use of the parameters.

For example, using an accepts sample and a vendor specific media type with a version parameter:

accepts((mt) => mt.type === 'application' && mt.subtype == 'vnd.fubar' && parseInt(mt.params.v||0) < 5);
// 'application/vnd.fubar; v=3' => 'application/vnd.fubar; v=3'
// 'application/vnd.fubar; v=9' => undefined

One could then just use media-type to get the actual value of v.

dougwilson commented 7 years ago

I like the idea of passing a function that would say "yay or nay" for each type as an alternative to passing in an array of types. That seems to be nicely flexible.

royfielding commented 7 years ago

Sorry I missed the clarifying question from @ethanresnick

Yes, the parameters are intended to narrow the media type (they are not supposed to define distinct types), hence the HTTP algorithm assumes that media types with parameters have the same qvalue as the same media type without parameters unless a specific qvalue is assigned to those parameters.

Note, however, that negotiation by parameter value is easily one of the least exercised features of HTTP, so implementations will vary. In general, my advice to most folks is to avoid Accept-based (proactive) negotiation like the privacy/performance plague that it is, and instead design services for reactive negotiation (having the UA choose client-side, using meaningful links or client-side scripting, from multiple options represented by the server).