jshttp / negotiator

An HTTP content negotiator for Node.js
MIT License
320 stars 36 forks source link

Matching parameters #30

Closed moll closed 10 years ago

moll commented 10 years ago

Hey,

I'm trying to use the Accepts module via Express (req.accepts) and thereby Negotiator to match some media types. They're in the form of application/vnd.foo.bar; v=1.

I'd like to cover the use cases of where the v is missing (to default to the latest version) and where the optional charset parameter is given. I do see some wildcard matching in lib/mediaType.js, but that only works from the requester side (HTTP Accept header in my case).

How would one do this? If it's necessary to enumerate all permutations of parameters, that sounds like an awfully lot of parsing work to be done for every request.

Thanks!

moll commented 10 years ago

Forgot to add that I'd also like to support an optional suffix, making a total of application/vnd.foo.bar+json; v=1.

dougwilson commented 10 years ago

So when doing negotiation, you are providing an ordered list. You should be providing a list like

switch (req.accepts('application/vnd.foo.bar; v=2', 'application/vnd.foo.bar; v=1', 'application/vnd.foo.bar')) {
}

Charset negotiation is not part of the Accept header, but the Accept-Charset header, which is accessible with req.acceptsCharsets() in Express.

Would you be able to share a more real example of what you're trying to do? I want to make sure it actually fits into how negotiation is specified in RFC 7231 and I'd be happy to work out a solution, but I just need a more real example to work with.

moll commented 10 years ago

Hey, thanks for the prompt reply.

There is indeed Accept-Charset. I was thinking of letting the charset parameter in, however, for consistency with what the server occasionally might return. E.g. application/vnd.foo.bar; v=1; charset=utf-8. I'm undecided on whether to disallowing any other unspecified parameter.

A few valid examples I'd like to match:

A few invalid examples:

Having enumerated all of these examples I'm thinking it might be easier to filter the given Accept header manually and then just sort the passing types by their q parameter. Coming up with a declarative API for that might actually be more complex than a filter function. What do you think?

dougwilson commented 10 years ago

Please, do not negotiate charset within the Accept header, as it is a violation of RFC 7231; you need to use Accept-Charset only.

As far as iteration yourself, have you tried to just use this module directly (or the accepts module that sits over it)? It will provide you the parsed Accept header already sorted on q and you can always iterate over them. Example:

var accepts = require('accepts')
var accept = accepts(req) // has header Accept: application/vnd.foo.bar; v=1; q=0.5, application/vnd.foo.bar; v=1

console.log(accept.types()) // provides the types in q-order

I mean, you can just use a switch statement. Your want to violate RFC 7231 and include the charset as part of the Accept negotiation is, I think, what is making your code extra complicated. Also, is there a reason you are also allowing +json to be optionally there? Why not just require it or not allow it?

moll commented 10 years ago

Charset seems to be a hot topic. RFC 2046 on the other hand presents charset as a useful global parameter. If you look at multipart email, you'll see Content-Type: text/plain; charset=utf-8 headers as well. I must say I'd prefer that for non-human browsable types. It is part of the resource representation format after all. Why not keep them together. The very least, handle it in addition to Accept-Charset.

Perhaps Accept-Charset was meant to be a more generic catch-all given that there was no syntax to "accept" multiple parameter values for a given media type without duplicating the full type.

Also, is there a reason you are also allowing +json to be optionally there? Why not just require it or not allow it?

My allowance for optional values is related to convenience and beauty. When doing requests manually for testing purposes, I can't be bothered to type that much. +json particularly would be there for libraries that automatically parse response JSON based on suffixes.

I'm thinking of actually parsing the types once per request and sorting/filtering manually. With all of those negotiator and type-is libraries one ends up parsing the same types dozen times. :-)

dougwilson commented 10 years ago

So, RFC 2046 has nothing to do with HTTP content negotiation. It talks about the use of the content-type field in an email that actually has a body. Even in HTTP the stuff you provide to the Accept header is very different from the stuff you use in the Content-Type header, they just look similar, which obviously leads to some confusion on how to correctly use them.

Basically trying to bundle charset negotiation within the Acceptheader is unacceptable in HTTP. As for the +json being optional, the HTTP negotiation cannot handle that, either. They provide a very specific BNF for what these fields look like and that negotiation on the type part can only take place as */* for anything, text/* for any text or test/plain for an exact match.

If you would like, feel free to email me and I can setup an hour or so conversation about how HTTP content negotiation functions and what the best practices are, etc.

Also remember that as soon as you start changing the way content negotiation is done, various clients that have negotiation built in to them won't be able to successfully negotiate, for example, the charset with your API, as they have no idea you want them to add it as a parameter in the Accept header and will instead be listing them in the Accept-Charset header and your API consumers will basically have to hack around their client's HTTP compliance to use your API.

moll commented 10 years ago

Hey again. Thanks for the thorough response.

I've spent some time with those respective RFCs as well and have come to the conclusion that both Accept-Charset and a Accept's media-types' charset parameter can be useful. They're not mutually exclusive nor semantically interchangeable. You might want to specify a charset set for a subset of types and finish it off with a catch-all */*. Accept-Charset would apply to them all. A charset parameter only to one type.

I've simplified my original requirements slightly, but still was unable to do what I needed with Negotiator or its underlying MediaTyper library. So I did what every self respective OSSian would do and created what I believe to be a better type parsing library: MediumType.js.

As I remember you were using the MediaTyper module underneath, you might want to look at MediumType. It behaves more correctly and is far easier to optimize. For example, it does not parse a comma separated array of types with parameters with commas incorrectly (although I think that was more Negotiator's shortcoming than MediaTyper's). You can also parse types once and then re-use those for matching, thereby improving performance. It's also a decent value type and not merely a plain object. Give it a try.

Thanks for the call proposal. I think I'll actually take you up on that offer. Always interesting to chat with people who care.

dougwilson commented 10 years ago

This module doesn't actually have any dependencies, least on one called media-type.

moll commented 10 years ago

Ah, you're right. I must have this confused with type-is. My bad. The comments on parsing types again and again and the parameters with commas still apply, though. ;)

dougwilson commented 10 years ago

I must have this confused with type-is.

No problem, though that has a dependency on media-typer (note the extra r at the end ;) ).

The comments on parsing types again and again and the parameters with commas still apply, though

Yep, I agree :) But these modules only operate directly on req and res from Node.js core, so they have no way to share previous parsing.

moll commented 10 years ago

No problem, though that has a dependency on media-typer (note the extra r at the end ;) ).

Yeah, remembered that at the last moment. Updated the comment shortly after. :)

Yep, I agree :) But these modules only operate directly on req and res from Node.js core, so they have no way to share previous parsing.

Fortunately, not exactly. When Negotiator is used as middleware, it does have state. Or can have. That'd be a great time to preparse types.

Feel free to use MediumType as a dep for proper splitting. And parsing while you're at it. ;-) I think Negotiator also handles quoted parameters wrong. ^_^

dougwilson commented 10 years ago

I'll check it out, for sure. Can you also explain how media-typer does not parse correctly as well? Open a bug report, perhaps?

moll commented 10 years ago

Sorry, I don't recall media-typer that well. I think it wasn't a proper type class and was based on the more restrictive media type syntax (i.e died on */*) which meant it couldn't be used to pre-parse media type patterns.

After having tried all of those media type libraries on NPM and having spent quite some of time reviewing them, I was just frustrated. I didn't see any that in my opinion had quality in them. Just some code thrown against the wall. I might've used media-typer initially for parsing, but the too restrictive token regexp had me rewrite that last part, too.

It probably also ignored inherited properties in parameters, which is my pet peeve fight against accidental complexity. :-P I'll have none of that sometime-works/sometimes-doesn't handling of inherited properties in my code. ;)

dougwilson commented 10 years ago

I think it wasn't a proper type class and was based on the more restrictive media type syntax (i.e died on */*)

Ah. You mean, it just doesn't support the types of values that are allowed in the Accept header. Yes, it is really for the Content-Type header, so that makes sense (it's also why it doesn't do , splitting). This is actually the reason it's not used in this library, as this library needs a parser that is made for the Accept header, haha.

I didn't see any that in my opinion had quality in them. Just some code thrown against the wall.

This is one of the biggest problems in Node.js land, and was the whole reason jshttp was founded. Sadly there are some library (even this one) that got pulled in that are not really up to the quality they should be, but they were ones that authors volunteered to the org and there hasn't yet been time to clean them up.

It probably also ignored inherited properties in parameters, which is my pet peeve fight against accidental complexity. :-P I'll have none of that sometime-works/sometimes-doesn't handling of inherited properties in my code. ;)

Hm, what does that mean? You're talking about the .format function not looking at stuff in the prototype chain of the passed-in object?

moll commented 10 years ago

Ah. You mean, it just doesn't support the types of values that are allowed in the Accept header.

Sorry, "type" was an overloaded term there. By type class I meant an instance of a class (or child of a prototype in JavaScript parlance) that acts as a native type. One you can do instanceof checks on and one that has toString, toJSON and other functions.

dougwilson commented 10 years ago

oooo, lol. So you means that it's because it returns a plain object instead of something that's more useful and has methods, etc.

moll commented 10 years ago

Hm, what does that mean? You're talking about the .format function not looking at stuff in the prototype chain of the passed-in object?

Yeah. obj.subtype in https://github.com/jshttp/media-typer/blob/master/index.js does take inheritance into account, Object.keys later on doesn't. Damn I hate Object.keys. The perfect function to create inconsistencies in JavaScript. That, and soon to be Object.assign. Bullet, meet feet.

oooo, lol. So you means that it's because it returns a plain object instead of something that's more useful and has methods, etc.

Yep .That particular one wasn't an objective criticism. Type classes are just something I find convenient to work with. Primarily toString and toJSON. The rest can be "static" functions for all I care. I just threw match in there with MediumType for good measure.

moll commented 10 years ago

This is actually the reason it's not used in this library, as this library needs a parser that is made for the Accept header, haha.

Oh, yeah. That's why I had to reimplement that bit as well. That parser would have to understand the syntax of a media type to properly handle embedded commas. ^_^

MediumType's does. :sunglasses: