Closed gwicke closed 8 years ago
It's highly unlikely I'll accept violating the RFC, as that's the whole point of this module: to parse according to the RFC. Can you at least provide very detailed explanation why you want to violate the RFC?
If you are asking to be more lenient and implement a second parser that is, then that can probably happen. Please link to the source code of other such lenient parsers so we can implement their proven parsing algorithm.
We have some stored content that has a content-type header with missing quoting in one of its parameters. Without the patch, content-type
simply throws an exception when encountering this header. With the patch, the header is parsed correctly, which lets us use content-type
to consistently format those headers before exposing them to the outside world.
Examples for lenient parsing in the wild:
Practical example:
text/html;profile=mediawiki.org/specs/html/1.1.0;charset=utf-8
(missing escaping around the profile parameter value).> document.contentType
"text/html"
Cool. Would you be willing to update your PR? It needs at least two changes:
Content-Type
header, as it does today.This means that either this change needs to be opt in or something else. Let me know what suggestions you have to implement this change in a backwards-compatible way, including keeping the current feature of this module which allows you to do strict validation. The whole reason this does strict validation and we use this in Express.js is to deter people from putting up web servers that serve invalid Content-Type
headers in the first place, like that URL you provided is doing.
If we do want to add a non-strict parsing in here, it should at least be an exact copy of a long-standing algorithm, for example, it should perhaps parse exactly how that Chrome one does (and all an entire suite of tests to go along with this new parser, please).
Ideally, after the changes here, I can see you being able to do the "Content-Type invalid -> valid fixing" by doing contentType.format(contentType.parse(str, { loose: true }))
or something like that.
In fact, thinking about it more, I'm open to perhaps making the non-strict parsing the default, but still, would want both strict and non-strict parsing supported with a switch (and, of course, tests).
@dougwilson: An option sounds good to me, and I also think that implementing the non-strict algorithm as in Chrome & FF makes sense. It will likely also be faster.
I made a note to get back to this, but might not find the time right away.
Cool. Yea, this open PR will also put it on my list as well, so it'll likely end up being whichever one of us gets to it first, then :)
Also, this could be a temporary work-around to achieve the desired goal, without needing to maintain a custom fork, which should help with keeping timeline pressure at bay:
var contentType = require('content-type')
var rawParameterValueRegExp = /(; *[^=]+=)([^"][^;]*)/g
module.exports = function normalizeContentType(str) {
try {
return contentType.format(contentType.parse(str))
} catch (err) {
try {
return contentType.format(contentType.parse(str.replace(rawParameterValueRegExp, '$1"$2"')))
} catch (e) {
throw err // original error
}
}
}
Hi @gwicke I hope it's OK if I close this PR; I know we discussed above what to do, and I feel like we did come up with a good plan: implement the same loose parsing algorithm as Chrome/Firefox/whatever and expose it through a parsing option (perhaps { lax: true }
). I don't really want to keep this PR open forever, especially since it's not similar to what we think the ideal solution should be, so doesn't really have value staying open. Please feel free to resubmit a PR with the new implementation at any time :) !
While the RFC is excluding slashes from tokens and thus parameter values, some existing content-type values do contain them, and existing content-type parsers are generally lenient when encountering them.