Closed ipanova closed 6 years ago
This problem has surfaced multiple times. IIRC the problem is that there's no general way to combine multiple headers into one (separation by comma is not always RFC-conforming)
Section 4.2 says this, which I think is very clear that multiple values must be comma-joinable when providing multiple headers of the same name.
"It MUST be possible to combine the multiple header fields into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field-value to the first, each separated by a comma."
Is there a case you have in mind where comma separation would not conform?
The Host
header can neither appear multiple times nor can it contain a comma-separated list of hosts.
Also RFC 2616 is obsoleted by a set of new RFCs. RFC 7230 has the following to say:
A sender MUST NOT generate multiple header fields with the same field name in a message unless either the entire field value for that header field is defined as a comma-separated list [i.e., #(values)] or the header field is a well-known exception (as noted below).
So there are indeed cases where conversion from multiple header fields to comma-separated header value is acceptable, but that is not applicable to all headers. This actually makes a difference: If the client sends multiple host headers, would you pick the last one, or reject the entire request? I'm not sure if the difference matters in this particular example, and I'm not claiming that this bug is unfixable, but fixing it would also introduce a significant behavior change.
@untitaker i think the fix should at least apply to accept headers
Sure, a whitelist would solve this.
I just noticed that the piece of code you linked iterates through a dictionary which can't have multiple keys of the same name, so the additional headers are probably lost before that.
@untitaker that's not true, if i try to print self.headers it contains all the accept headers
@untitaker
_logger.info('XXXXXXXXXXXXXX')
_logger.info(self.headers)
_logger.info(type(self.headers))
for key, value in self.headers.items():
what i see:
XXXXXXXXXXXXXX
Host: localhost:5001
User-Agent: docker/1.10.3 go/go1.6.3 kernel/4.6.7-300.fc24.x86_64 os/linux arch/amd64
Accept: application/vnd.docker.distribution.manifest.v2+json
Accept: application/vnd.docker.distribution.manifest.list.v2+json
Accept: application/vnd.docker.distribution.manifest.v1+prettyjws
Accept: application/json
Accept-Encoding: gzip
Connection: close
<type 'instance'>
Fair enough.
Ok, so let's see what other WSGI servers are doing (e.g. gunicorn) and copy their behavior. It would be the absolute worst if one had different behavior between dev and production environments, no matter if the behavior is technically correct or not.
As one data point, I tested django, accessing the HTTP_ACCEPT
value here: https://docs.djangoproject.com/en/1.10/ref/request-response/#django.http.HttpRequest.META
Both when using Apache's mod_wsgi, and django's dev server, it successfully concatenated multiple Accept headers into a comma-separated list.
If mod_wsgi
does this when using Django, it should behave similarly for Flask/werkzeug. Is this assumption correct?
Yes, it should. Actually headers are concatenated by the Apache at the end of the parsing step.
This behaviour conforms to RFC3875 (CGI v1.1):
If multiple header fields with the same field-name are received then the server MUST rewrite them as a single value having the same semantics.
Thus it conforms to PEP 333:
The environ dictionary is required to contain these CGI environment variables, as defined by the Common Gateway Interface specification.
So right now the same application under run_simple
and mod_wsgi
/uwsgi
acts differently.
So I guess it's settled. Perhaps somebody could check how gunicorn behaves as well though?
I've been doing some digging, as I wan't to see all X-Forwarded-For
headers.. but only get the last one.
From what I gather, the issue lies in how rfc822.Message
parses headers into a dict
, which is fed into the werkzeug environ
(all headers are present in their raw form, however).
My point is, I would also like to see this issue addressed, and for more headers than merely the Accept
header ;)
From https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html section 4.2 last paragraph it should be possible to combine the multiple headers fields.
The way it is currently implemented it takes into account just one accept header https://github.com/pallets/werkzeug/blob/master/werkzeug/serving.py#L138
example of a request sent by docker client:
As you can see it sends multiple accept headers.
And this is what we get after headers get processed:
Expected behaviour
Other accept headers got lost. It should be possible to combine multiple header fields into one, so eventually we would contain a comma separated list of all accept headers. Something like: