pallets / werkzeug

The comprehensive WSGI web application library.
https://werkzeug.palletsprojects.com
BSD 3-Clause "New" or "Revised" License
6.62k stars 1.73k forks source link

WSGIRequestHandler should be able to process multiple accept header with same field name. #1070

Closed ipanova closed 6 years ago

ipanova commented 7 years ago

From https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html section 4.2 last paragraph it should be possible to combine the multiple headers fields.

The way it is currently implemented it takes into account just one accept header https://github.com/pallets/werkzeug/blob/master/werkzeug/serving.py#L138

example of a request sent by docker client:

Host: localhost:5001
User-Agent: docker/1.10.3 go/go1.6.3 kernel/4.6.7-300.fc24.x86_64 os/linux arch/amd64
Accept: application/vnd.docker.distribution.manifest.list.v2+json
Accept: application/vnd.docker.distribution.manifest.v1+prettyjws
Accept: application/json
Accept: application/vnd.docker.distribution.manifest.v2+json
Accept-Encoding: gzip
Connection: close

As you can see it sends multiple accept headers.

And this is what we get after headers get processed:

'HTTP_ACCEPT': 'application/vnd.docker.distribution.manifest.v2+json'

Expected behaviour

Other accept headers got lost. It should be possible to combine multiple header fields into one, so eventually we would contain a comma separated list of all accept headers. Something like:

HTTP_ACCEPT': 'application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/json, application/vnd.docker.distribution.manifest.v1+prettyjws'
Python 2.7.12 (default, Aug  9 2016, 15:48:18) 
Type "copyright", "credits" or "license" for more information.

In [1]: import werkzeug

In [2]: werkzeug.__version__
Out[2]: '0.11.11'
untitaker commented 7 years ago

This problem has surfaced multiple times. IIRC the problem is that there's no general way to combine multiple headers into one (separation by comma is not always RFC-conforming)

mhrivnak commented 7 years ago

Section 4.2 says this, which I think is very clear that multiple values must be comma-joinable when providing multiple headers of the same name.

"It MUST be possible to combine the multiple header fields into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field-value to the first, each separated by a comma."

Is there a case you have in mind where comma separation would not conform?

untitaker commented 7 years ago

The Host header can neither appear multiple times nor can it contain a comma-separated list of hosts.

Also RFC 2616 is obsoleted by a set of new RFCs. RFC 7230 has the following to say:

A sender MUST NOT generate multiple header fields with the same field name in a message unless either the entire field value for that header field is defined as a comma-separated list [i.e., #(values)] or the header field is a well-known exception (as noted below).

So there are indeed cases where conversion from multiple header fields to comma-separated header value is acceptable, but that is not applicable to all headers. This actually makes a difference: If the client sends multiple host headers, would you pick the last one, or reject the entire request? I'm not sure if the difference matters in this particular example, and I'm not claiming that this bug is unfixable, but fixing it would also introduce a significant behavior change.

ipanova commented 7 years ago

@untitaker i think the fix should at least apply to accept headers

untitaker commented 7 years ago

Sure, a whitelist would solve this.

I just noticed that the piece of code you linked iterates through a dictionary which can't have multiple keys of the same name, so the additional headers are probably lost before that.

ipanova commented 7 years ago

@untitaker that's not true, if i try to print self.headers it contains all the accept headers

ipanova commented 7 years ago

@untitaker

        _logger.info('XXXXXXXXXXXXXX')
        _logger.info(self.headers)
        _logger.info(type(self.headers))
        for key, value in self.headers.items():

what i see:

XXXXXXXXXXXXXX
Host: localhost:5001
User-Agent: docker/1.10.3 go/go1.6.3 kernel/4.6.7-300.fc24.x86_64 os/linux arch/amd64
Accept: application/vnd.docker.distribution.manifest.v2+json
Accept: application/vnd.docker.distribution.manifest.list.v2+json
Accept: application/vnd.docker.distribution.manifest.v1+prettyjws
Accept: application/json
Accept-Encoding: gzip
Connection: close

<type 'instance'>
untitaker commented 7 years ago

Fair enough.

untitaker commented 7 years ago

Ok, so let's see what other WSGI servers are doing (e.g. gunicorn) and copy their behavior. It would be the absolute worst if one had different behavior between dev and production environments, no matter if the behavior is technically correct or not.

mhrivnak commented 7 years ago

As one data point, I tested django, accessing the HTTP_ACCEPT value here: https://docs.djangoproject.com/en/1.10/ref/request-response/#django.http.HttpRequest.META

Both when using Apache's mod_wsgi, and django's dev server, it successfully concatenated multiple Accept headers into a comma-separated list.

untitaker commented 7 years ago

If mod_wsgi does this when using Django, it should behave similarly for Flask/werkzeug. Is this assumption correct?

dmage commented 7 years ago

Yes, it should. Actually headers are concatenated by the Apache at the end of the parsing step.

This behaviour conforms to RFC3875 (CGI v1.1):

If multiple header fields with the same field-name are received then the server MUST rewrite them as a single value having the same semantics.

Thus it conforms to PEP 333:

The environ dictionary is required to contain these CGI environment variables, as defined by the Common Gateway Interface specification.

So right now the same application under run_simple and mod_wsgi/uwsgi acts differently.

untitaker commented 7 years ago

So I guess it's settled. Perhaps somebody could check how gunicorn behaves as well though?

kaos commented 7 years ago

I've been doing some digging, as I wan't to see all X-Forwarded-For headers.. but only get the last one.

From what I gather, the issue lies in how rfc822.Message parses headers into a dict, which is fed into the werkzeug environ (all headers are present in their raw form, however).

My point is, I would also like to see this issue addressed, and for more headers than merely the Accept header ;)