ramkrishanbhatt / modwsgi

Automatically exported from code.google.com/p/modwsgi

Multiple header values get lost #225

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Submit a request to a mod_wsgi application with:
1) Multiple headers with the same name defined.  For example:
Foo-bar: xyz
foo-Bar: abc

Expected: All header values provided should be present in HTTP_FOO_BAR with the 
request's ordering preserved (ex: {'HTTP_FOO_BAR': ['xyz', 'abc']} or 
{'HTTP_FOO_BAR': 'xyz, abc'}).

Seen Instead:  Only the last header value in the request is chosen.

2) ...or multiple headers with names that differ only in non-alphanum 
characters.  For example:
Foo-bar: xyz
foo_Bar: abc
foo.bar: 123

Expected: Each of these headers should have a unique key in environ, as they 
are different headers per RFC2616 4.2 (ex: HTTP_FOO-BAR, HTTP_FOO_BAR, 
HTTP_FOO.BAR).

Seen Instead:  All of these in the example get glommed into HTTP_FOO_BAR.  
Additionally, only the last one is present in environ (see #1, above).

3) ...or any combination of 1 and 2

Number 2 can be addressed by changing wsgi_http2env in mod_wsgi.c so that 
non-alphanumeric characters are preserved, and number 1 can be addressed by 
modifying the line that calls wsgi_http2env (currently line 13628 in mod_wsgi.c 
in trunk) so that it first checks whether the header has already been defined 
and, if so, appends the new value to the original (whether via list append or 
comma-separated string concatenation).
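
As a rough illustration of the comma-separated variant, here is a minimal sketch in Python rather than the C used in mod_wsgi.c. The function name `merge_headers` and the list-of-pairs input are hypothetical, chosen only for this example; the key is built with the same upper-case and underscore mapping that produces HTTP_FOO_BAR above.

```python
def merge_headers(headers):
    """Build environ-style entries from (name, value) pairs in request order.

    Hypothetical helper for illustration; not part of mod_wsgi.
    """
    environ = {}
    for name, value in headers:
        key = "HTTP_" + name.upper().replace("-", "_")
        if key in environ:
            # Header already seen: append instead of overwriting.
            environ[key] = environ[key] + ", " + value
        else:
            environ[key] = value
    return environ


print(merge_headers([("Foo-bar", "xyz"), ("foo-Bar", "abc")]))
# {'HTTP_FOO_BAR': 'xyz, abc'}
```

The values stay plain strings, and the request's ordering is preserved because later values are appended to earlier ones.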

To test this, I created a simple WSGI application that dumps the contents of 
environ back to the client.  I then issued the following curl requests against 
mod_wsgi:

curl -v -H 'Hooba: paschooba' -H 'Hooba: pops' 'http://10.0.0.106/test_wsgi'

curl -v -H 'Hooba-hoo: paschooba' -H 'Hooba_hoo: pops' 'http://10.0.0.106/test_wsgi'
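
The test application itself is not included in this report. A minimal sketch of an application of that kind, assuming Python 3 and the conventional `application` entry point, might look like this:

```python
def application(environ, start_response):
    """Return every environ entry as plain text, one 'key: value' per line."""
    lines = ["%s: %r" % (key, environ[key]) for key in sorted(environ)]
    body = "\n".join(lines).encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

With something like this mounted at /test_wsgi, the HTTP_* lines in the response show which header values reached environ.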

Original issue reported on code.google.com by Brian.Gu...@gmail.com on 27 Jan 2011 at 4:13

GoogleCodeExporter commented 8 years ago
It is the CGI RFC which dictates the rule of how the names of the variables 
corresponding to headers are constructed and the CGI RFC says:

   Meta-variables with names beginning with "HTTP_" contain values read
   from the client request header fields, if the protocol used is HTTP.
   The HTTP header field name is converted to upper case, has all
   occurrences of "-" replaced with "_" and has "HTTP_" prepended to
   give the meta-variable name.  The header data can be presented as
   sent by the client, or can be rewritten in ways which do not change
   its semantics.  If multiple header fields with the same field-name
   are received then the server MUST rewrite them as a single value
   having the same semantics.  Similarly, a header field that spans
   multiple lines MUST be merged onto a single line.  The server MUST,
   if necessary, change the representation of the data (for example, the
   character set) to be appropriate for a CGI meta-variable.

Specifically, (2) in your list is not possible because of the naming rule 
dictated by the RFC.
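
To make the collision concrete, here is a small sketch of the naming rule exactly as quoted above (upper-case the name, replace "-" with "_", prepend "HTTP_"). The function name is made up for this example, and it follows the quoted text literally rather than reproducing wsgi_http2env.

```python
def cgi_meta_variable_name(header_name):
    """Apply the quoted CGI rule: upper-case, '-' becomes '_', prepend 'HTTP_'."""
    return "HTTP_" + header_name.upper().replace("-", "_")


for name in ("Foo-bar", "foo_Bar", "foo.bar"):
    print(name, "->", cgi_meta_variable_name(name))
# Foo-bar -> HTTP_FOO_BAR
# foo_Bar -> HTTP_FOO_BAR
# foo.bar -> HTTP_FOO.BAR
```

Under this rule "Foo-bar" and "foo_Bar" necessarily share one variable name, so they cannot be given distinct keys. The quoted text says nothing about "."; the report above indicates that mod_wsgi maps it to "_" as well.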

For (1), it is Apache (the server) which is responsible for joining the values 
of same-named headers. I recollect though that it may only do this for headers 
which have known semantics, because if it doesn't know the semantics, it can't 
guarantee that the result has the same meaning. For example, it would know the 
rules for joining the values of standard 'Accept' headers but isn't going to 
know what rule should be applied for joining an arbitrary user header, 
especially a non-standard header like yours, which by rights should be using an 
'X-' prefix. In that case, where it can't guarantee the same semantics, it uses 
the last instance of that header which is encountered. This occurs because 
Apache stores request headers in a dictionary keyed on the case-insensitive 
header name. Thus the action of adding a subsequent one results in the loss of 
the earlier one.

In (1), your first example of passing a list also goes against the WSGI 
specification, which says that the values associated with variables in the WSGI 
environment for request headers must be native string values. You therefore 
can't pass a list, as it would violate the WSGI specification and break all WSGI 
frameworks which expect that it will be just a string.

If you have issues with the specifications and the above behaviour, you can try 
arguing it on the Python Web-SIG mailing list, but the CGI/WSGI specifications 
are what they are and I doubt you will find anyone sympathetic to your 
viewpoint.

Original comment by Graham.Dumpleton@gmail.com on 27 Jan 2011 at 8:40

GoogleCodeExporter commented 8 years ago
Fair enough.  You're right that the CGI/WSGI spec is what it is and there's not 
really much I can do about that.  I'm not familiar with all of the internals of 
Apache, so I didn't know that was how Apache treated multiple headers with the 
same name.  However, the RFC does (at the very least) strongly imply that 
multiple values should be concatenated in order with commas in between (but 
that's an issue for the Apache folks).

Thanks for your help.

Original comment by Brian.Gu...@gmail.com on 27 Jan 2011 at 8:48