PetrushenkoIrina opened this issue 6 years ago
I've just checked with python3.7 locally and the first link works fine. I'm curious if @kennethreitz (or whoever has access to the httpbin.org server) could provide the exact error from the logs?
This appears to be the error:
[2018-11-07 02:43:25 +0000] [8] [ERROR] Error handling request /response-headers?Content-Type=text/plain;%20charset=UTF-8&Content-Disposition=attachment;%20filename%3d%22foo-%C3%A4.html%22
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/workers/base_async.py", line 56, in handle
    self.handle_request(listener_name, req, client, addr)
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/workers/ggevent.py", line 160, in handle_request
    addr)
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/workers/base_async.py", line 115, in handle_request
    resp.write(item)
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/http/wsgi.py", line 333, in write
    self.send_headers()
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/http/wsgi.py", line 329, in send_headers
    util.write(self.sock, util.to_bytestring(header_str, "ascii"))
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/util.py", line 507, in to_bytestring
    return value.encode(encoding)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 244: ordinal not in range(128)
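The failure is easy to reproduce in isolation: per the traceback, Gunicorn encodes each outgoing header line as ASCII (in util.to_bytestring), so any header value containing a character like ä raises exactly this error. A minimal sketch of the underlying encode behaviour:

```python
# Gunicorn encodes response header lines as ASCII; a non-ASCII
# character such as "ä" (U+00E4) in a header value cannot be encoded.
header_value = 'attachment; filename="foo-ä.html"'

try:
    header_value.encode("ascii")
except UnicodeEncodeError as exc:
    print(exc)  # 'ascii' codec can't encode character '\xe4' ...

# latin-1 (the nominal HTTP/1.1 header charset) can represent it,
# which is why the same header may pass through other servers:
assert header_value.encode("latin-1") == b'attachment; filename="foo-\xe4.html"'
```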
This looks like a Gunicorn bug/limitation; I may write one up, but there are a lot of RFC considerations here.
This is tricky for a few reasons.

By using even latin-1 characters in that way, you are travelling close to the spec bounds, which is highlighted by the problem appearing under Gunicorn only, and is discussed in issues such as https://github.com/benoitc/gunicorn/issues/1778. US-ASCII only is safer, or use the header-encoding extensions available to represent other character sets, but as you can see, support will be sketchy.
This is also tricky because we are trying to pass specific characters via URL parameters, and unless we go to unusual lengths, Flask will decode those for us. The problem then is that certain characters (only those) need to be re-encoded, and it's not easy to know which ones; at least there isn't any obvious existing library function to call to encode them. I'm speaking of the encoding required to implement RFC 6266.
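For reference, the RFC 6266 style of header (using the RFC 5987 filename* ext-value) can be produced with the standard library. A sketch; the helper name rfc5987_filename is mine, not an existing API:

```python
from urllib.parse import quote

def rfc5987_filename(filename: str) -> str:
    """Build an RFC 6266 Content-Disposition value using the
    RFC 5987 filename* parameter: charset ' language ' pct-encoded."""
    # quote() percent-encodes the UTF-8 bytes of the filename;
    # safe="" ensures reserved characters like "/" are encoded too.
    return "attachment; filename*=UTF-8''" + quote(filename, safe="")

print(rfc5987_filename("foo-ä.html"))
# attachment; filename*=UTF-8''foo-%C3%A4.html
```

Note the result is pure ASCII, so it passes Gunicorn's header encoding untouched.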
If you want to use that, you can, by carefully encoding the percent signs (%) you send in the request so that they survive request decoding; you are then left with the encoded characters (the ASCII representation of your Unicode characters as UTF-8), which can be sent as headers. This is a kind of double encoding.
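The double encoding can be sketched as follows (plain urllib calls standing in for Flask's query-string decoding): the % signs of an already percent-encoded filename are themselves encoded as %25, so that one round of URL decoding yields the encoded form you actually want in the header.

```python
from urllib.parse import quote, unquote

# What we ultimately want inside the header: the UTF-8 bytes of "ä",
# percent-encoded per RFC 5987.
wanted = "foo-%C3%A4.html"

# Encode the percent signs themselves so they survive URL decoding:
in_url = quote(wanted, safe="")
print(in_url)  # foo-%25C3%25A4.html

# The framework decodes the query string once, restoring the wanted form:
assert unquote(in_url) == wanted
```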
So if you are prepared to shift to the RFC and the filename* representation (again, client support will vary, but modern browsers understand it), you can send e.g.:

curl -v "http://localhost/response-headers?Content-Type=text/plain;%20charset=UTF-8&Content-Disposition=attachment;filename*=UTF-8''foo-%25C3%25A4.html"

... noting the % signs encoded as %25. The resulting header will be:

Content-Disposition: attachment;filename*=UTF-8''foo-%C3%A4.html

... which is correct, and if run from a browser you will see the file download with the name foo-ä.html.
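A browser receiving that header decodes the filename* ext-value per RFC 5987: split off the charset and optional language tag, then percent-decode the remainder in that charset. A sketch; decode_ext_value is a hypothetical helper, not a stdlib function:

```python
from urllib.parse import unquote

def decode_ext_value(ext_value: str) -> str:
    # RFC 5987 ext-value = charset "'" [ language ] "'" value-chars
    charset, _language, value = ext_value.split("'", 2)
    return unquote(value, encoding=charset)

print(decode_ext_value("UTF-8''foo-%C3%A4.html"))  # foo-ä.html
```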
Sending the GET request http://httpbin.org/response-headers?Content-Type=text/plain;%20charset=UTF-8&Content-Disposition=attachment;%20filename%3d%22foo-%C3%A4.html%22 always used to work: it replied with 200 OK and created the appropriate response, but now this request fails with error code 500. Note that the following request still works: http://httpbin.org/response-headers?Content-Type=text/plain;%20charset=UTF-8&Content-Disposition=attachment;%20filename%3d%22dfsf%22