New bug: 500 code returned when a request with UTF-8 symbols is sent (see link below)

PetrushenkoIrina commented 6 years ago

Sending the GET request http://httpbin.org/response-headers?Content-Type=text/plain;%20charset=UTF-8&Content-Disposition=attachment;%20filename%3d%22foo-%C3%A4.html%22 always worked: it replied with 200 OK and the appropriate response. Now this request fails with error code 500. Note that the following request, which differs only in using an ASCII filename, still works: http://httpbin.org/response-headers?Content-Type=text/plain;%20charset=UTF-8&Content-Disposition=attachment;%20filename%3d%22dfsf%22

ghost commented 6 years ago

I've just checked with Python 3.7 locally, and the first link works fine. Could @kennethreitz (or whoever has access to the httpbin.org server) provide the exact error from the logs?

javabrett commented 6 years ago

This appears to be the error:

[2018-11-07 02:43:25 +0000] [8] [ERROR] Error handling request /response-headers?Content-Type=text/plain;%20charset=UTF-8&Content-Disposition=attachment;%20filename%3d%22foo-%C3%A4.html%22
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/workers/base_async.py", line 56, in handle
    self.handle_request(listener_name, req, client, addr)
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/workers/ggevent.py", line 160, in handle_request
    addr)
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/workers/base_async.py", line 115, in handle_request
    resp.write(item)
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/http/wsgi.py", line 333, in write
    self.send_headers()
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/http/wsgi.py", line 329, in send_headers
    util.write(self.sock, util.to_bytestring(header_str, "ascii"))
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/util.py", line 507, in to_bytestring
    return value.encode(encoding)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 244: ordinal not in range(128)
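The last frame points at the cause: Gunicorn converts the whole header block with `to_bytestring(header_str, "ascii")`, so a single non-ASCII character in any header value aborts the response. A minimal reproduction in plain Python, assuming a header line like the one httpbin builds from the failing query string (the exact header text is illustrative, not taken from the server):

```python
# Illustrative header line, approximating what httpbin echoes back for the failing request.
header_str = 'Content-Disposition: attachment; filename="foo-\u00e4.html"\r\n'


def try_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with `encoding`, False otherwise."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# ASCII cannot represent U+00E4 ('ä'), mirroring the traceback above.
ascii_ok = try_encode(header_str, "ascii")
# ISO-8859-1 (Latin-1) can: 'ä' becomes the single byte 0xE4.
latin1_ok = try_encode(header_str, "latin-1")

print(ascii_ok, latin1_ok)  # False True
```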

javabrett commented 6 years ago

https://github.com/benoitc/gunicorn/issues/1214

javabrett commented 6 years ago

This looks like a Gunicorn bug or limitation. I may write one up, but there are a lot of RFC considerations here.

javabrett commented 6 years ago

This is tricky for a few reasons.

By using even Latin-1 characters in that way, you are travelling close to the bounds of the spec. That is highlighted by the problem appearing only under Gunicorn, as discussed in issues such as https://github.com/benoitc/gunicorn/issues/1778. Sticking to US-ASCII is safer; the alternative is to use the header-encoding extensions available to represent other character sets, but, as you can see, support for those will be sketchy.

It is also tricky because we are passing specific characters via URL parameters, and unless we go to unusual lengths, Flask will decode them for us. The problem then is that only certain characters need to be re-encoded, and it is not easy to know which ones; at least, there is no obvious existing library function to call to encode them. I'm referring to the encoding required to implement RFC 6266.
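As a sketch of that RFC 6266 encoding, assuming Python's `urllib.parse.quote` with a restricted safe set is an acceptable stand-in for a dedicated RFC 5987 encoder: the filename is encoded as UTF-8, and every byte outside RFC 5987's attr-char set is percent-encoded. `content_disposition` is a hypothetical helper name, not part of httpbin or Flask.

```python
from urllib.parse import quote

# attr-char from RFC 5987 (used by RFC 6266's filename* parameter):
#   ALPHA / DIGIT / "!" / "#" / "$" / "&" / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
# quote() never escapes ALPHA, DIGIT and "_.-~", so only the remaining
# punctuation has to be passed as "safe".
_ATTR_CHAR_EXTRA = "!#$&+^`|"


def content_disposition(filename: str, disposition: str = "attachment") -> str:
    """Hypothetical helper: build a Content-Disposition value using filename*.

    The filename is encoded as UTF-8 and every byte outside attr-char is
    percent-encoded, so the returned string is pure ASCII.
    """
    encoded = quote(filename, safe=_ATTR_CHAR_EXTRA, encoding="utf-8")
    return f"{disposition}; filename*=UTF-8''{encoded}"


value = content_disposition("foo-\u00e4.html")
print(value)  # attachment; filename*=UTF-8''foo-%C3%A4.html
value.encode("ascii")  # succeeds: no non-ASCII characters remain
```

Because the result is pure ASCII, a value built this way would pass through Gunicorn's ASCII encoding step unchanged.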

If you want to use that, you can do so by carefully encoding the percent signs (%) you send in the request, so that they survive request decoding and you are left with the encoded characters (the ASCII representation of your Unicode characters' UTF-8 bytes), which can then be sent as headers. This is a kind of double encoding.
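The double encoding can be checked locally with `urllib.parse`. This sketch assumes the server applies exactly one round of percent-decoding to query parameters, modelled here with `unquote`; the safe set passed to `quote` is an illustrative choice that keeps the value readable.

```python
from urllib.parse import quote, unquote

# Header value we want httpbin to echo back verbatim (already RFC 6266-encoded).
header_value = "attachment;filename*=UTF-8''foo-%C3%A4.html"

# Encode it once more for the query string. Keeping ";=*'" unescaped is only
# for readability; the important part is that each "%" becomes "%25".
query_value = quote(header_value, safe=";=*'")
print(query_value)  # attachment;filename*=UTF-8''foo-%25C3%25A4.html

# Simulated server-side decoding (assumed: one unquote pass) recovers the
# single-encoded, pure-ASCII header value.
decoded = unquote(query_value)
assert decoded == header_value

# Without the extra level, the same decoding step yields a non-ASCII value.
print(unquote(header_value))  # attachment;filename*=UTF-8''foo-ä.html
```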

So if you are prepared to switch to the RFC 6266 filename* representation (again, client support will vary, but modern browsers understand it), you can send, for example:

curl -v "http://localhost/response-headers?Content-Type=text/plain;%20charset=UTF-8&Content-Disposition=attachment;filename*=UTF-8''foo-%25C3%25A4.html"

... noting that each % is encoded as %25. The resulting header will be:

Content-Disposition: attachment;filename*=UTF-8''foo-%C3%A4.html

... which is correct; if you run the request from a browser, you will see the file download with the name foo-ä.html.
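For illustration, a sketch of what a client does with that header: split the RFC 5987 ext-value into charset, optional language and percent-encoded bytes, then decode. `filename_from_ext_value` is a hypothetical helper, and the parameter extraction below is deliberately naive (it assumes filename* is the last parameter); real clients use a full header parser.

```python
from urllib.parse import unquote


def filename_from_ext_value(ext_value: str) -> str:
    """Hypothetical helper: decode an RFC 5987 ext-value.

    Format: charset "'" [language] "'" percent-encoded-value
    e.g. UTF-8''foo-%C3%A4.html
    """
    charset, _language, encoded = ext_value.split("'", 2)
    # Decode the percent-escaped bytes using the declared charset.
    return unquote(encoded, encoding=charset, errors="strict")


header = "attachment;filename*=UTF-8''foo-%C3%A4.html"
# Naive extraction: assumes filename* is the last parameter in the header.
ext_value = header.split("filename*=", 1)[1]
print(filename_from_ext_value(ext_value))  # foo-ä.html
```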

postmanlabs / httpbin #446