pallets / flask

The Python micro framework for building web applications.
https://flask.palletsprojects.com
BSD 3-Clause "New" or "Revised" License
68.06k stars 16.22k forks source link

send_file: latin-1 encoding not compatible with gunicorn #2766

Closed jrast closed 6 years ago

jrast commented 6 years ago

With the latest release (flask 1.0.0), unicode attachement filenames are allowed by flask. I was waiting for this change and it seems to work great using the builtin dev server. However, gunicorn on the production server does not support latin-1 encoding for the headers and only supports ascii.

Originaly flask also opted for ASCII encoding, however commit 336d6a48e0104b53302fd8377dd399af43607cf9 changed this and since then latin-1 is used.

What was the rational to switch to latin-1? I know, officially it's allowed, however gunicorn decided to only support ASCII: (see related issue)

it's well documented around the web that HTTP headers should be ASCII

Is there a change that flask also switches back to ASCII?

Related Issues:

davidism commented 6 years ago

Headers are Latin-1. Gunicorn only allowing ASCII is incorrect.

jrast commented 6 years ago

I know, headers should be encoded using latin-1, but I think it's not uncommon to use flask + gunicorn and looking at the closed issue from some time ago, it's not like they will change soon.

However, I will open a new issue in the gunicorn bugtracker, as it's really a gunicorn issue.

benoitc commented 6 years ago

@davidism that's not true according the updated spec 7230:

Historically, HTTP has allowed field content with text in the ISO‑8859‑1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US‑ASCII octets. A recipient SHOULD treat other octets in field content (obs‑text) as opaque data.

Anyway it used to work with previous versions of flask. What changed since?

tilgovi commented 6 years ago

Anyway it used to work with previous versions of flask. What changed since?

https://github.com/pallets/flask/commit/336d6a48e0104b53302fd8377dd399af43607cf9

davidism commented 6 years ago

@benoitc I based the change on the Unicode section of PEP 3333:

Note also that strings passed to start_response() as a status or as response headers must follow RFC 2616 with respect to encoding. That is, they must either be ISO-8859-1 characters, or use RFC 2047 MIME encoding.

Werkzeug encodes with Latin-1 in many other places, including the encoding dance, so I'm not sure why it didn't cause issues before.

The wsgiref server uses Latin-1 / ISO-8859-1 as well.

davidism commented 6 years ago

Can I get an example of a filename that caused Gunicorn to raise an error? I changed send_file to use ascii encoding, but our test with a unicode filename still passed without change.

https://github.com/pallets/flask/blob/d22491a3fbf2c1926ebc3d6919e228ed903d0670/tests/test_helpers.py#L641-L649

Here's the code for the filename header (I replaced 'latin-1' with 'ascii' locally):

https://github.com/pallets/flask/blob/d22491a3fbf2c1926ebc3d6919e228ed903d0670/flask/helpers.py#L566-L577

davidism commented 6 years ago

The following fails with Gunicorn:

from flask import Flask, send_file

app = Flask(__name__)

@app.route('/')
def index():
    return send_file('example.py', as_attachment=True, attachment_filename='pingüino.txt')
gunicorn example:app

The filename is encodable as Latin-1, so it's passed through as-is. If it had UTF-8 characters, it would trigger the filename* behavior and the result would be ASCII, which is why the name from the current test worked.