Closed mgedmin closed 5 years ago
I'm wondering if this is the same issue as #40, only in that case I happened to have Unicode characters that were encodable to Latin-1?
Marius Gedminas wrote at 2019-7-2 07:37 -0700:
BrowserRequest.processInputs() is working on the assumption that cgi.FieldStorage will give it UTF-8 text encoded to Latin-1, as per PEP-3333. This is not the case: on POST requests on Python 3.7 I'm seeing the FieldStorage hold native str objects containing already-decoded Unicode text, so when BrowserRequest._decode() tries to run text = text.encode('latin-1'), things fail.
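A minimal sketch of the failure described above, using only the standard library. The helper names `wsgi_native` and `decode_wsgi` are hypothetical and are not zope.publisher APIs; `decode_wsgi` only imitates the encode('latin-1') step mentioned in the report.

```python
def wsgi_native(text: str) -> str:
    """Return text as a PEP 3333 native string: UTF-8 bytes viewed as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def decode_wsgi(value: str) -> str:
    """Undo the Latin-1 view (hypothetical stand-in for the conversion step)."""
    return value.encode("latin-1").decode("utf-8")


original = "Žalgiris"

# Works when the value really is UTF-8 bytes viewed as Latin-1.
round_tripped = decode_wsgi(wsgi_native(original))

# Fails when the value is already-decoded Unicode outside the Latin-1 range.
try:
    decode_wsgi(original)
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

# Already-decoded text that does fit in Latin-1 fails one step later instead.
try:
    decode_wsgi("é")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

The second failure mode may be relevant to the question about Latin-1-encodable characters raised earlier in the thread, though that is an assumption rather than something confirmed here.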
cgi.FieldStorage returns "latin-1" decoded bytes if it is correctly called. The parameter is named encoding.
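To illustrate what an encoding parameter changes, here is a rough sketch using urllib.parse.parse_qsl, which accepts a parameter of the same name. It is a stand-in for cgi.FieldStorage (the cgi module was removed in Python 3.13), so treat it as an analogy rather than a demonstration of FieldStorage itself. The sample form data is made up.

```python
from urllib.parse import parse_qsl, quote

# Percent-encoded UTF-8 form data, as a browser would typically submit it.
form_data = "name=" + quote("Žalgiris")

# Parsed as UTF-8: values come back as ready-to-use Unicode text.
as_utf8 = dict(parse_qsl(form_data, encoding="utf-8"))["name"]

# Parsed as Latin-1: values are the UTF-8 bytes viewed as Latin-1 characters.
as_latin1 = dict(parse_qsl(form_data, encoding="latin-1"))["name"]

# Only the Latin-1 form needs the encode('latin-1') step to recover the text.
recovered = as_latin1.encode("latin-1").decode("utf-8")
```

With UTF-8 parsing, applying encode('latin-1') to the result would raise UnicodeEncodeError for this input, which matches the symptom in the original report.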
cgi.FieldStorage returns "latin-1" decoded bytes if it is correctly called. The parameter is named encoding.
I do not understand what you mean by that. Are you saying that zope.publisher is calling cgi.FieldStorage incorrectly?
My current theory is that the encode('latin-1') is right for parsing GET requests (with QUERY_STRING coming from the WSGI environment directly), but wrong for parsing POST requests (where the data comes from the wsgi.input BytesIO object).
Marius Gedminas wrote at 2019-7-3 03:19 -0700:
cgi.FieldStorage returns "latin-1" decoded bytes if it is correctly called. The parameter is named encoding.
I do not understand what you mean by that. Are you saying that zope.publisher is calling cgi.FieldStorage incorrectly?
Yes -- if you want "latin-1" decoded bytes as values.
Marius Gedminas wrote at 2019-7-3 04:11 -0700:
My current theory is that the encode('latin-1') is right for parsing GET requests (with QUERY_STRING coming from the WSGI environment directly), but wrong for parsing POST requests (where the data comes from the wsgi.input BytesIO object).
"cgi.FieldStorage" handles the differences between "GET" and "POST" correctly.
I've a patch in progress that fixes my application by dropping zope.publisher's conversion logic and using the Unicode values produced by cgi.FieldStorage directly, when on Python 3.
It breaks zope.publisher's test suite quite badly. I'll have to investigate why the existing tests do not match real-world usage.
It breaks zope.publisher's test suite quite badly.
Actually that was just one failing test, repeated for almost every tox environment, producing scary amounts of terminal scrollback.