Closed pitrou closed 14 years ago
I configured my buildbot to use a non-ascii path to the interpreter and test_xmlrpc fails as follows:
----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 59091)
Traceback (most recent call last):
File
"/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/xmlrpc/server.py",
line 448, in do_POST
size_remaining = int(self.headers["content-length"])
ValueError: invalid literal for int() with base 10: 'I am broken'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File
"/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py",
line 281, in _handle_request_noblock
self.process_request(request, client_address)
File
"/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py",
line 307, in process_request
self.finish_request(request, client_address)
File
"/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py",
line 320, in finish_request
self.RequestHandlerClass(request, client_address, self)
File
"/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py",
line 614, in __init__
self.handle()
File
"/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/http/server.py",
line 352, in handle
self.handle_one_request()
File
"/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/http/server.py",
line 346, in handle_one_request
method()
File
"/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/xmlrpc/server.py",
line 472, in do_POST
self.send_header("X-traceback", traceback.format_exc())
File
"/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/http/server.py",
line 410, in send_header
self.wfile.write(("%s: %s\r\n" % (keyword, value)).encode('ASCII',
'strict'))
UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in
position 93: ordinal not in range(128)
----------------------------------------
======================================================================
FAIL: test_fail_with_info (test.test_xmlrpc.FailingServerTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File
"/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/test/test_xmlrpc.py",
line 555, in test_fail_with_info
p.pow(6,8)
xmlrpc.client.ProtocolError: <ProtocolError for 127.0.0.1:57828/RPC2:
500 Internal Server Error>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File
"/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/test/test_xmlrpc.py",
line 562, in test_fail_with_info
self.assertTrue(e.headers.get("X-traceback") is not None)
AssertionError: False is not True
self.send_header("X-traceback", traceback.format_exc())
That's fairly tricky. send_header expects two strings (bytes are not acceptable), and also requires these strings to be ASCII. This is why it breaks: format_exc returns a non-ASCII string.
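A minimal reproduction sketch of that failure, outside the server. The helper name `encode_header_line` and the sample path are hypothetical; only the encoding expression is taken from the `send_header` line in the traceback above.

```python
# Hypothetical traceback text containing a non-ASCII character,
# similar in spirit to the buildbot path in the report.
value = 'File "/tmp/nonascii-€/server.py", line 1'


def encode_header_line(keyword, value):
    """Apply the same strict ASCII encoding that the traceback shows in send_header."""
    return ("%s: %s\r\n" % (keyword, value)).encode("ASCII", "strict")


failed = False
bad_char = None
try:
    encode_header_line("X-traceback", value)
except UnicodeEncodeError as exc:
    # exc.object is the full string; start/end delimit the offending slice.
    failed = True
    bad_char = exc.object[exc.start:exc.end]

print(failed, repr(bad_char))
```

A pure-ASCII value passes through the same helper without error, which is why the test only fails when the build path contains a character such as `€`.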
I see two options: a) allow non-Unicode values for keyword and value in send_header, and have xmlrpc.server encode the header itself, or b) properly MIME-encode value if it contains non-ASCII characters (keyword really must be ASCII, I think).
Not sure whether there is any precedent for UTF-8 in HTTP headers.
A little googling came up with this page:
Their solution is to URI-encode the UTF-8-encoded data.
However, this article references the RFCs, which look like they call for RFC 2047 (MIME) encoded words:
http://stackoverflow.com/questions/324470/http-headers-encoding-decoding-in-java
If it's only about transmitting the string representation of the traceback, perhaps we can simply use "replace" or "ignore" as the error handler?
David: I think it's a little bit more complicated. RFC 2616 says that the value of a header is *TEXT, which is defined as
The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 only when encoded according to the rules of RFC 2047
So I think send_header should change in the following way:
a) if isinstance(value, bytes): send value as-is
b) if value can be encoded in latin-1: encode in latin-1, then send as-is
c) otherwise: MIME-encode as UTF-8, using the following algorithm
The purpose of the algorithm in c) would be that text containing a few non-latin characters still comes out right even if the receiver fails to decode the header.
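The three cases can be sketched roughly as follows. This is only an illustration: the algorithm referred to in c) is not shown in this thread, so the sketch substitutes the standard library's `email.header.Header` for the RFC 2047 step, and the function name `encode_header_value` is hypothetical. Note also that `Header.encode()` folds long values across lines by default, which an HTTP sender would have to deal with separately.

```python
from email.header import Header, decode_header


def encode_header_value(value):
    """Return the bytes to send for a header value (hypothetical helper)."""
    # a) bytes are sent as-is
    if isinstance(value, bytes):
        return value
    # b) anything representable in latin-1 is encoded as latin-1
    try:
        return value.encode("latin-1")
    except UnicodeEncodeError:
        pass
    # c) otherwise fall back to an RFC 2047 encoded word in UTF-8
    #    (assumption: stdlib Header stands in for the unspecified algorithm)
    return Header(value, "utf-8").encode().encode("ascii")


# Round-trip check for a short value that needs case c).
wire = encode_header_value("nonascii-€/build")
decoded = "".join(
    chunk.decode(charset or "latin-1") if isinstance(chunk, bytes) else chunk
    for chunk, charset in decode_header(wire.decode("ascii"))
)
print(wire, decoded)
```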
The same change would also apply to the client-side of sending headers. On the receiving side, we should offer an option to decode headers (both for client and server); this should be an option because senders may not comply with RFC 2616. Reading should then proceed as follows:
Antoine: sure, to fix the issue at hand, we can work-around.
However, the issue of sending non-ASCII headers in HTTP remains, and should also be fixed.
bpo-7608 was a duplicate issue. Copy of my message (msg98091):
-----
SimpleXMLRPCRequestHandler.do_POST() writes the traceback in the HTTP header "X-traceback". But an HTTP header value is ASCII only, whereas a traceback can contain any character (e.g. a non-ASCII character from a directory name, as in this issue).
A simple fix would be to use the ASCII charset with the backslashreplace error handler. Attached patch uses:
trace = str(trace.encode('ASCII', 'backslashreplace'), 'ASCII')
Is there an easier method to escape non-ASCII characters without double conversion (unicode->bytes and bytes->unicode)?
-----
I also copied my patch to this issue.
pitrou> If it's only about transmitting the string representation of the
pitrou> traceback, perhaps we can simply use "replace" or "ignore" as the error
pitrou> handler?
Both replace and ignore lose information. My patch keeps all information by using backslashreplace. It's consistent with Python's behaviour: Python writes a traceback to stderr, which also uses the backslashreplace error handler.
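To make the difference concrete, here is a small comparison of the three error handlers using the same double conversion as the patch line above. The sample string is made up for illustration.

```python
# Hypothetical fragment of a traceback path with one non-ASCII character.
text = "nonascii-€"

# Same pattern as the patch: str -> ASCII bytes -> str.
escaped = str(text.encode("ASCII", "backslashreplace"), "ASCII")
replaced = str(text.encode("ASCII", "replace"), "ASCII")
ignored = str(text.encode("ASCII", "ignore"), "ASCII")

print(escaped)   # the euro sign survives as a \u20ac escape
print(replaced)  # the euro sign becomes "?"
print(ignored)   # the euro sign is dropped entirely
```

Only the backslashreplace result still lets a reader recover which character was in the original path.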
What do you think about my solution (convert the traceback to ASCII to avoid the encoding issue)? If you would like to support non-ASCII characters in HTTP headers, you should open a new issue. For compatibility, I prefer to use pure ASCII headers because I fear that third-party programs don't support non-ASCII headers.
What do you think about my solution (convert the traceback to ASCII to avoid the encoding issue)?
It's fine for me. Perhaps you should add a comment to explain why this is necessary.
Committed: r80112 (py3k). Waiting for the buildbots before the backport to 3.1.
Committed: r80112 (py3k)
Looks good: r80118 (3.1).
If anyone would like to work on non-ASCII HTTP headers, please open a new issue with a pointer to this one.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at =
created_at =
labels = ['tests', 'type-bug', 'library']
title = 'test_xmlrpc fails with non-ascii path'
updated_at =
user = 'https://github.com/pitrou'
```
bugs.python.org fields:
```python
activity =
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date =
closer = 'vstinner'
components = ['Library (Lib)', 'Tests']
creation =
creator = 'pitrou'
dependencies = []
files = ['16063']
hgrepos = []
issue_num = 7606
keywords = ['patch']
message_count = 13.0
messages = ['97063', '97064', '97068', '97069', '97071', '97072', '98593', '98594', '103275', '103322', '103323', '103335', '103382']
nosy_count = 5.0
nosy_names = ['loewis', 'pitrou', 'vstinner', 'r.david.murray', 'flox']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'needs patch'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue7606'
versions = ['Python 3.1', 'Python 3.2']
```