python / cpython


test_xmlrpc fails with non-ascii path #51855

Closed pitrou closed 14 years ago

pitrou commented 14 years ago
BPO 7606
Nosy @loewis, @pitrou, @vstinner, @bitdancer, @florentx
Files
  • xmlrpc_server_ascii_traceback.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    ```python
    assignee = None
    closed_at =
    created_at =
    labels = ['tests', 'type-bug', 'library']
    title = 'test_xmlrpc fails with non-ascii path'
    updated_at =
    user = 'https://github.com/pitrou'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date =
    closer = 'vstinner'
    components = ['Library (Lib)', 'Tests']
    creation =
    creator = 'pitrou'
    dependencies = []
    files = ['16063']
    hgrepos = []
    issue_num = 7606
    keywords = ['patch']
    message_count = 13.0
    messages = ['97063', '97064', '97068', '97069', '97071', '97072', '98593', '98594', '103275', '103322', '103323', '103335', '103382']
    nosy_count = 5.0
    nosy_names = ['loewis', 'pitrou', 'vstinner', 'r.david.murray', 'flox']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'needs patch'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue7606'
    versions = ['Python 3.1', 'Python 3.2']
    ```

    pitrou commented 14 years ago

    I configured my buildbot to use a non-ascii path to the interpreter and test_xmlrpc fails as follows:

    ----------------------------------------

    Exception happened during processing of request from ('127.0.0.1', 59091)
    Traceback (most recent call last):
      File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/xmlrpc/server.py", line 448, in do_POST
        size_remaining = int(self.headers["content-length"])
    ValueError: invalid literal for int() with base 10: 'I am broken'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py", line 281, in _handle_request_noblock
        self.process_request(request, client_address)
      File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py", line 307, in process_request
        self.finish_request(request, client_address)
      File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py", line 320, in finish_request
        self.RequestHandlerClass(request, client_address, self)
      File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py", line 614, in __init__
        self.handle()
      File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/http/server.py", line 352, in handle
        self.handle_one_request()
      File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/http/server.py", line 346, in handle_one_request
        method()
      File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/xmlrpc/server.py", line 472, in do_POST
        self.send_header("X-traceback", traceback.format_exc())
      File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/http/server.py", line 410, in send_header
        self.wfile.write(("%s: %s\r\n" % (keyword, value)).encode('ASCII', 'strict'))
    UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in position 93: ordinal not in range(128)
    ----------------------------------------
    

    ======================================================================
    FAIL: test_fail_with_info (test.test_xmlrpc.FailingServerTestCase)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/test/test_xmlrpc.py", line 555, in test_fail_with_info
        p.pow(6,8)
    xmlrpc.client.ProtocolError: <ProtocolError for 127.0.0.1:57828/RPC2: 500 Internal Server Error>

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/test/test_xmlrpc.py", line 562, in test_fail_with_info
        self.assertTrue(e.headers.get("X-traceback") is not None)
    AssertionError: False is not True

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 14 years ago

    self.send_header("X-traceback", traceback.format_exc())

    That's fairly tricky. send_header expects two strings (bytes are not acceptable), and also requires these strings to be ASCII. This is why it breaks: format_exc returns a non-ASCII string.
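The failure described above can be reproduced in isolation. This is a minimal sketch, assuming the header line is built the way the traceback shows for `send_header`; `format_header_line` and the shortened paths are hypothetical stand-ins, not part of `http.server`.

```python
def format_header_line(keyword: str, value: str) -> bytes:
    # Mirrors the expression shown in the traceback: strict ASCII encoding.
    return ("%s: %s\r\n" % (keyword, value)).encode("ASCII", "strict")


# A pure-ASCII value encodes without trouble.
ascii_line = format_header_line("X-traceback", "File /home/buildbot/build")

# A value containing the euro sign (U+20AC) cannot be encoded as ASCII.
try:
    format_header_line("X-traceback", "File /home/buildbot/nonascii-\u20ac/build")
except UnicodeEncodeError as exc:
    failure = exc
else:
    failure = None

print(ascii_line)
print(type(failure).__name__)
```

Because `do_POST` calls `send_header` from inside its own exception handler, the encoding error replaces the intended 500 response headers, which is why the test never sees `X-traceback`.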

    I see two options: a) allow non-Unicode values for keyword and value in send_header, and have xmlrpc.server encode the header itself, or b) properly MIME-encode value if it contains non-ASCII characters (keyword really must be ASCII, I think).

    Not sure whether there is any precedent for UTF-8 in HTTP headers.

    bitdancer commented 14 years ago

    A little googling came up with this page:

    http://publib.boulder.ibm.com/infocenter/tivihelp/v2r1/topic/com.ibm.itame.doc/am61_webseal_admin570.htm

    Their solution is to URI-encode the UTF-8-encoded data.

    However, this article references the RFCs, which look like they call for RFC 2047 (MIME) encoded words:

    http://stackoverflow.com/questions/324470/http-headers-encoding-decoding-in-java
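The two approaches mentioned above can be compared with standard-library helpers. This is only an illustration, assuming `urllib.parse.quote` for the URI encoding and `email.header.Header` for RFC 2047 encoded words; the sample value is made up, and neither call is what `http.server` does.

```python
from email.header import Header, decode_header
from urllib.parse import quote, unquote

value = "nonascii-\u20ac"  # hypothetical header value containing a euro sign

# Option 1: percent-encode the UTF-8 bytes (URI encoding).
uri_encoded = quote(value, safe="")

# Option 2: wrap the value in an RFC 2047 encoded word.
mime_encoded = Header(value, charset="utf-8").encode()

# Both results are pure ASCII and can be reversed by the receiver.
uri_decoded = unquote(uri_encoded)
mime_bytes, mime_charset = decode_header(mime_encoded)[0]
mime_decoded = mime_bytes.decode(mime_charset)

print(uri_encoded)
print(mime_encoded)
```

Either form survives a strict ASCII encode, but the receiver has to know which convention was used in order to recover the original text.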

    pitrou commented 14 years ago

    If it's only about transmitting the string representation of the traceback, perhaps we can simply use "replace" or "ignore" as the error handler?

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 14 years ago

    David: I think it's a little bit more complicated. RFC 2616 says that the value of a header is *TEXT, which is defined as

    The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 only when encoded according to the rules of RFC 2047

    So I think send_header should change in the following way:

    a) if isinstance(value, bytes): send value as-is
    b) if value can be encoded in latin-1: encode in latin-1, then send as-is
    c) otherwise: MIME-encode as UTF-8, using the following algorithm

    1. count the number of non-ASCII characters, by encoding with ascii and the ignore error handler, and comparing result lengths
    2. if fewer than 10% of the characters are non-ASCII, use the Q encoding
    3. otherwise, use the B encoding

    The purpose of the algorithm in c) would be that text containing a few non-latin characters still comes out right even if the receiver fails to decode the header.
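The proposed sending rules a) to c) could be sketched roughly as follows. This is not the committed fix and not an existing `http.server` API: `encode_header_value` and `_q_encode` are hypothetical names, and the sketch ignores the RFC 2047 length limit on encoded words, so a long value would still need to be split.

```python
import base64


def _q_encode(data: bytes) -> str:
    """Q-encode bytes: keep ASCII letters and digits, '_' for space, '=XX' otherwise."""
    out = []
    for byte in data:
        ch = chr(byte)
        if byte < 128 and ch.isalnum():
            out.append(ch)
        elif ch == " ":
            out.append("_")
        else:
            out.append("=%02X" % byte)
    return "".join(out)


def encode_header_value(value) -> bytes:
    # a) bytes are sent as-is
    if isinstance(value, bytes):
        return value
    # b) anything representable in latin-1 is sent as latin-1
    try:
        return value.encode("latin-1")
    except UnicodeEncodeError:
        pass
    # c) otherwise MIME-encode as UTF-8; count non-ASCII characters by
    #    comparing lengths before and after an ascii/ignore encode
    non_ascii = len(value) - len(value.encode("ascii", "ignore"))
    utf8 = value.encode("utf-8")
    if non_ascii * 10 < len(value):  # fewer than 10% non-ASCII: Q encoding
        word = "=?utf-8?q?%s?=" % _q_encode(utf8)
    else:  # otherwise: B encoding
        word = "=?utf-8?b?%s?=" % base64.b64encode(utf8).decode("ascii")
    return word.encode("ascii")


mostly_ascii = encode_header_value("a" * 20 + "\u20ac")
mostly_non_ascii = encode_header_value("\u20ac\u20ac\u20ac")
```

With the Q encoding, the ASCII part of a mostly-ASCII value stays readable even if the receiver never decodes the header, which is the motivation given for the 10% threshold.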

    The same change would also apply to the client-side of sending headers. On the receiving side, we should offer an option to decode headers (both for client and server); this should be an option because senders may not comply with RFC 2616. Reading should then proceed as follows:

    1. check whether there are MIME markers in the text
    2. if so, MIME-decode
    3. if not, decode as latin-1
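The reading steps above might look like this on the receiving side. This is a sketch under assumptions: `decode_header_value` is a hypothetical helper, the marker check is a simple substring test, and `email.header.decode_header` from the standard library does the MIME decoding. An unknown charset name in the header would raise `LookupError` here.

```python
from email.header import decode_header


def decode_header_value(raw: bytes) -> str:
    # Latin-1 maps every byte to a character, so this step never fails.
    text = raw.decode("latin-1")
    # 1. check whether there are MIME markers in the text
    if "=?" not in text or "?=" not in text:
        # 3. no markers: the latin-1 decoding is the result
        return text
    # 2. markers present: MIME-decode each part
    parts = []
    for chunk, charset in decode_header(text):
        if isinstance(chunk, bytes):
            parts.append(chunk.decode(charset or "latin-1"))
        else:
            parts.append(chunk)
    return "".join(parts)


plain = decode_header_value(b"caf\xe9")
encoded = decode_header_value(b"=?utf-8?q?nonascii-=E2=82=AC?=")
```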

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 14 years ago

    Antoine: sure, to fix the issue at hand, we can work-around.

    However, the issue of sending non-ASCII headers in HTTP remains, and should also be fixed.

    vstinner commented 14 years ago

    bpo-7608 was a duplicate issue. Copy of my message (msg98091):

    -----

    SimpleXMLRPCRequestHandler.do_POST() writes the traceback in the HTTP header "X-traceback". But an HTTP header value is ASCII only, whereas a traceback can contain any character (e.g. a non-ASCII character from a directory name, as in this issue).

    A simple fix would be to use the ASCII charset with the backslashreplace error handler. Attached patch uses:

       trace = str(trace.encode('ASCII', 'backslashreplace'), 'ASCII')

    Is there an easier method to escape non-ASCII characters without double conversion (unicode->bytes and bytes->unicode)?

    -----

    I also copied my patch to this issue.
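The effect of the patch line can be checked on a shortened, made-up traceback fragment. `ascii_escape` is a hypothetical wrapper around the expression quoted above, and the header line is built the same way the traceback shows `send_header` doing it.

```python
def ascii_escape(text: str) -> str:
    """Replace non-ASCII characters with backslash escapes; the result is pure ASCII."""
    return str(text.encode("ASCII", "backslashreplace"), "ASCII")


# Hypothetical, shortened stand-in for the output of traceback.format_exc().
trace = 'File "/home/buildbot/nonascii-\u20ac/build/Lib/xmlrpc/server.py"'
escaped = ascii_escape(trace)

# The strict ASCII encode now succeeds.
header_line = ("%s: %s\r\n" % ("X-traceback", escaped)).encode("ASCII", "strict")

print(escaped)
```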

    vstinner commented 14 years ago

    pitrou> If it's only about transmitting the string representation of the
    pitrou> traceback, perhaps we can simply use "replace" or "ignore" as the error
    pitrou> handler?

    Both replace and ignore lose information. My patch keeps all information by using backslashreplace. It's consistent with Python's behaviour: Python writes tracebacks to stderr, which also uses the backslashreplace error handler.
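The difference between the three error handlers is easy to see on a single path component. The sample string below is taken from the buildbot path in the report; the dictionary is only for illustration.

```python
path = "cpython-ucs4-nonascii-\u20ac"

# Encode to ASCII with each error handler, then decode back to str for display.
results = {
    handler: path.encode("ascii", handler).decode("ascii")
    for handler in ("replace", "ignore", "backslashreplace")
}

for handler, text in results.items():
    print(handler, "->", text)
# replace          -> cpython-ucs4-nonascii-?
# ignore           -> cpython-ucs4-nonascii-
# backslashreplace -> cpython-ucs4-nonascii-\u20ac
```

Only `backslashreplace` leaves enough in the output to tell which character was originally there.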

    vstinner commented 14 years ago

    What do you think about my solution (convert the traceback to ASCII to avoid the encoding issue)? If you would like to support non-ASCII characters in HTTP headers, you should open a new issue. For compatibility, I prefer to use pure ASCII headers because I fear that third-party programs don't support non-ASCII headers.

    pitrou commented 14 years ago

    > What do you think about my solution (convert the traceback to ASCII to avoid the encoding issue)?

    It's fine for me. Perhaps you should add a comment to explain why this is necessary.

    vstinner commented 14 years ago

    Committed: r80112 (py3k). Waiting for the buildbots before the backport to 3.1.

    vstinner commented 14 years ago

    Committed: r80112 (py3k)

    Looks good: r80118 (3.1).

    vstinner commented 14 years ago

    If anyone would like to work on non-ASCII HTTP headers, please open a new issue with a pointer to this one.