python / cpython

The Python programming language
https://www.python.org

printf-style bytes formatting sometimes does not work. #75126

Closed 2ecf86d6-06f5-4022-9dff-01c8aac49d8e closed 7 years ago

2ecf86d6-06f5-4022-9dff-01c8aac49d8e commented 7 years ago
BPO 30943
Nosy @gareth-rees, @zaazbb
Superseder
  • bpo-29714: can't interpolate byte string with \x00 before replacement identifier
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    ```python
    assignee = None
    closed_at =
    created_at =
    labels = ['type-bug', 'library']
    title = 'printf-style Bytes Formatting sometimes do not worked.'
    updated_at =
    user = 'https://github.com/zaazbb'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'r.david.murray'
    assignee = 'none'
    closed = True
    closed_date =
    closer = 'r.david.murray'
    components = ['Library (Lib)']
    creation =
    creator = 'zaazbb'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 30943
    keywords = []
    message_count = 3.0
    messages = ['298465', '298481', '298483']
    nosy_count = 2.0
    nosy_names = ['gdr@garethrees.org', 'zaazbb']
    pr_nums = []
    priority = 'normal'
    resolution = 'out of date'
    stage = 'resolved'
    status = 'closed'
    superseder = '29714'
    type = 'behavior'
    url = 'https://bugs.python.org/issue30943'
    versions = ['Python 3.6']
    ```

    2ecf86d6-06f5-4022-9dff-01c8aac49d8e commented 7 years ago
    # works.
    >>> b'\x00\x08%(amount)b'% {b'amount':b'11111'}
    b'\x00\x0811111'
    # Does not work: the format specifier is left unexpanded.
    >>> b'\x11\x00\x08%(amount)b'% {b'amount':b'11111'}
    b'\x11\x00\x08%(amount)b'
    
    # Does not work: none of the format specifiers are expanded.
    >>> amount=bytearray(b'000000000010')
    >>> posnum=bytearray(b'423')
    >>> date_ =b'170717'
    >>> time_=b'160006'
    >>> b'\x02\x03\x13\x9f\x00\x00\x04NULL\x9f\x01\x00\x01\x02\x9f\x03\x00\x0c\xd6\xd0\xb9\xfa\xd2\xf8\xc1\xaa\xb2\xe2\xca\xd4\x9f\x04\x00\x0f307310083980007\x9f\x05\x00\x0814025520\x9f\x14\x00\x0200\x9f\x19\x00\x08\xbd\xbb\xd2\xd7\xb3\xc9\xb9\xa6\x9f\x07\x00\x0803072900\x9f\x08\x00\n0014243000\x9f\t\x00\x08\xbd\xf0\xbf\xa8\xd6\xd0\xd0\xc4\x9f\n\x00\x0800092900\x9f\r\x00\x06000145\x9f\x0e\x00\x06000081\x9f\x10\x00\x0c162246168268\x9f\x11\x00\x08%(date)s\x9f\x12\x00\x06%(time)s\x9f\x02\x00\x0c%(amount)s\x9f\x1b\x00\x040000\x9f\x0b\x00\x13622452*********2994\x9f\x0c\x00\x01S\x9f\x0f\x00\x00\x9f\x13\x00.FK:\xbb\xb7\xd3\xce\xd1\xc7\xcc\xab\xb3\xa9\xcf\xed\xd2\xf8\xc1\xaa\xd3\xc5\xbb\xdd\nZX:promo.unionpay.com\n\x9f\x1f\x00\x03%(posnum)s\x9f\x1a\x00\x00\x9f\xa1\x00\x01\x01\x9f\xa0\x00\x01\x01\x03\x00'% {b'amount': amount,b'date': date_,b'time': time_,b'posnum': posnum}
    b'\x02\x03\x13\x9f\x00\x00\x04NULL\x9f\x01\x00\x01\x02\x9f\x03\x00\x0c\xd6\xd0\xb9\xfa\xd2\xf8\xc1\xaa\xb2\xe2\xca\xd4\x9f\x04\x00\x0f307310083980007\x9f\x05\x00\x0814025520\x9f\x14\x00\x0200\x9f\x19\x00\x08\xbd\xbb\xd2\xd7\xb3\xc9\xb9\xa6\x9f\x07\x00\x0803072900\x9f\x08\x00\n0014243000\x9f\t\x00\x08\xbd\xf0\xbf\xa8\xd6\xd0\xd0\xc4\x9f\n\x00\x0800092900\x9f\r\x00\x06000145\x9f\x0e\x00\x06000081\x9f\x10\x00\x0c162246168268\x9f\x11\x00\x08%(date)s\x9f\x12\x00\x06%(time)s\x9f\x02\x00\x0c%(amount)s\x9f\x1b\x00\x040000\x9f\x0b\x00\x13622452*********2994\x9f\x0c\x00\x01S\x9f\x0f\x00\x00\x9f\x13\x00.FK:\xbb\xb7\xd3\xce\xd1\xc7\xcc\xab\xb3\xa9\xcf\xed\xd2\xf8\xc1\xaa\xd3\xc5\xbb\xdd\nZX:promo.unionpay.com\n\x9f\x1f\x00\x03%(posnum)s\x9f\x1a\x00\x00\x9f\xa1\x00\x01\x01\x9f\xa0\x00\x01\x01\x03\x00'

    Environment: Python 3.6.1, Windows 10 64-bit.

    faa20c90-fcf0-43f4-810b-00286077d549 commented 7 years ago

    Test case minimization:

        Python 3.6.1 (default, Apr 24 2017, 06:18:27) 
        [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
        Type "help", "copyright", "credits" or "license" for more information.
        >>> b'a\x00%(a)s' % {b'a': b'a'}
        b'a\x00%(a)s'

    It seems that all formatting operations after a zero byte are ignored. This is because the code for parsing the format string (in _PyBytes_FormatEx in Objects/bytesobject.c) uses the following approach to find the next % character:

        while (--fmtcnt >= 0) {
            if (*fmt != '%') {
                Py_ssize_t len;
                char *pos;
                pos = strchr(fmt + 1, '%');

    But strchr uses the C notion of strings, which are terminated by a zero byte, so the search stops at the first zero byte and never reaches a later % character.
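
The effect can be modelled in Python. The sketch below is an illustration only, not CPython code: `strchr_like` and `memchr_like` are hypothetical helpers that imitate a zero-terminated search and a length-bounded search over a bytes object. It also suggests why the reporter's first example worked: the search starts at `fmt + 1`, so a zero byte in the very first position is never examined.

```python
def strchr_like(buf: bytes, start: int, ch: int) -> int:
    """Imitate C strchr: scan from start, give up at the first zero byte."""
    for i in range(start, len(buf)):
        if buf[i] == ch:
            return i
        if buf[i] == 0:
            return -1  # the C string ends here; later bytes are never examined
    return -1


def memchr_like(buf: bytes, start: int, ch: int) -> int:
    """Imitate a length-bounded search that treats zero as an ordinary byte."""
    return buf.find(bytes([ch]), start)


PERCENT = ord('%')

# Minimized failing case: the zero byte sits between the search start and '%'.
failing = b'a\x00%(a)s'
# Reporter's working case: the zero byte is at index 0, before the search start.
working = b'\x00\x08%(amount)b'

results = {
    'failing_strchr': strchr_like(failing, 1, PERCENT),
    'failing_memchr': memchr_like(failing, 1, PERCENT),
    'working_strchr': strchr_like(working, 1, PERCENT),
}
print(results)
```

In this model the zero-terminated search reports "not found" for the failing case, while a length-bounded search (memchr is the usual C counterpart) locates the % at index 2.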

    faa20c90-fcf0-43f4-810b-00286077d549 commented 7 years ago

    This was already noted in bpo-29714 and fixed by Xiang Zhang in commit b76ad5121e2.
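
To tell whether a given interpreter includes the fix, the minimized test case above can be turned into a runtime check. The following is a sketch, assuming the fixed behavior is that a specifier after a zero byte gets expanded. `bytes_format_handles_nul` and `format_bytes_nul_safe` are hypothetical helper names, and the fallback (formatting each zero-delimited chunk separately) is a workaround idea for affected builds that was not proposed in this issue and has not been verified on 3.6.1. It assumes that no mapping key or format specifier contains a zero byte.

```python
def bytes_format_handles_nul() -> bool:
    """Return True if %-formatting expands specifiers that follow a zero byte."""
    return b'a\x00%(a)s' % {b'a': b'a'} == b'a\x00a'


def format_bytes_nul_safe(fmt: bytes, mapping: dict) -> bytes:
    """Format fmt with a mapping, working around the bug on affected builds.

    Assumption: no mapping key or format specifier contains a zero byte,
    so splitting on b'\\x00' never cuts a specifier in half.
    """
    if bytes_format_handles_nul():
        return fmt % mapping
    # Each chunk is free of zero bytes, so the search for '%' cannot stop early.
    return b'\x00'.join(chunk % mapping for chunk in fmt.split(b'\x00'))


print(format_bytes_nul_safe(b'\x11\x00\x08%(amount)b', {b'amount': b'11111'}))
```

On an interpreter that contains the fix, the helper simply delegates to the built-in `%` operator.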