python / cpython


Octal byte literals with a decimal value > 255 are silently truncated #78801

Open df79943f-4aee-4531-a00d-c6b12816eb70 opened 5 years ago

df79943f-4aee-4531-a00d-c6b12816eb70 commented 5 years ago
BPO 34620
Nosy @pfmoore, @tjguk, @zware, @zooba, @mr-nfamous

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['interpreter-core', 'type-bug', '3.7', 'OS-windows']
title = 'Octal byte literals with a decimal value > 255 are silently truncated'
updated_at =
user = 'https://github.com/mr-nfamous'
```

bugs.python.org fields:
```python
activity =
actor = 'bup'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core', 'Windows']
creation =
creator = 'bup'
dependencies = []
files = []
hgrepos = []
issue_num = 34620
keywords = []
message_count = 1.0
messages = ['324918']
nosy_count = 5.0
nosy_names = ['paul.moore', 'tim.golden', 'zach.ware', 'steve.dower', 'bup']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue34620'
versions = ['Python 3.6', 'Python 3.7']
```

df79943f-4aee-4531-a00d-c6b12816eb70 commented 5 years ago
>>> b'\542\571\564\545\563', b'\142\171\164\145\163'
(b'bytes', b'bytes')

All the C compilers I know of at the very least generate a warning when one tries to assign an octal escape >= '\400' (256) to a byte. That's because such a value is nonsense when bytes have 8 bits, and even more so in a byte string, whose elements are 8-bit by definition.
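The output above is consistent with each octal escape being reduced to its low 8 bits (the value modulo 256). The sketch below models that observed behavior; `decode_truncating` is a hypothetical helper, not CPython's parser, and it assumes the input consists only of octal escapes (anything else is ignored).

```python
import re

# A backslash followed by one to three octal digits, as in a bytes literal.
OCTAL_ESCAPE = re.compile(r"\\([0-7]{1,3})")

def decode_truncating(escaped: str) -> bytes:
    """Hypothetical model of the reported behavior.

    Each octal escape keeps only its low 8 bits, i.e. value & 0xFF,
    which is the same as value % 256 for non-negative values.
    """
    values = [int(m.group(1), 8) for m in OCTAL_ESCAPE.finditer(escaped)]
    return bytes(v & 0xFF for v in values)

print(decode_truncating(r"\542\571\564\545\563"))  # b'bytes'
print(decode_truncating(r"\142\171\164\145\163"))  # b'bytes'

# 0o542 is 354; 354 - 256 == 98 == ord('b')
print(0o542, 0o542 & 0xFF, chr(0o542 & 0xFF))  # 354 98 b
```

Each escape in the first literal is exactly 0o400 (256) larger than the matching escape in the second, which is why both decode to the same bytes.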

The literal value:

    b'\542\571\564\545\563'

should be identical to:

    bytes([0o542, 0o571, 0o564, 0o545, 0o563])

That obviously doesn't work:

>>> b'\542\571\564\545\563' == bytes([0o542, 0o571, 0o564, 0o545, 0o563])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: bytes must be in range(0, 256)
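A minimal sketch of the stricter behavior argued for here, under the assumption that an octal escape should mean exactly what `bytes([...])` means for the same integers. `decode_strict` is a hypothetical helper, not part of CPython; it passes the raw escape values straight to `bytes()` so that out-of-range escapes hit the same range check shown in the traceback above.

```python
import re

# A backslash followed by one to three octal digits, as in a bytes literal.
OCTAL_ESCAPE = re.compile(r"\\([0-7]{1,3})")

def decode_strict(escaped: str) -> bytes:
    """Hypothetical strict variant: reject octal escapes above 0o377 (255).

    No masking is applied; bytes() itself raises ValueError for any
    value outside range(0, 256).
    """
    values = [int(m.group(1), 8) for m in OCTAL_ESCAPE.finditer(escaped)]
    return bytes(values)

print(decode_strict(r"\142\171\164\145\163"))  # b'bytes'

try:
    decode_strict(r"\542\571\564\545\563")
except ValueError as exc:
    print("rejected:", exc)  # rejected: bytes must be in range(0, 256)
```

Reusing the `bytes()` constructor's check keeps the two spellings consistent: whatever `bytes([...])` rejects, the escape form rejects too.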

This is on Windows/Intel. I haven't looked at the parser in much detail, but I wonder what would happen on a big-endian system.
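On the byte-order question, a small arithmetic check, assuming the truncation is an integer conversion that keeps `value & 0xFF` rather than a reinterpretation of memory (this report does not confirm how the parser does it). Under that assumption the kept byte is defined by arithmetic, so it is the same on big- and little-endian machines; only its position in a multi-byte in-memory layout differs.

```python
value = 0o542  # 354, which needs two bytes

# The low 8 bits are defined arithmetically, independent of byte order.
low_byte = value & 0xFF

big = value.to_bytes(2, "big")        # most significant byte first
little = value.to_bytes(2, "little")  # least significant byte first

# The same low byte appears in both layouts, just at opposite ends.
assert big[-1] == little[0] == low_byte == ord("b")
print(low_byte, big, little)  # 98 b'\x01b' b'b\x01'
```

If the parser instead copied a particular byte out of a wider integer in memory, byte order could matter; this sketch does not rule that out.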