selfboot / AnnotatedShadowSocks

Annotated shadowsocks(python version)

Details about bytes object #43

Open selfboot opened 7 years ago

In Python 2.x, string objects are overloaded: they hold both sequences of characters and sequences of bytes. This overloading of purpose leads to confusion and bugs. In Python 3.x, string objects hold character data only, and the bytes object fills the role of a byte container. The unicode type was renamed to str and the old 8-bit str type was removed.
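The split between text and bytes can be seen in a short Python 3 session. This is only a sketch; the sample string and the choice of UTF-8 are assumptions made for illustration.

```python
# Python 3: text and binary data are separate types.
text = "h\u00e9llo"              # str holds Unicode characters (h, e-acute, l, l, o)
data = text.encode("utf-8")      # bytes holds the encoded byte values

print(type(text).__name__, len(text))   # str 5
print(type(data).__name__, len(data))   # bytes 6 (e-acute takes two bytes in UTF-8)

# Decoding with the same codec recovers the original text.
roundtrip = data.decode("utf-8")
print(roundtrip == text)                # True
```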

Bytes Object

A bytes object stores a sequence of integers in the range 0 to 255. PEP 358 proposed a mutable sequence, but in Python 3 as released bytes is immutable and bytearray is the mutable variant. Unlike string objects, indexing a bytes object returns an integer. Assigning an object that is not an integer to an element of a bytearray raises a TypeError exception, and assigning a value outside the range 0 to 255 raises a ValueError exception. An equality comparison between an element and a non-integer does not raise; it simply evaluates to False. The __len__() method of bytes returns the number of integers stored in the sequence (i.e. the number of bytes).
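A minimal sketch of this behaviour, assuming Python 3, where bytes is immutable and bytearray is the mutable counterpart. The variable names are illustrative only.

```python
data = bytes([72, 105])          # b'Hi'
first = data[0]                  # indexing yields an int: 72
piece = data[0:1]                # slicing yields bytes: b'H'
length = len(data)               # number of bytes: 2

# bytes is immutable in Python 3, so item assignment fails.
try:
    data[0] = 74
except TypeError as exc:
    immutable_error = type(exc).__name__    # 'TypeError'

# bytearray allows assignment and enforces the 0..255 range.
buf = bytearray(data)
buf[0] = 74                      # buf is now bytearray(b'Ji')
try:
    buf[0] = 256
except ValueError as exc:
    range_error = type(exc).__name__        # 'ValueError'
```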

The constructor of the bytes object has the following signature:

bytes([initializer[, encoding]])

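If no arguments are provided then a bytes object containing zero elements is created and returned. The initializer argument can be a string (in 2.6, where bytes is an alias for str, either str or unicode), an iterable of integers, or a single integer. In Python 3, a str initializer requires the encoding argument, and a single integer n produces a bytes object of n zero bytes.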
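The constructor forms can be sketched as follows, assuming Python 3 semantics. The values are arbitrary examples.

```python
empty = bytes()                       # b''
from_ints = bytes([104, 105])         # b'hi', from an iterable of integers
from_text = bytes("hi", "utf-8")      # b'hi', a str needs an explicit encoding
zeros = bytes(3)                      # b'\x00\x00\x00', three zero bytes

# In Python 3, passing a str without an encoding is an error.
try:
    bytes("hi")
except TypeError as exc:
    missing_encoding = type(exc).__name__   # 'TypeError'
```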

The __repr__() method returns a string that can be evaluated to generate a new bytes object containing a bytes literal:

>>> bytes([10, 20, 30])
b'\n\x14\x1e'
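The round trip through the repr can be checked with a small sketch. It uses ast.literal_eval rather than eval, which is a choice made here for safety and is not part of the PEP text.

```python
import ast

original = bytes([10, 20, 30])
literal = repr(original)              # the text b'\n\x14\x1e'

# Evaluating the literal text yields an equal bytes object.
rebuilt = ast.literal_eval(literal)
print(rebuilt == original)            # True
```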

The object has a decode() method equivalent to the decode() method of the Python 2 str object (the Python 3 str type has no decode() method). The object also has a classmethod fromhex() that takes a string of characters from the set [0-9a-fA-F ] and returns a bytes object (similar to binascii.unhexlify).

>>> bytes.fromhex('5c5350ff')
b'\\SP\xff'
>>> bytes.fromhex('5c 53 50 ff')
b'\\SP\xff'
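As a rough comparison under Python 3 (bytes.hex() requires 3.5 or later), fromhex() and binascii.unhexlify give the same result for input without spaces, and decode() turns the ASCII part back into text. The slicing and the ASCII codec are choices made for this sketch.

```python
import binascii

a = bytes.fromhex("5c5350ff")
b = bytes.fromhex("5c 53 50 ff")      # spaces between byte pairs are ignored
c = binascii.unhexlify("5c5350ff")    # unhexlify does not accept spaces
print(a == b == c)                    # True

hex_text = a.hex()                    # '5c5350ff', the inverse of fromhex()

# The first three bytes are ASCII (backslash, S, P); 0xff is not.
text = a[:3].decode("ascii")          # '\\SP'
```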

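The leading \x escape sequence means the next two characters are interpreted as hex digits for the character code, so in a str literal \xaa equals chr(0xaa), i.e., chr(16 * 10 + 10) or chr(170), the feminine ordinal indicator 'ª' (a small raised lowercase 'a'). In a bytes literal, b'\xaa' is a single byte with the value 170.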
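A short Python 3 sketch of the same escape in a str literal and in a bytes literal. The encodings shown are examples picked for illustration.

```python
import unicodedata

ch = "\xaa"                           # str: one character, U+00AA
print(ch == chr(0xaa) == chr(16 * 10 + 10))   # True
print(ord(ch))                        # 170
print(unicodedata.name(ch))           # FEMININE ORDINAL INDICATOR

raw = b"\xaa"                         # bytes: one byte with value 170
print(raw[0])                         # 170

# The character and the byte coincide only under Latin-1;
# UTF-8 encodes the same character as two bytes.
latin1 = ch.encode("latin-1")         # b'\xaa'
utf8 = ch.encode("utf-8")             # b'\xc2\xaa'
```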

Ref
PEP 358 -- The "bytes" Object
What does a leading \x mean in a Python string \xaa
Print a string as hex bytes?

The Ultimate Guide to Character Encoding in Python 2.x (Python2.x 字符编码终极指南)