selfboot / AnnotatedShadowSocks

Annotated shadowsocks(python version)

bytes in python 2.7+ and 3.3+ #6

Open selfboot opened 7 years ago

selfboot commented 7 years ago

In the function compat_chr:

def compat_chr(d):
    if bytes == str:
        return _chr(d)
    return bytes([d])
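A minimal sketch of what this helper returns. It assumes `_chr` is the original built-in `chr` saved under another name; the snippet above does not show that definition, so treat it as an assumption.

```python
# Assumption: _chr is the built-in chr, saved before any wrapper replaces it.
_chr = chr

def compat_chr(d):
    if bytes == str:
        # Python 2: bytes is str, so chr() already yields a one-byte string.
        return _chr(d)
    # Python 3: chr() would return text, so build a one-byte bytes object.
    return bytes([d])

one_byte = compat_chr(65)
print(repr(one_byte))                      # b'A' on Python 3, 'A' on Python 2
print(isinstance(one_byte, bytes), len(one_byte))
```

On Python 3 the built-in `chr(255)` returns a text string, not a byte, which is presumably why the wrapper is needed.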

And in the functions to_bytes and to_str:

def to_bytes(s):
    if bytes != str:
        if type(s) == str:
            return s.encode('utf-8')
    return s

def to_str(s):
    if bytes != str:
        if type(s) == bytes:
            return s.decode('utf-8')
    return s
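As a rough illustration, the two helpers can be exercised with a round trip. The sample value `u'caf\xe9'` is made up for this sketch; the function bodies are copied from above so the block runs on its own.

```python
def to_bytes(s):
    if bytes != str:              # true only on Python 3
        if type(s) == str:
            return s.encode('utf-8')
    return s

def to_str(s):
    if bytes != str:
        if type(s) == bytes:
            return s.decode('utf-8')
    return s

raw = to_bytes(u'caf\xe9')        # text in, UTF-8 bytes out (Python 3)
text = to_str(raw)                # bytes in, text out (Python 3)
unchanged = to_bytes(b'abc')      # values that are already bytes pass through

print(repr(raw), repr(text), repr(unchanged))
```

On Python 2 both functions return their argument untouched, because `bytes != str` is False there.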

Both rely on the test bytes == str, but what does bytes mean, and how does it differ between Python 2.x and 3.x?

Python 2.x

Python 2.6 added bytes as a synonym for the str type, along with the b'' literal notation. There is no separate bytes type in 2.x, just a new alias and literal syntax for str.

The alias exists to make it easier to write code that is portable between Python 2 and 3. In 2.6+:

>>> bytes
<type 'str'>
>>> bytes == str
True
>>> bytes is str
True
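Because the alias only exists on 2.x, the identity test doubles as a version check. A small sketch, where the names `IS_PY2` and `payload` are made up for illustration:

```python
import sys

# On Python 2.6+ bytes is another name for str; on Python 3 it is its own type.
IS_PY2 = bytes is str

# The identity test agrees with the interpreter's reported major version.
assert IS_PY2 == (sys.version_info[0] == 2)

# b'' literals are accepted by both 2.6+ and 3.x.
payload = b'\x05\x01\x00'
print(type(payload).__name__)     # 'str' on Python 2, 'bytes' on Python 3
```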

Python 3.x

The separate bytes type exists only in 3.x. According to the documentation:

Bytes objects are immutable sequences of single bytes. Only ASCII characters are permitted in bytes literals (regardless of the declared source code encoding). Any binary values over 127 must be entered into bytes literals using the appropriate escape sequence. The syntax for bytes literals is largely the same as that for string literals, except that a b prefix is added:
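A few literal forms, as a sketch of the rule quoted above (the variable names are illustrative only):

```python
ascii_only = b'GET / HTTP/1.1'    # plain ASCII characters are allowed as-is
high_bytes = b'caf\xc3\xa9'       # values above 127 need \x escapes
raw_literal = br'\x41'            # raw bytes literal: the backslash is kept

print(len(high_bytes))            # 5
print(len(raw_literal))           # 4
# Writing a non-ASCII character directly inside b'...' is a SyntaxError on Python 3.
```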

The built-in function bytes returns a new “bytes” object, which is an immutable sequence of integers in the range 0 <= x < 256. bytes is an immutable version of bytearray; it has the same non-mutating methods and the same indexing and slicing behavior.

class bytes([source[, encoding[, errors]]])

In Python 3.3+:

>>> bytes is str
False
>>> bytes == str
False
>>> type(bytes)
<class 'type'>
>>> bytes([11, 65, 128])
b'\x0bA\x80'
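The constructor signature above accepts several kinds of source. A Python 3 only sketch of the common forms (variable names are illustrative):

```python
zeros = bytes(4)                          # int: that many zero bytes
from_ints = bytes([11, 65, 128])          # iterable of ints in range(256)
from_text = bytes('h\u00e9', 'utf-8')     # str requires an explicit encoding
from_buffer = bytes(bytearray(b'abc'))    # copy of a bytes-like object

try:
    bytes('abc')                          # str without an encoding
except TypeError as exc:
    print('rejected:', exc)

print(zeros, from_ints, from_text, from_buffer)
```

Note that on Python 2, `bytes(4)` is just `str(4)` and returns `'4'`, so this form is not portable.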

Ref
The bytes type in Python 2.7 and PEP 358
https://docs.python.org/3/whatsnew/2.6.html#pep-3112-byte-literals

selfboot commented 6 years ago

While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256 (attempts to violate this restriction will trigger ValueError).

In addition to the literal forms, bytes objects can be created in a number of other ways:

>>> bytes([234])
b'\xea'
>>> bytes([284])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: bytes must be in range(0, 256)
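If the integers come from untrusted input, the ValueError can be caught rather than left to propagate. `safe_bytes` below is a hypothetical helper written for this sketch (Python 3), not part of shadowsocks:

```python
def safe_bytes(values):
    """Hypothetical helper: bytes(values), or None if any value is out of range."""
    try:
        return bytes(values)
    except ValueError:
        # Raised for any item outside range(0, 256), including negatives.
        return None

print(safe_bytes([234]))   # b'\xea'
print(safe_bytes([284]))   # None
```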

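Since two hexadecimal digits correspond precisely to a single byte, hexadecimal numbers are a commonly used format for describing binary data. Accordingly, the bytes type has an additional class method to read data in that format: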

fromhex(string)

This bytes class method returns a bytes object, decoding the given string object. The string must contain two hexadecimal digits per byte, with ASCII whitespace being ignored.

>>> hex_str = '91180100000100000000000006676f6f676c6503636f6d0000010001'
>>> bytes.fromhex(hex_str)
b'\x91\x18\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x06google\x03com\x00\x00\x01\x00\x01'
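A short sketch of the two rules above, using the first eight bytes of the same sample: spaces between byte pairs are skipped, and an odd number of digits is rejected (Python 3).

```python
# Spaces between pairs of hex digits are ignored.
packet = bytes.fromhex('9118 0100 0001 0000')
print(packet)                 # b'\x91\x18\x01\x00\x00\x01\x00\x00'

# Each byte needs exactly two digits, so a dangling digit raises ValueError.
try:
    bytes.fromhex('911')
except ValueError as exc:
    print('rejected:', exc)
```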

A reverse conversion function exists to transform a bytes object into its hexadecimal representation.

hex()

Return a string object containing two hexadecimal digits for each byte in the instance.

>>> bytes_demo = bytes.fromhex('91180100000100000000000006676f6f676c6503636f6d0000010001')
>>> bytes_demo
b'\x91\x18\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x06google\x03com\x00\x00\x01\x00\x01'
>>> bytes_demo.hex()
'91180100000100000000000006676f6f676c6503636f6d0000010001'
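Note that bytes.hex() was only added in Python 3.5. For code that must also run on 2.7 or 3.3/3.4, the standard binascii module offers the same conversion; a minimal sketch:

```python
import binascii

data = b'\x91\x18\x01\x00'

# hexlify returns bytes, so decode to get a text string of hex digits.
hex_text = binascii.hexlify(data).decode('ascii')
print(hex_text)               # 91180100

# unhexlify is the reverse conversion.
restored = binascii.unhexlify(hex_text)
print(restored == data)       # True
```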
selfboot commented 6 years ago

Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer, while b[0:1] will be a bytes object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1.)

The representation of bytes objects uses the literal format (b'...') since it is often more useful than e.g. bytes([46, 46, 46]). You can always convert a bytes object into a list of integers using list(b).

>>> b = bytes([31,32,33,34,35])
>>> b
b'\x1f !"#'
>>> b[0]
31
>>> b[1]
32
>>> b[1:3]
b' !'
>>> list(b)
[31, 32, 33, 34, 35]
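The int-versus-bytes difference is one of the usual porting traps, since b[0] is a one-character str on Python 2. A sketch of idioms that should give the same result on both versions (variable names are illustrative):

```python
# bytes(bytearray(...)) builds the same five bytes on Python 2 and 3.
data = bytes(bytearray([31, 32, 33, 34, 35]))

first_slice = data[0:1]            # always a length-1 bytes object
first_int = ord(data[0:1])         # ord() accepts a length-1 bytes/str on both
as_ints = list(bytearray(data))    # bytearray yields ints on both versions

print(repr(first_slice), first_int, as_ints)
```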