selfboot / AnnotatedShadowSocks

Annotated shadowsocks(python version)

Struct: working with binary data #37

Open selfboot opened 7 years ago

selfboot commented 7 years ago

The struct module includes functions for converting between strings of bytes and native Python data types such as numbers and strings. This can be used in handling binary data stored in files or from network connections, among other sources.

Packing and Unpacking

Structs support packing data into strings, and unpacking data from strings using format specifiers made up of characters representing the type of the data and optional count and endian-ness indicators.

struct.pack(fmt, v1, v2, ...)  
# Return a string containing the values v1, v2, ... packed according to the given format. The arguments must match the values required by the format exactly.  
struct.unpack(fmt, string)
# Unpack the string (presumably packed by pack(fmt, ...)) according to the given format. The result is a tuple even if it contains exactly one item. The string must contain exactly the amount of data required by the format (len(string) must equal calcsize(fmt)).
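A minimal round-trip sketch of the two functions described above. The format `>hI` and the sample values are chosen purely for illustration, and the byte literals use the `b'...'` prefix so the snippet should behave the same on Python 2.6+ and Python 3.

```python
import struct

fmt = ">hI"  # illustrative format: big-endian short followed by unsigned int
packed = struct.pack(fmt, 1, 1024)

# The packed length equals calcsize(fmt): 2 + 4 bytes, no padding with '>'.
size = struct.calcsize(fmt)

# unpack() always returns a tuple, even when the format holds one value.
values = struct.unpack(fmt, packed)
single = struct.unpack(">h", packed[:2])
```

Passing `unpack` a string whose length differs from `calcsize(fmt)` raises `struct.error`, which is why `packed[:2]` is sliced to exactly two bytes here.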

Functions vs. Struct Class

There is a set of module-level functions for working with structured values, and there is also the Struct class (new in Python 2.5). Format specifiers are converted from their string format to a compiled representation, similar to the way regular expressions are. The conversion takes some resources, so it is typically more efficient to do it once when creating a Struct instance and call methods on the instance instead of using the module-level functions.
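A short sketch of the Struct class in use, assuming a hypothetical record made of two unsigned 16-bit fields. The format is compiled once and the instance is reused for every record.

```python
import struct

# Compile the format once; '>HH' is a made-up layout for illustration.
record = struct.Struct(">HH")

pairs = [(1, 2), (3, 4)]

# The instance methods take no format argument.
packed = [record.pack(a, b) for a, b in pairs]
unpacked = [record.unpack(p) for p in packed]

# The size attribute plays the role of struct.calcsize(fmt).
record_size = record.size
```

The gain only matters when the same format is used many times, for example when parsing a stream of fixed-size records.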

Format Strings

Format strings are the mechanism used to specify the expected layout when packing and unpacking data. They are built up from Format Characters, which specify the type of data being packed/unpacked. In addition, there are special characters for controlling the Byte Order, Size, and Alignment.

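By default, C types are represented in the machine's native format and byte order, and are properly aligned by skipping pad bytes if necessary. Alternatively, the first character of the format string can be used to indicate the byte order, size and alignment of the packed data, according to the following table: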

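| Character | Byte order | Size | Alignment |
|---|---|---|---|
| @ | native | native | native |
| = | native | standard | none |
| < | little-endian | standard | none |
| > | big-endian | standard | none |
| ! | network (= big-endian) | standard | none |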

If the first character is not one of these, '@' is assumed.
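The effect of the first character can be checked with `struct.calcsize`. This is a sketch only: the native size of `ci` depends on the platform, so no specific value is assumed for it.

```python
import struct

# Standard size, no alignment: 1 byte for 'c' plus 4 bytes for 'i'.
standard = struct.calcsize("<ci")

# Native size and alignment: pad bytes may be inserted before the int,
# so the result is platform-dependent (often 8).
native = struct.calcsize("@ci")

# Omitting the first character is the same as writing '@'.
default = struct.calcsize("ci")

# '!' (network order) produces the same bytes as '>' (big-endian).
network = struct.pack("!H", 1)
big = struct.pack(">H", 1)
```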

Format characters have the following meaning; the conversion between C and Python values should be obvious given their types. The ‘Standard size’ column refers to the size of the packed value in bytes when using standard size; that is, when the format string starts with one of '<', '>', '!' or '='. When using native size, the size of the packed value is platform-dependent.

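| Format | C type | Python type | Standard size |
|---|---|---|---|
| x | pad byte | no value | |
| c | char | string of length 1 | 1 |
| b | signed char | integer | 1 |
| B | unsigned char | integer | 1 |
| ? | _Bool | bool | 1 |
| h | short | integer | 2 |
| H | unsigned short | integer | 2 |
| i | int | integer | 4 |
| I | unsigned int | integer | 4 |
| l | long | integer | 4 |
| L | unsigned long | integer | 4 |
| q | long long | integer | 8 |
| Q | unsigned long long | integer | 8 |
| f | float | float | 4 |
| d | double | float | 8 |
| s | char[] | string | |
| p | char[] | string | |
| P | void * | integer | |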
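The standard sizes described above can be confirmed with `struct.calcsize`. The sketch below uses the `=` prefix so the results do not depend on the platform; the selection of format characters is arbitrary.

```python
import struct

# Standard sizes for a few common format characters.
standard_sizes = dict(
    (code, struct.calcsize("=" + code)) for code in "bhilqfd"
)

# A count before a format character repeats it: '4h' means four shorts.
four_shorts = struct.calcsize("=4h")

# For 's' the count is the length of the string, not a repeat count.
ten_byte_string = struct.calcsize("=10s")
```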

Demo

>>> import struct
>>> struct.pack(">i",34)
'\x00\x00\x00"'
>>> struct.pack("<i",34)
'"\x00\x00\x00'

This looks strange at first. The expected output was:

>>> struct.pack(">i",34)
'\x00\x00\x00\x22'
>>> struct.pack("<i",34)
'\x22\x00\x00\x00'

Both forms are the same bytes. The output is returned as a byte string, and Python displays each byte of such a string as its ASCII character whenever that character is printable. 0x22 is the ASCII code of the double quote:

>>> ord('"')
34
>>> hex(ord('"'))
'0x22'

We can convert the packed value to a sequence of hex digits for printing with binascii.hexlify():

>>> import binascii
>>> demo = struct.pack(">i",34)
>>> binascii.hexlify(demo)
'00000022'
>>> demo = struct.pack("<i",34)
>>> binascii.hexlify(demo)
'22000000'
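Going the other way, the hex digits can be turned back into the integer with `binascii.unhexlify` and `struct.unpack`. This is a sketch under the same assumptions as the demo above (a 4-byte `i` holding the value 34); the variable names are illustrative.

```python
import binascii
import struct

# b'\x22' and b'"' are the same single byte; only the display differs.
same_byte = (b'\x22' == b'"')

big = binascii.unhexlify(b"00000022")
little = binascii.unhexlify(b"22000000")

# Each byte string decodes to 34 when read with its own byte order.
value_big = struct.unpack(">i", big)[0]
value_little = struct.unpack("<i", little)[0]

# Reading big-endian bytes as little-endian gives a different number.
wrong_order = struct.unpack("<i", big)[0]
```

The last line shows why the byte order has to be agreed on by both sides: the same four bytes decode to 0x22000000 instead of 0x22.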

Ref
struct – Working with Binary Data
struct — Interpret strings as packed binary data
Using struct pack in python