kaitai-io / kaitai_struct_formats

Kaitai Struct: library of binary file formats (.ksy)
http://formats.kaitai.io

Masked WebSocket message throwing exception #615

Open benpbo opened 2 years ago

benpbo commented 2 years ago

I tried using the websocket.ksy format file in Python and noticed that masked packets raise a UnicodeDecodeError exception when parsed. I used the examples from section 5.7 of RFC 6455, the WebSocket protocol specification.

Example code:

from websocket import Websocket  # module generated from websocket.ksy

# A single-frame masked text message (contains "Hello")
raw_msg = bytes((0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58))
ws_msg = Websocket.from_bytes(raw_msg)  # UnicodeDecodeError raised here

Traceback of the exception:

Traceback (most recent call last):
  File "/home/ben/Code/python/kaitai-struct-websocket/main.py", line 6, in <module>
    ws_msg = Websocket.from_bytes(raw_msg)
  File "/home/ben/.local/lib/python3.10/site-packages/kaitaistruct.py", line 43, in from_bytes
    return cls(KaitaiStream(BytesIO(buf)))
  File "/home/ben/Code/python/kaitai-struct-websocket/websocket.py", line 39, in __init__
    self._read()
  File "/home/ben/Code/python/kaitai-struct-websocket/websocket.py", line 42, in _read
    self.initial_frame = Websocket.InitialFrame(self._io, self, self._root)
  File "/home/ben/Code/python/kaitai-struct-websocket/websocket.py", line 92, in __init__
    self._read()
  File "/home/ben/Code/python/kaitai-struct-websocket/websocket.py", line 100, in _read
    self.payload_text = (self._io.read_bytes(self.header.len_payload)).decode(u"UTF-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9f in position 1: invalid start byte

The format doesn't handle masked messages: the payload bytes of a masked frame are still XORed with the masking key, so they can't be decoded as UTF-8 directly. The format also doesn't attempt to apply the XOR process with the key, after which it should be possible to parse the data as a string. I tried to find a way to incorporate it, but as far as I understand there is currently no way to make a process routine optional (conditional on the is_masked header field).
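For reference, here is a minimal sketch of the unmasking step done by hand, outside of Kaitai Struct. It follows the rule in RFC 6455 section 5.3 (each payload byte is XORed with the mask key byte at index i mod 4). The helper name unmask_payload is hypothetical and is not part of the generated parser or the kaitaistruct runtime, and the fixed slice offsets assume a short frame like the one above (2 header bytes, payload length below 126).

```python
def unmask_payload(masked: bytes, mask_key: bytes) -> bytes:
    """XOR each payload byte with mask_key[i % 4] (RFC 6455, section 5.3).

    Hypothetical helper for illustration only.
    """
    if len(mask_key) != 4:
        raise ValueError("mask key must be exactly 4 bytes")
    return bytes(b ^ mask_key[i % 4] for i, b in enumerate(masked))


# Same frame as in the example above (RFC 6455, section 5.7).
raw_msg = bytes((0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58))

# Assumed layout for this short frame: 2 header bytes, 4-byte mask key, payload.
mask_key = raw_msg[2:6]
masked_payload = raw_msg[6:]

text = unmask_payload(masked_payload, mask_key).decode("utf-8")
print(text)  # Hello
```

Because XOR is its own inverse, calling the same function on the unmasked bytes with the same key yields the masked bytes again, so one routine covers both directions.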