I tried using the websocket.ksy format file in Python and noticed that parsing masked packets raises a UnicodeDecodeError exception.
I used the examples section (5.7) from RFC 6455 for the WebSocket protocol.
Example code:
from websocket import Websocket  # parser module generated from websocket.ksy

# A single-frame masked text message (contains "Hello")
raw_msg = bytes((0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58))
ws_msg = Websocket.from_bytes(raw_msg)  # Exception raised here
Traceback of the exception:
Traceback (most recent call last):
File "/home/ben/Code/python/kaitai-struct-websocket/main.py", line 6, in <module>
ws_msg = Websocket.from_bytes(raw_msg)
File "/home/ben/.local/lib/python3.10/site-packages/kaitaistruct.py", line 43, in from_bytes
return cls(KaitaiStream(BytesIO(buf)))
File "/home/ben/Code/python/kaitai-struct-websocket/websocket.py", line 39, in __init__
self._read()
File "/home/ben/Code/python/kaitai-struct-websocket/websocket.py", line 42, in _read
self.initial_frame = Websocket.InitialFrame(self._io, self, self._root)
File "/home/ben/Code/python/kaitai-struct-websocket/websocket.py", line 92, in __init__
self._read()
File "/home/ben/Code/python/kaitai-struct-websocket/websocket.py", line 100, in _read
self.payload_text = (self._io.read_bytes(self.header.len_payload)).decode(u"UTF-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9f in position 1: invalid start byte
The format doesn't handle masked messages specially, so their payload bytes, which are still masked at that point, can't be parsed as UTF-8 strings.
The format also doesn't apply the XOR process with the masking key, after which it should be possible to parse the data as a string. I tried to find a way to incorporate it, but as far as I understand, there is currently no way to make a process routine optional (based on the is_masked header field).
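For reference, here is a minimal sketch of the unmasking step done by hand in plain Python, outside of Kaitai Struct. It is not what websocket.ksy does. The helper name unmask_payload is made up for illustration, and the fixed offsets assume a short frame (7-bit payload length below 126) with the mask bit set, which is the case for the RFC example above.

```python
def unmask_payload(payload: bytes, mask_key: bytes) -> bytes:
    """XOR each payload byte with the masking key byte at index i mod 4 (RFC 6455, section 5.3)."""
    return bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))


# Same frame as in the report: a single-frame masked text message
raw_msg = bytes((0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58))

# Assumption: 2-byte header, then the 4-byte masking key, then the payload.
# This layout only holds for masked frames with a payload shorter than 126 bytes.
mask_key = raw_msg[2:6]
masked_payload = raw_msg[6:]

text = unmask_payload(masked_payload, mask_key).decode("utf-8")
print(text)
```

After the XOR the bytes decode cleanly as UTF-8, which is why the format would need to apply the key before the string conversion whenever is_masked is set.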