When Python's bytes.decode() method encounters a byte sequence
that cannot be decoded, it takes an action that depends on its second
argument (errors):
'strict': raise UnicodeDecodeError exception (default)
'replace': insert U+FFFD
'ignore': drop the undecodable bytes and continue with the next character
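
For illustration, a minimal sketch of the three handlers on a made-up
byte string (the input and the choice of UTF-8 are assumptions for this
example, not taken from a real dmesg log):

```python
# 0xFF can never appear in valid UTF-8, so this input is undecodable as-is.
raw = b"abc\xffdef"

# 'strict' (the default) raises on the first bad byte.
try:
    raw.decode("utf-8", "strict")
    strict_outcome = "decoded"
except UnicodeDecodeError as exc:
    strict_outcome = f"UnicodeDecodeError at byte offset {exc.start}"

# 'replace' substitutes U+FFFD for the bad byte and keeps going.
replaced = raw.decode("utf-8", "replace")

# 'ignore' silently drops the bad byte.
ignored = raw.decode("utf-8", "ignore")

print(strict_outcome)
print(repr(replaced))
print(repr(ignored))
```

With 'replace' the damage stays visible in the output; with 'ignore' the
result looks clean even though a byte was lost.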
While most inputs appear to be (mostly) sanitized, dmesg output is passed to
_parse_dmesg() as is, and can contain bytes that do not decode to valid Unicode.
When the parser attempts to decode this data it immediately raises an exception
and dies, as seen in https://github.com/xrmx/bootchart/issues/77.
To prevent this issue, I set the error handling method to 'replace', as 'ignore' can
hide decoding errors from developers working with really broken dmesg logs.
The parts of dmesg the parser looks at should be in a standard format anyway,
so a U+FFFD (Replacement Character) or two after the timestamp shouldn't be
too harmful.
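
As a rough sketch of why 'replace' is tolerable here: the helper below is
hypothetical (it is not the actual _parse_dmesg() code), and the regex,
the UTF-8 encoding, and the sample line are all assumptions made for the
example. It shows that the timestamp still parses when the message part
contains undecodable bytes.

```python
import re

# Hypothetical pattern for a "[    1.234567] message" style dmesg line.
TIMESTAMP_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse_dmesg_line(raw_line: bytes):
    """Decode one raw dmesg line leniently and split it into
    (timestamp, message). Returns None if the line has no timestamp."""
    # 'replace' turns undecodable bytes into U+FFFD instead of raising.
    text = raw_line.decode("utf-8", "replace")
    match = TIMESTAMP_RE.match(text)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


# Made-up line with two invalid bytes in the message part.
broken = b"[    1.234567] usb 1-1: Product: \xff\xfeWidget"
timestamp, message = parse_dmesg_line(broken)
print(timestamp, repr(message))
```

The replacement characters land in the free-form message text, so the
fields the parser actually relies on are unaffected in this example.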