Make dmesg parsing non-UTF-tolerant

When Python's bytes.decode() method encounters encounters a byte sequence that cannot be decoded, it will take an action dependent on its second argument: 'strict': raise UnicodeDecodeError exception (default) 'replace': insert U+FFFD 'ignore': skip to next character

While most inputs appear to be (mostly) sanitized, dmesg output is passed to _parse_dmesg() as is, and can contain data that escapes to invalid Unicode. When the parser attempts to decode this data it immediately raises an exception and dies, as seen in https://github.com/xrmx/bootchart/issues/77.

To prevent this issue, I set the error handling method to 'replace', as 'ignore' can hide decoding errors from developers working with really broken dmesg logs. The parts of dmesg the parser looks at should be in a standard format anyway, so a U+FFFD (Replacement Character) or two after the timestamp shouldn't be too harmful.

xrmx / bootchart

Make dmesg parsing non-UTF-tolerant #78