xrmx / bootchart

merge of bootchart-collector and pybootchartgui
GNU General Public License v2.0

Make dmesg parsing non-UTF-tolerant #78

Closed AzureCrimson closed 3 years ago

AzureCrimson commented 6 years ago

When Python's bytes.decode() method encounters a byte sequence that cannot be decoded, the action it takes depends on its second argument (errors):

- 'strict': raise a UnicodeDecodeError exception (default)
- 'replace': insert U+FFFD in place of the undecodable bytes
- 'ignore': drop the undecodable bytes and continue with the next character
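As a rough illustration of the three modes, here is a minimal sketch. The sample bytes are made up for this example and are not taken from a real dmesg log.

```python
# Made-up sample: a dmesg-like line containing the byte 0xff,
# which is never valid in UTF-8.
raw = b"[    0.000000] Linux \xff version"

# 'strict' (the default) raises as soon as it reaches the bad byte.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'replace' substitutes U+FFFD for the bad byte and keeps going.
replaced = raw.decode("utf-8", "replace")

# 'ignore' silently drops the bad byte.
ignored = raw.decode("utf-8", "ignore")

print(strict_failed)
print(repr(replaced))
print(repr(ignored))
```

With 'ignore' the result shows no trace that anything was wrong, while 'replace' leaves a visible marker in the decoded text.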

While most inputs appear to be (mostly) sanitized, dmesg output is passed to _parse_dmesg() as is, and can contain byte sequences that are not valid UTF-8. When the parser attempts to decode this data, it immediately raises an exception and dies, as seen in https://github.com/xrmx/bootchart/issues/77.

To prevent this issue, I set the error handling method to 'replace', as 'ignore' can hide decoding errors from developers working with really broken dmesg logs. The parts of dmesg the parser looks at should be in a standard format anyway, so a U+FFFD (Replacement Character) or two after the timestamp shouldn't be too harmful.
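To show why a replacement character after the timestamp is mostly harmless, here is a small sketch. It is not the actual _parse_dmesg() code: the parse_dmesg_line helper, the TIMESTAMP_RE pattern, and the sample line are all hypothetical and only assume the usual "[ seconds.micros] message" dmesg layout.

```python
import re

# Hypothetical pattern for "[    1.250000] message"; not taken from bootchart.
TIMESTAMP_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse_dmesg_line(raw_line: bytes):
    """Decode one raw dmesg line tolerantly and split off its timestamp."""
    # 'replace' turns undecodable bytes into U+FFFD instead of raising.
    text = raw_line.decode("utf-8", "replace")
    match = TIMESTAMP_RE.match(text)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


# Made-up line with two invalid bytes in the message part.
result = parse_dmesg_line(b"[    1.250000] usb 1-1: Product: \xfe\xffDevice")
print(result)
```

The timestamp still parses cleanly, and the U+FFFD characters remain in the message text as a hint that the log contained bad bytes.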

xrmx commented 3 years ago

Thanks!