Traditionally, data files are a mixture of various character sets. The parser expects ISO-8859-1 decoded input as it is the most used character set on data files and can easily be reinterpreted without loss (plain single-byte 8-bit characters, fully defined). Some client lines were already encoded differently in the past, for example in UTF-8 or KOI-8.
Currently (April 2020) that mixture appears to have gone worse by the whole data file now also being encoded in UTF-8 leading to a mess of multiple encodings layered on top of each other. This appears to be a good time to have another look into automatic character set detection and re-interpretation both on file-level (restoring original encodings) and on individual lines (introducing compatibility with UTF-8, KOI-8 etc.).
Traditionally, data files are a mixture of various character sets. The parser expects ISO-8859-1 decoded input as it is the most used character set on data files and can easily be reinterpreted without loss (plain single-byte 8-bit characters, fully defined). Some client lines were already encoded differently in the past, for example in UTF-8 or KOI-8.
Currently (April 2020) that mixture appears to have gone worse by the whole data file now also being encoded in UTF-8 leading to a mess of multiple encodings layered on top of each other. This appears to be a good time to have another look into automatic character set detection and re-interpretation both on file-level (restoring original encodings) and on individual lines (introducing compatibility with UTF-8, KOI-8 etc.).