Multi-byte characters are not taken into account?

The problem occurs when reading fixed-length data that contains a mixture of multibyte and single-byte characters. Specifying the encoding does not seem to make any difference.

Below is example code that reproduces the problem.

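import com.univocity.parsers.common.record.Record;
import com.univocity.parsers.fixed.FixedWidthFields;
import com.univocity.parsers.fixed.FixedWidthParser;
import com.univocity.parsers.fixed.FixedWidthParserSettings;
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;

// Field widths are given as byte lengths in MS932: 1 + 6 + 8 = 15 bytes.
FixedWidthFields fields = new FixedWidthFields();
fields.addField("lookahead", 1);
fields.addField("dataString", 6);
fields.addField("date", 8);

FixedWidthParserSettings settings = new FixedWidthParserSettings();
settings.addFormatForLookahead("1", fields);
FixedWidthParser parser = new FixedWidthParser(settings);

// Encode the record as MS932 and parse it with the same charset.
byte[] ms932Bytes = "1あああ20201218".getBytes(Charset.forName("MS932"));
ByteArrayInputStream bais = new ByteArrayInputStream(ms932Bytes);
parser.beginParsing(bais, Charset.forName("MS932"));

Record record = parser.parseNextRecord();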

Expected

1, あああ, 20201218

Actual

1, あああ202, 01218

My guess is that 'あ' takes 2 bytes in MS932 but is counted as a width of 1, i.e. field lengths are counted in characters rather than bytes. Would it be possible to count multi-byte characters by their byte length?
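To illustrate the guess above, here is a minimal sketch using only the JDK (no univocity-parsers calls). It compares the character count and the MS932 byte count of the sample record, and slices the raw bytes by byte widths before decoding. The class name `ByteWidthSplit` and the helper `splitByBytes` are hypothetical names made up for this example, not part of the library, and the sketch assumes the MS932 charset is available in the running JDK. The hiragana characters are written as `\u3042` escapes so the source compiles regardless of file encoding.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class ByteWidthSplit {

    // Hypothetical helper: cuts the record at byte offsets, then decodes each slice.
    // Assumes every width boundary falls between characters, not inside one.
    static List<String> splitByBytes(byte[] record, Charset charset, int... widths) {
        List<String> fields = new ArrayList<>();
        int offset = 0;
        for (int width : widths) {
            // Clamp so a short record yields shorter (possibly empty) trailing fields.
            int length = Math.max(0, Math.min(width, record.length - offset));
            fields.add(new String(record, offset, length, charset));
            offset += length;
        }
        return fields;
    }

    public static void main(String[] args) {
        Charset ms932 = Charset.forName("MS932");
        // "1" + three hiragana 'a' characters (U+3042) + "20201218"
        String line = "1\u3042\u3042\u304220201218";
        byte[] bytes = line.getBytes(ms932);

        System.out.println("chars: " + line.length());  // 1 + 3 + 8 = 12
        System.out.println("bytes: " + bytes.length);   // 1 + 6 + 8 = 15

        // Splitting by byte widths 1, 6, 8 gives the expected three fields.
        System.out.println(splitByBytes(bytes, ms932, 1, 6, 8));
    }
}
```

Counting by characters, a width of 6 for dataString consumes the three hiragana plus "202", which matches the actual output I reported. Slicing the bytes first, as in this sketch, is only one possible workaround and says nothing about how the parser is implemented internally.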

(I'm sorry, this is a machine-translated sentence, so it may sound strange.)

uniVocity / univocity-parsers

Multi-byte characters are not taken into account? #436
