nanosai / rion-ops-java

RION Ops for Java is a toolkit for reading and writing RION-encoded data. RION is a compact, fast binary data format.

UTF8 not working correctly #3

Closed hvbtup closed 4 years ago

hvbtup commented 4 years ago

While testing, I noticed that UTF8 is not working correctly. The reader uses just the new String(byte[]) constructor without specifying the encoding, so the bytes are decoded with the platform default charset instead of UTF-8.
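
To illustrate the problem, here is a minimal sketch (not code from the library). The class and method names are made up for the demo, and ISO-8859-1 stands in for a non-UTF-8 platform default so the result is the same on any machine:

```java
import java.nio.charset.StandardCharsets;

public class Utf8DecodeDemo {

    // Hypothetical helper: decodes with an explicit charset, so the result
    // does not depend on the JVM's default charset.
    static String decodeUtf8(byte[] source, int offset, int length) {
        return new String(source, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "é" (U+00E9) is encoded as the two bytes 0xC3 0xA9 in UTF-8.
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9};

        String explicit = decodeUtf8(utf8, 0, utf8.length);

        // Simulates what new String(byte[]) does on a platform whose
        // default charset is ISO-8859-1: every byte becomes its own char.
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);

        System.out.println(explicit.length()); // 1
        System.out.println(wrong.length());    // 2
    }
}
```

The bug only shows up with non-ASCII text on a JVM whose default charset is not UTF-8, which is probably why it is easy to miss.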

The solution is obvious: I had to change several places in the RionReader and RionWriter code.

Just to give one example:

public String readKeyShortAsUtf8String(){
    if(fieldLengthLength == 0) {
        return null;
    }

    return new String(source, index, fieldLength, UTF8_CHARSET);
}

Just look for new String(...) calls in the code.

In the writer, look for occurrences of utf8Bytes. I simplified these like this (for example, inside the writeUtf8(String value) method):

byte[] utf8Bytes = value.getBytes(UTF8_CHARSET);
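
A quick way to check that both sides agree is a round trip. This is only a sketch: the class and its encode/decode helpers are hypothetical, and UTF8_CHARSET here is assumed to be equivalent to StandardCharsets.UTF_8 (the library's own constant may be defined differently):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8RoundTripDemo {

    // Assumed stand-in for the UTF8_CHARSET constant used in the snippets above.
    static final Charset UTF8_CHARSET = StandardCharsets.UTF_8;

    // Writer side: encode with an explicit charset.
    static byte[] encode(String value) {
        return value.getBytes(UTF8_CHARSET);
    }

    // Reader side: decode with the same explicit charset.
    static String decode(byte[] bytes) {
        return new String(bytes, 0, bytes.length, UTF8_CHARSET);
    }

    public static void main(String[] args) {
        // Non-ASCII sample text, written with escapes so the source file
        // encoding does not matter.
        String original = "Gr\u00FC\u00DFe, \u4E16\u754C";

        String roundTripped = decode(encode(original));

        System.out.println(original.equals(roundTripped)); // true
    }
}
```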

jjenkov commented 4 years ago

Strange... I am fully aware that the charset must be explicitly set when creating a Java String from UTF-8 encoded bytes... I wonder why I would have forgotten to do that... ?!?

Anyways, it's an easy fix, and I will get it done soon!

jjenkov commented 4 years ago

This has been fixed now - in version 0.5.3

jjenkov commented 4 years ago

v. 0.5.3 has now been released