LillianZ closed 4 years ago
Assuming the encoding of these characters is the same in Ruby, the GB18030 bytes appear to match:
[] ~/projects/jruby $ ruby -e 'p "Lašas".encode("gb18030").bytes'
[76, 97, 129, 48, 148, 56, 97, 115]
I do not have an explanation for why the Java GB18030 encoder produces different output.
Based on this online converter, we also match:
$ ruby -e 'p "Lašas".encode("gb18030").bytes.map{|i| i.to_s(16)}'
["4c", "61", "81", "30", "94", "38", "61", "73"]
I would say the Java encoder is in error here.
Actually, now I see that the Java getBytes output matches Ruby, but the manually transcoded result in your example is not correct.
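To make the "getBytes matches Ruby" claim checkable without jcodings, here is a minimal sketch that compares the JDK's own GB18030 encoding of "Lašas" against the bytes Ruby reported above. The class name Gb18030Check and its helper are hypothetical, introduced only for illustration, and the sketch assumes the running JDK ships the GB18030 charset.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical helper class, not part of jcodings or JRuby.
class Gb18030Check {
    // The word from this thread, with U+0161 written as an escape so the
    // source file encoding does not matter.
    static final String TEXT = "La\u0161as";

    // Bytes reported by Ruby in this thread (4c 61 81 30 94 38 61 73),
    // cast to signed Java bytes where the value exceeds 127.
    static final byte[] EXPECTED = {76, 97, (byte) 0x81, 48, (byte) 0x94, 56, 97, 115};

    // Encode with the JDK's GB18030 charset and compare with the expected bytes.
    static boolean jdkMatchesExpected() {
        byte[] encoded = TEXT.getBytes(Charset.forName("GB18030"));
        return Arrays.equals(encoded, EXPECTED);
    }

    public static void main(String[] args) {
        System.out.println(jdkMatchesExpected());
    }
}
```

If this prints true, the JDK encoder agrees with Ruby, and any remaining difference has to come from the manual transcoding path.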
I made this into a test class and I believe the latest jcodings should match. Perhaps you are running against an old version?
$ java -cp ../jcodings/target/jcodings.jar:. Blah
[76, 97, -127, 48, -108, 56, 97, 115]
[76, 97, -127, 48, -108, 56, 97, 115]
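The Java and Ruby listings differ only in how they print: Java's byte type is signed, so Arrays.toString shows 0x81 as -127 and 0x94 as -108, while Ruby prints 129 and 148. A small sketch of the conversion, using hypothetical helper names (SignedBytes, toUnsigned, toHex) that are not part of jcodings:

```java
import java.util.Arrays;

// Hypothetical helper class for comparing Java and Ruby byte listings.
class SignedBytes {
    // Mask each signed Java byte to its unsigned value (0..255),
    // which is how Ruby's String#bytes reports it.
    static int[] toUnsigned(byte[] bytes) {
        int[] out = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = bytes[i] & 0xFF;
        }
        return out;
    }

    // Render each byte as lowercase hex, like i.to_s(16) in the Ruby one-liner.
    static String[] toHex(byte[] bytes) {
        String[] out = new String[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = Integer.toHexString(bytes[i] & 0xFF);
        }
        return out;
    }

    public static void main(String[] args) {
        // The signed values printed by the Java test class in this thread.
        byte[] javaOutput = {76, 97, -127, 48, -108, 56, 97, 115};
        System.out.println(Arrays.toString(toUnsigned(javaOutput)));
        // [76, 97, 129, 48, 148, 56, 97, 115]
        System.out.println(Arrays.toString(toHex(javaOutput)));
        // [4c, 61, 81, 30, 94, 38, 61, 73]
    }
}
```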
import org.jcodings.*;
import org.jcodings.transcode.*;
import java.util.*;

public class Blah {
    public static void main(String[] args) throws Throwable {
        EConv econv = TranscoderDB.open("UTF-8", "gb18030", 0);
        byte[] src = "Lašas".getBytes("UTF-8");
        byte[] dest = new byte["Lašas".getBytes("gb18030").length];
        econv.convert(src, new Ptr(0), 6, dest, new Ptr(0), dest.length, 0);
        System.out.println(Arrays.toString(dest));
        // [76, 97, -127, 48, 18, 56, 97, 115]
        System.out.println(Arrays.toString("Lašas".getBytes("gb18030")));
        // [76, 97, -127, 48, -108, 56, 97, 115]
    }
}
Possibly fixed by @k77ch7 in 408210ce852febb2959f2bcdc460f2c91c195117. In any case, it's no longer broken.
Yes, I was using an old version, thanks for helping me debug!
Should the last two lines be equal? Thanks!