What steps will reproduce the problem?
1. Escape "𡘾𦴩𥻂" with org.owasp.esapi.Encoder#encodeForHTML
2. View the result in a browser
What is the expected output? What do you see instead?
Expected: 𡘾𦴩𥻂
Actual: ������ (two U+FFFD replacement characters per input character)
What version of the product are you using? On what operating system?
2.0.1 on Mac OS X 10.8.3
Does this issue affect only a specified browser or set of browsers?
It's the same in Chrome, Firefox and IE.
Please provide any additional information below.
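The cause is that characters outside the Basic Multilingual Plane (code points above U+FFFF) do not fit in a single 16-bit Java char/Character. A String stores each of them as a surrogate pair of two chars. Encoding each char separately therefore produces two character references to lone surrogates, which browsers render as replacement characters. Here is some code to illustrate it: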
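// Wrapper class added only so the snippet runs standalone.
public class SurrogateDemo {
    public static void main(String[] args) {
        String s = "𡘾𦴩𥻂";

        // Wrong: iterates over UTF-16 code units, so each supplementary
        // character becomes two references to lone surrogates.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append("&#x").append(Integer.toHexString(s.charAt(i))).append(';');
        }
        System.out.println(sb); // a browser renders this as ������

        // Correct: iterates over code points, one reference per character.
        sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int codePoint = s.codePointAt(i);
            sb.append("&#x").append(Integer.toHexString(codePoint)).append(';');
            i += Character.charCount(codePoint);
        }
        System.out.println(sb); // a browser renders this as 𡘾𦴩𥻂
    }
}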
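The same idea can be applied to an encoder as a whole. The sketch below is not ESAPI's implementation; `CodePointHtmlEncoder` and `encodeForHtml` are hypothetical names, and the rule "ASCII letters and digits pass through, everything else is encoded" is a simplifying assumption, not ESAPI's actual safe-character set. It uses `String.codePoints()` (Java 8+) so a surrogate pair arrives as one `int`; on older runtimes the `codePointAt`/`charCount` loop above does the same job.

```java
// Hypothetical helper, not part of ESAPI: a minimal code-point-aware HTML
// encoder. Assumption: only ASCII letters and digits are treated as safe.
public class CodePointHtmlEncoder {

    static String encodeForHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() * 2);
        // codePoints() yields whole code points, so a surrogate pair is
        // seen as a single value instead of two separate chars.
        input.codePoints().forEach(cp -> {
            if (cp < 0x80 && Character.isLetterOrDigit(cp)) {
                out.appendCodePoint(cp);
            } else {
                // One hexadecimal character reference per code point.
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeForHtml("a<b")); // prints a&#x3c;b
        System.out.println(encodeForHtml("𡘾𦴩𥻂")); // three references, no surrogates
    }
}
```

For the three characters from this report the output contains exactly three references, each naming a full code point, so a browser displays the original text.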
Original issue reported on code.google.com by ri.j...@gmail.com on 4 Apr 2013 at 12:36