thanhlong203 / closure-library

Automatically exported from code.google.com/p/closure-library

Character encoding issue with goog.crypt.stringToByteArray #352

Open: GoogleCodeExporter opened this issue 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Encode a Unicode string containing CJK characters using goog.crypt.stringToByteArray.
2. Decode the resulting byte array using goog.crypt.byteArrayToString.

Observe that the decoded string does not match the original; it has been mangled.

The encoding used in stringToByteArray is not injective (distinct strings can map to the same bytes), so it cannot be reversed reliably. For example, both of the following strings encode to [0x63, 0x41]:
 - "cA"
 - "䅣" (HTML entity &#x4163;, Unicode code point U+4163)
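The collision can be reproduced with a short sketch. The two functions below are hypothetical re-implementations with invented names (`legacyStringToByteArray`, `legacyByteArrayToString`), not the goog.crypt source. They assume the encoder emits the low byte of each char code first and shifts right by 8 until the remainder fits in one byte, which matches the [0x63, 0x41] output reported above, and that the decoder turns each byte into one character.

```javascript
// Hypothetical re-implementation of the reported behaviour (assumption:
// low byte first, then shift right by 8). Not the actual goog.crypt source.
function legacyStringToByteArray(str) {
  const output = [];
  for (let i = 0; i < str.length; i++) {
    let c = str.charCodeAt(i);
    // Char codes above 0xFF are split into several bytes, low byte first.
    while (c > 0xff) {
      output.push(c & 0xff);
      c >>= 8;
    }
    output.push(c);
  }
  return output;
}

// Hypothetical decoder (assumption): each byte becomes one character.
function legacyByteArrayToString(bytes) {
  return bytes.map((b) => String.fromCharCode(b)).join('');
}

const fromAscii = legacyStringToByteArray('cA');   // [0x63, 0x41]
const fromCjk = legacyStringToByteArray('\u4163'); // also [0x63, 0x41]
console.log(JSON.stringify(fromAscii) === JSON.stringify(fromCjk)); // true
console.log(legacyByteArrayToString(fromCjk)); // prints cA, not the original character
```

Because two different inputs produce identical bytes, no decoder can tell them apart; the information is lost at encoding time.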

This should be documented, at a minimum.

Original issue reported on code.google.com by bdebours...@google.com on 27 Jul 2011 at 10:29

GoogleCodeExporter commented 8 years ago
This is valid. The expected input is not well specified, and a single character can produce more than one byte.

It would be best to provide a stricter version that accepts only char codes between 0 and 255, and a separate Unicode version that is not reversible.
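A minimal sketch of the stricter variant suggested above, assuming it should reject char codes above 255 rather than silently split them. For the Unicode side it swaps in a different technique, UTF-8 via the standard `TextEncoder`/`TextDecoder` APIs (available in browsers and Node.js 11+); that is not what this thread proposes, only one possible option. All function names here are hypothetical.

```javascript
// Hypothetical strict encoder: only char codes 0-255 are accepted, so each
// character maps to exactly one byte and decoding restores the input.
function strictStringToByteArray(str) {
  const output = new Array(str.length);
  for (let i = 0; i < str.length; i++) {
    const c = str.charCodeAt(i);
    if (c > 0xff) {
      throw new RangeError(
          'char code 0x' + c.toString(16) + ' at index ' + i +
          ' does not fit in one byte');
    }
    output[i] = c;
  }
  return output;
}

// Inverse of strictStringToByteArray: one character per byte.
function strictByteArrayToString(bytes) {
  return bytes.map((b) => String.fromCharCode(b)).join('');
}

// Swapped-in alternative (not from this thread): UTF-8 via TextEncoder.
// Lone surrogates are replaced by U+FFFD when encoding, so they do not round-trip.
function utf8StringToByteArray(str) {
  return Array.from(new TextEncoder().encode(str));
}

function utf8ByteArrayToString(bytes) {
  return new TextDecoder().decode(new Uint8Array(bytes));
}

console.log(strictByteArrayToString(strictStringToByteArray('cA'))); // cA
console.log(utf8StringToByteArray('\u4163')); // three bytes: 0xE4, 0x85, 0xA3
try {
  strictStringToByteArray('\u4163');
} catch (e) {
  console.log(e instanceof RangeError); // true
}
```

With UTF-8, "cA" and U+4163 encode to different byte sequences ([0x63, 0x41] versus [0xE4, 0x85, 0xA3]), so the collision from the report does not occur; the strict variant instead fails loudly on input it cannot represent.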

Original comment by nn...@google.com on 9 May 2012 at 10:52