meyers8686 / plist

Automatically exported from code.google.com/p/plist

Problem working with emoji characters #22

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?

iOS 5 now allows the use of emoji characters, which take 4 bytes each in UTF-8. 
 These don't seem to parse correctly.

1. Create a binary plist with string value of "Test 😃"
2. Parse
3. While the string seems to get parsed correctly, trying to save it to MySQL 
fails with "Incorrect string value: '\xF0\x9F\x98\x83' for column 'content' at 
row 1"
4. If I try saving hard-coded "Test 😃" directly, rather than getting it from 
the plist, I am able to save w/o any issues.
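
For reference, the bytes in that MySQL error are simply the UTF-8 encoding of the emoji in the test string (U+1F603). A minimal Java sketch to confirm this; the class name `EmojiBytes` and the helper `utf8Hex` are made up for illustration and are not part of the plist library:

```java
import java.nio.charset.StandardCharsets;

public class EmojiBytes {
    // Formats the UTF-8 bytes of s the way the MySQL error message prints them.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1F603 written as a surrogate pair, so the source file encoding does not matter.
        String emoji = "\uD83D\uDE03";
        System.out.println(emoji.length());                          // 2 (UTF-16 code units)
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1 (Unicode code point)
        System.out.println(utf8Hex(emoji));                          // \xF0\x9F\x98\x83
    }
}
```

So the string reaching MySQL in step 3 appears to carry a correctly encoded emoji; the error says the column rejected it, not that the bytes were wrong.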

What is the expected output? What do you see instead?

What version of the product are you using? On what operating system?
r55.

Please provide any additional information below.

Original issue reported on code.google.com by roma...@gmail.com on 30 Nov 2011 at 4:48

GoogleCodeExporter commented 9 years ago
I could not reproduce any problems with the parsing and handling of Emoji 
characters in property lists. I generated property lists containing Emoji 
characters using XCode's Property List Editor. They parsed just fine and the 
given strings are equal to the expected strings containing emoji. Emoji are 
basically no different from other unicode characters, which so far are known to 
parse fine.

Please re-examine the code you use to store the string into your MySQL table.

Original comment by daniel.dreibrodt on 30 Nov 2011 at 9:09

GoogleCodeExporter commented 9 years ago
Are you using a binary plist?  I've attached a binary plist file.  When I parse 
it and run this I get "NOT EQUAL":

        File file = new File("test.plist");
        byte[] plistData = FileUtils.readFileToByteArray(file);
        NSDictionary rootDict = (NSDictionary) PropertyListParser.parse(plistData);
        String content = rootDict.objectForKey("content").toString();
        String emojiContent = "Test😃";
        if (content.equals(emojiContent)) {
            log.info("EQUALS!");
        } else {
            log.info("NOT EQUAL!");
        }

Original comment by roma...@gmail.com on 1 Dec 2011 at 1:13

GoogleCodeExporter commented 9 years ago
I have tested binary and xml property lists with Emojis. You can see the code I 
used in the source code for r58.

Now I also tested your file and it worked out just fine.  But I didn't use 
FileUtils, I just passed the file as an argument to the parser. So I guess the 
problem comes from the FileUtils class.

Original comment by daniel.dreibrodt on 1 Dec 2011 at 8:48

GoogleCodeExporter commented 9 years ago
I have now downloaded Apache Commons IO v2.1 and repeated your test with the 
FileUtils class. Again, the two strings were equal. 

Maybe you saved the source code file in an encoding that corrupts the emoji 
characters. Could you attach your original test Java file?

Other than that, I cannot see any way this error could occur.

Original comment by daniel.dreibrodt on 1 Dec 2011 at 9:00

GoogleCodeExporter commented 9 years ago
Very strange.  I tried saving the source file using IntelliJ IDEA and Xcode, 
same result.  By the way, in my wider test, if I save the hard-coded text into 
MySQL and then retrieve it on another iPhone client, it seems to work 
correctly, which suggests that the hard-coded text is encoded correctly.

Anyway, here is a little program that fails for me (I also removed FileUtils):

java -classpath .:./dd-plist.jar Emoji

Original comment by roma...@gmail.com on 1 Dec 2011 at 3:11

GoogleCodeExporter commented 9 years ago
The Java file in the tar.gz was not encoded in UTF-8 but in MacRoman, which is 
the default encoding for many Mac programs. But MacRoman does not support Emoji.

Save the Java file in UTF-8 format (for example with TextEdit or TextWrangler) 
and also run the compiler with the flag: -encoding utf8. If you don't do that, 
Java will assume the default encoding, MacRoman, and your string will be 
garbled.
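
To illustrate why the comparison fails, here is a small sketch of what happens when UTF-8 bytes are decoded with the wrong single-byte charset. It uses ISO-8859-1 as a stand-in because that charset is guaranteed to exist in every JVM; MacRoman would produce different garbage characters but the same kind of mismatch. The class and method names are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WrongCharset {
    // Decodes the UTF-8 bytes of s as if they were written in another charset,
    // which is roughly what a compiler does when it assumes the wrong source encoding.
    static String misdecode(String s, Charset assumed) {
        return new String(s.getBytes(StandardCharsets.UTF_8), assumed);
    }

    public static void main(String[] args) {
        String expected = "Test\uD83D\uDE03"; // "Test" followed by U+1F603
        String garbled = misdecode(expected, StandardCharsets.ISO_8859_1);
        System.out.println(expected.equals(garbled)); // false
        System.out.println(expected.length());        // 6 (4 letters + surrogate pair)
        System.out.println(garbled.length());         // 8 (4 letters + 4 stray characters)
    }
}
```

Each of the emoji's four UTF-8 bytes turns into its own character, so the hard-coded literal no longer matches the string parsed from the plist.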

If you follow these steps the program will confirm that the two strings are 
equal.

But I'd guess this doesn't help with the MySQL problem, because there you 
used the strings from the plist library and they were correct UTF-8 or UTF-16. 
Most probably your version of MySQL does not support the 4 byte Emoji 
characters.
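
As far as I know, MySQL's older `utf8` character set stores at most 3 bytes per character, and `utf8mb4` is the one that accepts 4-byte sequences. Under that assumption, a quick way to find out whether a string would hit the limit is to look for code points outside the Basic Multilingual Plane. A rough sketch (requires Java 8 or later; the names are illustrative only):

```java
public class SupplementaryCheck {
    // True if s contains a code point above U+FFFF, i.e. one that takes
    // 4 bytes in UTF-8 and a surrogate pair in a Java String.
    static boolean needsFourByteUtf8(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        System.out.println(needsFourByteUtf8("Test"));             // false
        System.out.println(needsFourByteUtf8("Test\uD83D\uDE03")); // true
    }
}
```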

Original comment by daniel.dreibrodt on 1 Dec 2011 at 7:04

GoogleCodeExporter commented 9 years ago
A quick Google search showed that others also had that problem with MySQL. Not 
all versions of it support 4 byte Unicode.
http://mzsanford.wordpress.com/2010/12/28/mysql-and-unicode/
http://stackoverflow.com/questions/7814293/how-to-insert-utf-8-mb4-characteremoji-in-ios5-in-mysql
http://forums.mysql.com/read.php?103,434779,434779#msg-434779

Original comment by daniel.dreibrodt on 1 Dec 2011 at 7:16

GoogleCodeExporter commented 9 years ago
I've already fixed the MySQL issues, that's what my comment was about :-)

Here is a new version. I created two text files using TextEdit and TextWrangler 
and now compare what I read from the two files against the parsed plist.  Same 
issue for me :(

By the way, thanks for working with me on this!

Original comment by roma...@gmail.com on 1 Dec 2011 at 7:28

GoogleCodeExporter commented 9 years ago
I actually meant saving the java file with TextWrangler :D

Try the following:

        Scanner scanner = new Scanner(new FileInputStream("TextEdit.txt"), "UTF-8");

Problem solved.
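
A self-contained sketch of the same idea, for anyone who wants to try it without the attached files: it writes a UTF-8 temp file and reads it back with an explicit charset name, so the platform default encoding never comes into play. The class name, helper name, and temp file are made up for this example:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

public class ReadUtf8 {
    // Writes text as UTF-8 to a temp file, then reads the first line back
    // with a Scanner that is told explicitly to decode UTF-8.
    static String writeThenRead(String text) {
        try {
            Path file = Files.createTempFile("emoji", ".txt");
            try {
                Files.write(file, (text + "\n").getBytes(StandardCharsets.UTF_8));
                try (Scanner scanner = new Scanner(new FileInputStream(file.toFile()), "UTF-8")) {
                    return scanner.nextLine();
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String expected = "Test\uD83D\uDE03";
        System.out.println(expected.equals(writeThenRead(expected))); // true
    }
}
```

Without the second constructor argument, Scanner falls back to the platform default charset, which is how the text files ended up misread.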

Original comment by daniel.dreibrodt on 1 Dec 2011 at 7:48

GoogleCodeExporter commented 9 years ago
Hmm... confirmed.  Sounds like you're off the hook :-)

Thanks again for all your help!

-Alex

Original comment by roma...@gmail.com on 1 Dec 2011 at 7:52