skjolber / xswi

The simple, standalone XML Stream Writer for iOS

emoji character issues #8

Closed Amitmundra closed 8 years ago

Amitmundra commented 9 years ago

I am trying to send emoji characters through XML. I am generating the XML with XMLWriter, but when XMLWriter converts the text, the emoji are turned into spaces. How can I resolve this issue?

skjolber commented 9 years ago

Have you tried getting the xml as text and saving with another charset?

Amitmundra commented 9 years ago

Yes, I tried. Look at the writeEscape method in the xmlwriter.m class. writeEscape calls the writeEscapeCharacters method, and that method ignores emoji characters, which is why we are not getting the emoji. Do you have any idea how to resolve this issue?

Many thanks for your valuable time.

skjolber commented 9 years ago

Do you have a concrete test case that exposes the problem? With code, output and expected output.

Amitmundra commented 9 years ago

I have written a value in an XML attribute like this: "skjolber 😀 ". When I print the XML using xmlwriter.toString(), I get the value "skjolber ". It ignores the emoji and replaces it with a space.

I wrote the FName value as "skjolber 😀 ", but the XML is printed without the emoji. As I said above, emoji are skipped in the writeEscapeCharacters method.

Example XML:

                <ColumnValues>
                    <Name="FName" Value="skjolber   " />
                    <Name="LName" Value="SKJ " />
                </ColumnValues>

Thanks again for your valuable time.

qiulang commented 9 years ago

I think the reason is that XMLWriter uses UTF-16 (UniChar), and the check it uses to detect invalid UTF-16 characters doesn't work for emoji.

In UTF-16, the range U+D800 to U+DFFF is reserved for the high and low surrogates. I guess that's why the code says: ... else if (c < 0xE000) { // invalid, skip }

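But emoji code points start at around U+1F300 (https://en.wikipedia.org/wiki/Emoji). To convert such a code point to UTF-16 (https://en.wikipedia.org/wiki/UTF-16), we subtract 0x10000, then add the top 10 bits of the result to 0xD800 for the high surrogate and the bottom 10 bits to 0xDC00 for the low surrogate. So the high value will be at least 0xD800 + 0x003C = 0xD83C (for U+1F300), and the low value will be at least 0xDC00 + 0 = 0xDC00 (e.g. for U+1F400). Both halves fall inside the reserved range, so the check above skips them.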
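
To make the arithmetic concrete, here is a minimal C sketch of the surrogate-pair calculation. The function names are made up for illustration and are not part of XMLWriter, and `skipped_by_range_check` only assumes that the quoted `c < 0xE000` branch is reached for code units at or above 0xD800.

```c
#include <stdint.h>

/* Hypothetical helper: split a supplementary code point
 * (U+10000..U+10FFFF) into a UTF-16 surrogate pair. */
static void utf16_encode_supplementary(uint32_t cp, uint16_t *high, uint16_t *low) {
    uint32_t v = cp - 0x10000;                 /* 20-bit offset */
    *high = (uint16_t)(0xD800 + (v >> 10));    /* top 10 bits */
    *low  = (uint16_t)(0xDC00 + (v & 0x3FF));  /* bottom 10 bits */
}

/* Assumed shape of the check quoted above: any code unit in the
 * surrogate range is treated as invalid and skipped. */
static int skipped_by_range_check(uint16_t c) {
    return c >= 0xD800 && c < 0xE000;
}
```

For 😀 (U+1F600) this gives the pair 0xD83D, 0xDE00, and both units are caught by the range check, which matches the emoji disappearing from the output.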

To fix it, I suggest we just remove that check (< 0xE000), because the only characters that need escaping in XML are >, <, &, and ", and you already escape them.
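
If dropping the check entirely feels too permissive, a stricter variant is possible. The sketch below is not XMLWriter's code and every name in it is hypothetical. It assumes the input is a plain array of UTF-16 code units, keeps well-formed surrogate pairs, and skips only unpaired surrogates.

```c
#include <stddef.h>
#include <stdint.h>

static int is_high_surrogate(uint16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
static int is_low_surrogate(uint16_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

/* Hypothetical filter: copy n UTF-16 code units from in to out, keeping
 * well-formed surrogate pairs and dropping lone surrogates.
 * out must have room for n units. Returns the number of units written. */
static size_t copy_keeping_pairs(const uint16_t *in, size_t n, uint16_t *out) {
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t c = in[i];
        if (is_high_surrogate(c)) {
            if (i + 1 < n && is_low_surrogate(in[i + 1])) {
                out[w++] = c;        /* keep the high half */
                out[w++] = in[++i];  /* keep the low half, consume it */
            }
            /* otherwise: unpaired high surrogate, skip it */
        } else if (is_low_surrogate(c)) {
            /* unpaired low surrogate, skip it */
        } else {
            out[w++] = c;            /* ordinary BMP code unit */
        }
    }
    return w;
}
```

With this approach "skjolber 😀" would pass through unchanged, while a stray half of a pair would still be dropped. The escaping of >, <, & and " would happen in the ordinary-character branch, as it does today.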

skjolber commented 9 years ago

Can you make a pull request with the suggested change? I do not have access to a Mac at the moment.