haku / Onosendai

A multi-column social network client for Android
Apache License 2.0

Skin tone modifiers should not be counted when positioning URLs #174

Closed haku closed 8 years ago

haku commented 9 years ago

This tweet https://twitter.com/tintinfp/status/654805685645873153 contains the string:

💪🏼💪🏼💪🏼

Which in JSON looks like:

"\ud83d\udcaa\ud83c\udffc\ud83d\udcaa\ud83c\udffc\ud83d\udcaa\ud83c\udffc"

It should be decoded as 6 characters (code points), each one a surrogate pair:

\ud83d\udcaa
\ud83c\udffc
\ud83d\udcaa
\ud83c\udffc
\ud83d\udcaa
\ud83c\udffc

But then rendered as 3.

Even Twitter's website, which clearly knows about skin tone modifiers, renders this wrong, with the modifier as its own block instead of altering the preceding character. [screenshot from 2015-10-18 22 13 54]

The API reports the character offset of the URL as if each \ud83c\udffc does not exist, i.e. this is 3 characters long:

"\ud83d\udcaa\ud83c\udffc\ud83d\udcaa\ud83c\udffc\ud83d\udcaa\ud83c\udffc"

OS / Android does not know about skin tone modifiers, and does not even seem to know these are Unicode characters at all. It therefore counts each \ud83c\udffc as two characters, which positions the URL 6 characters too far to the left and splats the preceding text.
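One way to reconcile the two counts, as a rough sketch only: walk the string by code point, skip the skin tone modifiers (U+1F3FB to U+1F3FF) while counting, and return the UTF-16 index where the API's offset lands. This assumes the API counts one unit per code point and ignores modifiers, which is what the observation above suggests but is not confirmed. The class and method names are made up for illustration.

```java
class ModifierAwareOffset {
  // Fitzpatrick skin tone modifiers occupy U+1F3FB..U+1F3FF.
  static boolean isSkinToneModifier(int codePoint) {
    return codePoint >= 0x1F3FB && codePoint <= 0x1F3FF;
  }

  // Hypothetical helper: maps an offset that counts code points but skips
  // skin tone modifiers (the assumed API behaviour) onto a UTF-16 index
  // that can be used with String.substring().
  static int toCharIndex(String text, int apiOffset) {
    int counted = 0;
    int i = 0;
    while (i < text.length()) {
      int cp = text.codePointAt(i);
      if (!isSkinToneModifier(cp)) {
        if (counted == apiOffset) return i;
        counted++;
      }
      i += Character.charCount(cp); // 2 for surrogate pairs, otherwise 1
    }
    return text.length();
  }

  public static void main(String[] args) {
    // Three flexed biceps with modifiers, a space, then a placeholder URL.
    String text = "\ud83d\udcaa\ud83c\udffc\ud83d\udcaa\ud83c\udffc\ud83d\udcaa\ud83c\udffc http://t.co/x";
    int idx = toCharIndex(text, 4); // 3 emoji + 1 space in API terms
    System.out.println(idx);                 // 13
    System.out.println(text.substring(idx)); // http://t.co/x
  }
}
```

The API offset of 4 becomes 13 in Java's terms: 12 UTF-16 units for the emoji and modifiers, plus one for the space.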

haku commented 9 years ago

To add to the fun...

public class A {
  public static void main(String[] args) {
    String a = "💪🏼💪🏼💪🏼";
    System.out.println(a);
    System.out.println(a.length());

    String b = "です";
    System.out.println(b);
    System.out.println(b.length());
  }
}

On my desktop (java version "1.7.0_79") this outputs:

$ java A
💪🏼💪🏼💪🏼
12
です
2

Which is going to make writing a JUnit test for this so much fun.
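For what it's worth, the 12 is expected: String.length() counts UTF-16 code units, and each of the six code points here is outside the BMP, so each takes a surrogate pair. A test can sidestep source-file encoding by writing the string with escapes and asserting on both counts. A minimal sketch, with a made-up class name:

```java
class LengthSketch {
  // Escapes keep the source ASCII-only, so the result does not depend on
  // the encoding javac assumes for the file.
  static final String FLEXED =
      "\ud83d\udcaa\ud83c\udffc\ud83d\udcaa\ud83c\udffc\ud83d\udcaa\ud83c\udffc";

  // Hiragana "de" and "su", both inside the BMP.
  static final String DESU = "\u3067\u3059";

  public static void main(String[] args) {
    System.out.println(FLEXED.length());                           // 12 UTF-16 units
    System.out.println(FLEXED.codePointCount(0, FLEXED.length())); // 6 code points
    System.out.println(DESU.length());                             // 2
  }
}
```

Neither number matches the 3 that the API appears to use, so a test would still need its own notion of "modifier does not count".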

haku commented 9 years ago

May be related: http://stackoverflow.com/a/969200/332868

haku commented 8 years ago

How about just gsub'ing the t.co URL? (Thanks @rvedotrc)

stuarthicks commented 8 years ago

Possible occurrence/variation of this issue in this tweet which renders correctly on Web and iOS like this but on Onosendai looks like this.

haku commented 8 years ago

Trying a find-and-replace approach instead of using offset indexes. And while messing with the same code, made it remove image URLs like Twitter does.
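Roughly what find and replace means here, as an illustrative sketch rather than the actual Onosendai code: look up each short URL by its literal text and swap in the expanded one, so the API's offsets never get used. The class name, method name, and the example URLs are all placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class UrlReplaceSketch {
  // Hypothetical helper: swaps each short URL for its expanded form by
  // searching for the literal text, ignoring API-reported offsets.
  static String expandUrls(String text, Map<String, String> shortToExpanded) {
    String result = text;
    for (Map.Entry<String, String> e : shortToExpanded.entrySet()) {
      // String.replace is a literal (non-regex) replacement of every occurrence.
      result = result.replace(e.getKey(), e.getValue());
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> urls = new LinkedHashMap<>();
    urls.put("https://t.co/abc123", "https://example.com/page"); // placeholder URLs
    String tweet = "\ud83d\udcaa\ud83c\udffc look https://t.co/abc123";
    System.out.println(expandUrls(tweet, urls));
  }
}
```

Mapping a short URL to an empty string would drop it entirely, which is one way to handle the image URLs. One caveat: if one short URL is a prefix of another, a plain literal replace can mangle the longer one, so a real implementation would want to guard against that.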

haku commented 8 years ago

Seems to be fixed ok. Closing.