michael-simons / java-autolinker

An extendable autolinking library

Test issue #1

Closed ShepBook closed 11 years ago

ShepBook commented 11 years ago

Imma just put this here...

 import org.jsoup.Jsoup;
 import org.jsoup.nodes.Document;
 import org.jsoup.nodes.Entities.EscapeMode;
 import org.junit.Test;

 public class UF84bytes {
    @Test
    public void blah() {
        String s = "å  😄";
        Document d = Jsoup.parseBodyFragment(s);
        d.outputSettings().charset("UTF-8").escapeMode(EscapeMode.xhtml);
        System.out.println(d.html());
    }
 }

I expect this output:

 <html>
  <head></head>
  <body>
   å 😄
  </body>
 </html>

But I get:

 <html>
  <head></head>
  <body>
   å &#55357;&#56836;
  </body>
 </html>
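The two numeric entities in the actual output appear to be the UTF-16 surrogate halves of the emoji, escaped one `char` at a time, rather than a single code point. The following is a minimal sketch using only the JDK (Java 8+) to check that reading; the class and method names are made up for illustration and are not part of jsoup or this library.

```java
public class SurrogateDemo {
    // Combine a UTF-16 high/low surrogate pair into one Unicode code point.
    static int combine(int high, int low) {
        return Character.toCodePoint((char) high, (char) low);
    }

    // Render a string as numeric character references, one per code point
    // (not one per UTF-16 char, which is what splits supplementary characters).
    static String toNumericEntities(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append("&#").append(cp).append(';'));
        return sb.toString();
    }

    public static void main(String[] args) {
        // The two values taken from the "&#55357;&#56836;" output above.
        int cp = combine(55357, 56836);
        System.out.println(Integer.toHexString(cp)); // 1f604

        String emoji = new String(Character.toChars(cp));
        System.out.println(emoji.length());                          // 2 (UTF-16 chars)
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1 (code point)
        System.out.println(toNumericEntities(emoji));                // &#128516;
    }
}
```

If this reading is right, a serializer that has to escape the character should emit one reference per code point (`&#128516;`), and with a UTF-8 output charset it should not need to escape it at all.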

And how about some without backticks?

å 😄

ShepBook commented 11 years ago

Comments?

å 😄

å 😄

ShepBook commented 11 years ago

Posting comment from Safari:

 import org.jsoup.Jsoup;
 import org.jsoup.nodes.Document;
 import org.jsoup.nodes.Entities.EscapeMode;
 import org.junit.Test;

 public class UF84bytes {
    @Test
    public void blah() {
        String s = "å  😄";
        Document d = Jsoup.parseBodyFragment(s);
        d.outputSettings().charset("UTF-8").escapeMode(EscapeMode.xhtml);
        System.out.println(d.html());
    }
 }

I expect this output:

 <html>
  <head></head>
  <body>
   å 😄
  </body>
 </html>

But I get:

 <html>
  <head></head>
  <body>
   å &#55357;&#56836;
  </body>
 </html>

å 😄

michael-simons commented 11 years ago

This is what the issue and comment you posted look like:

[Screenshot: Broken 4byte UTF8]

This is the correct content of the test case, viewed in TextEdit on Mac OS X 10.8:

[Screenshot: Correct 4byte UTF8]

Just go to http://en.wikipedia.org/wiki/Emoji, choose one of the characters, and try to post it. It will fail silently. Maybe emoji are not important, but I guess other languages have characters in the supplementary range (the ones that need 4 bytes in UTF-8) as well.
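To make the "supplementary range" point concrete, here is a small sketch, again JDK only (Java 8+), that reports how many UTF-8 bytes a code point needs and whether a string contains anything outside the Basic Multilingual Plane. The class and method names are hypothetical; U+20000 is used as an example of a non-emoji supplementary character (a CJK Extension B ideograph).

```java
import java.nio.charset.StandardCharsets;

public class Utf8Width {
    // Number of bytes the given code point occupies when encoded as UTF-8.
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the string contains at least one code point above U+FFFF,
    // i.e. one that needs a surrogate pair in UTF-16 and 4 bytes in UTF-8.
    static boolean hasSupplementary(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        System.out.println(utf8Length(0xE5));    // 2 (the "å" from the test case)
        System.out.println(utf8Length(0x1F604)); // 4 (the emoji from the test case)
        System.out.println(utf8Length(0x20000)); // 4 (CJK Extension B ideograph)

        // The test string from the issue, written with escapes so the source
        // file encoding does not matter.
        System.out.println(hasSupplementary("\u00e5"));              // false
        System.out.println(hasSupplementary("\u00e5 \uD83D\uDE04")); // true
    }
}
```

A check like `hasSupplementary` could serve as a quick guard in a test to confirm that such input survives a round trip through whatever storage or rendering layer is under suspicion.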