vsch / flexmark-java

CommonMark/Markdown Java parser with source level AST. CommonMark 0.28, emulation of: pegdown, kramdown, markdown.pl, MultiMarkdown. With HTML to MD, MD to PDF, MD to DOCX conversion modules.
BSD 2-Clause "Simplified" License

Issue parsing Facebook img/emojis #548

Closed AliLezamaIgrat closed 1 year ago

AliLezamaIgrat commented 1 year ago

Hey guys, first of all, excellent work with this lib; it has been very useful for my team for quite some time.

I'm currently facing an issue while trying to parse a simple HTML text:

<div>
<div>This is my test to a Facebook emoji:</div>
<div><img src="https://static.xx.fbcdn.net/images/emoji.php/v9/t71/2/16/1f967.png" alt="text" width="24" height="24"></div>
</div>

Please provide as much information about where the bug is located or what you were using:

To Reproduce

The code I'm actually using is from version 0.40.16, but I also tried updating to 0.50.50 and got the same result. I then took a look at your repo, and it seems the issue would still be there even if I updated to the latest version.

The error log we're getting is:

stack_trace:java.lang.NullPointerException: null
    at com.vladsch.flexmark.util.html.FormattingAppendableImpl.append(FormattingAppendableImpl.java:668) ~[na:na]
    at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processImg(FlexmarkHtmlParser.java:1026) ~[na:na]
    at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processElement(FlexmarkHtmlParser.java:519) ~[na:na]
    at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processHtmlTree(FlexmarkHtmlParser.java:489) ~[na:na]
    at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processSpan(FlexmarkHtmlParser.java:1452) ~[na:na]
    at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processElement(FlexmarkHtmlParser.java:533) ~[na:na]
    at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processHtmlTree(FlexmarkHtmlParser.java:489) ~[na:na]
    at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processHtmlTree(FlexmarkHtmlParser.java:470) ~[na:na]
    at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processDiv(FlexmarkHtmlParser.java:1411) ~[na:na]
    at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processElement(FlexmarkHtmlParser.java:515) ~[na:na]
    at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processHtmlTree(FlexmarkHtmlParser.java:489) ~[na:na]
    at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processHtmlTree(FlexmarkHtmlParser.java:470) ~[na:na]
    at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processDiv(FlexmarkHtmlParser.java:1411) ~[na:na]
    at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processElement(FlexmarkHtmlParser.java:515) ~[na:na]
    at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processHtmlTree(FlexmarkHtmlParser.java:489) ~[na:na]
    at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processHtmlTree(FlexmarkHtmlParser.java:470) ~[na:na]
    at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processDiv(FlexmarkHtmlParser.java:1411) ~[na:na]
    at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processElement(FlexmarkHtmlParser.java:515) ~[na:na]
    at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processHtmlTree(FlexmarkHtmlParser.java:489) ~[na:na]
    at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.processHtmlTree(FlexmarkHtmlParser.java:470) ~[na:na]
    at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.parse(FlexmarkHtmlParser.java:326) ~[na:na]
    at com.vladsch.flexmark.convert.html.FlexmarkHtmlParser.parse(FlexmarkHtmlParser.java:405) ~[na:na]
    at com.en.util.StringUtils.toMarkdown(StringUtils.java:1225) ~[na:na]

What seems to be causing this error is the check made here: it tries to use the shortcut property, which is null in the info returned for this emoji. https://github.com/vsch/flexmark-java/blob/8a881b73109a287b5f202e2e1fc9f9c497d5eecf/flexmark-html2md-converter/src/main/java/com/vladsch/flexmark/html2md/converter/internal/HtmlConverterCoreNodeRenderer.java#L661-L663

We currently add a check before calling your lib's parser, so we can get rid of those NullPointerExceptions.

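// Replace the <img> with the emoji's HTML entity when the emoji has no shortcut.
if (emoji != null && emoji.shortcut == null) {
    Element emojiDiv = el.parent().appendElement("div");
    // e.g. "1f967.png" -> "1f967"
    final String emojiCode = emoji.unicodeSampleFile.split("[.]")[0];
    emojiDiv.append("&#x" + emojiCode + ";");
    el.remove();
}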
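
For readers who want to try the file-name-to-entity step of the workaround above in isolation, here is a minimal sketch using only the JDK. The class name `EmojiEntities` and the method `toHtmlEntities` are hypothetical and not part of flexmark-java. The handling of hyphen-separated names (one entity per code point) is an assumption about how multi-code-point sample files might be named; the thread does not confirm it.

```java
// Hypothetical helper, not part of flexmark-java: turns an emoji sample file
// name into HTML numeric character references.
final class EmojiEntities {
    private EmojiEntities() {}

    /**
     * Converts a file name such as "1f967.png" into "&#x1f967;".
     * Assumption: names with several code points are hyphen-separated,
     * e.g. "1f1e6-1f1e8.png", and each part becomes its own entity.
     */
    static String toHtmlEntities(String sampleFile) {
        // Same cut as split("[.]")[0] in the workaround: everything before the first dot.
        int dot = sampleFile.indexOf('.');
        String stem = dot >= 0 ? sampleFile.substring(0, dot) : sampleFile;

        StringBuilder out = new StringBuilder();
        for (String codePoint : stem.split("-")) {
            // Validate that the part is hexadecimal; throws NumberFormatException otherwise.
            Integer.parseInt(codePoint, 16);
            out.append("&#x").append(codePoint).append(';');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHtmlEntities("1f967.png"));
    }
}
```

Validating the hex digits up front means a file name that is not a code point fails loudly instead of producing a malformed entity in the HTML handed to the converter.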

If you could point us in the right direction, in case something is not configured properly to handle these emojis, or add a specific case to handle these NullPointerExceptions, it would be appreciated.

DamnedElric commented 1 year ago

I noticed that someone else had created the issue as I was fixing the bug. The PR should fix the issue. The fix simply falls back to regular image rendering when the emoji does not have a corresponding shortcut.
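
The fallback described in this comment can be illustrated with a small, self-contained sketch. This is not the code from the PR: the class `ImgFallbackSketch`, the method `renderImg`, and its parameters are hypothetical, and the Markdown output format (`:shortcut:` for emojis, `![alt](src)` for plain images) is assumed for illustration only.

```java
// Illustrative sketch only; the names below are hypothetical and this is not
// the actual flexmark-java fix.
final class ImgFallbackSketch {
    private ImgFallbackSketch() {}

    /**
     * Renders an <img> as Markdown.
     *
     * @param src      image URL
     * @param alt      alt text, may be null
     * @param shortcut emoji shortcut if the image was recognized as an emoji
     *                 that has one, otherwise null
     */
    static String renderImg(String src, String alt, String shortcut) {
        if (shortcut != null) {
            // Known emoji with a shortcut: emit the shortcut form.
            return ":" + shortcut + ":";
        }
        // No shortcut available: fall back to a regular Markdown image
        // instead of dereferencing a null shortcut.
        String safeAlt = alt == null ? "" : alt;
        return "![" + safeAlt + "](" + src + ")";
    }

    public static void main(String[] args) {
        String src = "https://static.xx.fbcdn.net/images/emoji.php/v9/t71/2/16/1f967.png";
        System.out.println(renderImg(src, "text", null));
    }
}
```

The point of the design is that a missing shortcut is treated as "not an emoji for rendering purposes" rather than as an error, so unknown or newer emoji images still survive the HTML-to-Markdown conversion.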