marp-team / marp-core

The core of Marp converter
MIT License

Emoji Rendering Discrepancy Between Inline and Block Elements #309

Open bzaczynski opened 2 years ago

bzaczynski commented 2 years ago

Version of Marp Tool

v1.0.0

Operating System

Linux

Environment

Running in a Docker container. (The latest version, v2.0.4, appears to have the same issue.)

How to reproduce

Create the following Markdown file named slide-deck.md:

```markdown
---
---

<span>Inline: &#128578;</span>

<div>Block: &#128578;</div>
```

Run the following CLI command with Docker to generate HTML:

```shell
$ docker run --rm -v $PWD:/home/marp/app/ -e LANG=$LANG -e MARP_USER="$(id -u):$(id -g)" marpteam/marp-cli:v1.0.0 slide-deck.md
```

Run the following CLI command with Docker to generate PDF:

```shell
$ docker run --rm -v $PWD:/home/marp/app/ -e LANG=$LANG -e MARP_USER="$(id -u):$(id -g)" marpteam/marp-cli:v1.0.0 slide-deck.md --pdf
```

Expected behavior

Both emojis are rendered the same way and are visible in the resulting PDF document.

Actual behavior

[Screenshot: actual output]

The inline emoji is rendered as an image element, `<img class="emoji" draggable="false" alt="🙂" src="https://twemoji.maxcdn.com/2/svg/1f642.svg" data-marp-twemoji="">`, while the emoji inside the block element is left as the literal Unicode character 🙂.

This is a problem when targeting PDF as the output format:

[Screenshot: PDF output]
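For reference, the `src` URL of that `<img>` element is derived from the emoji's Unicode code point: 🙂 is U+1F642, hence `1f642.svg`. The TypeScript sketch that follows reproduces the element shape quoted above. `TWEMOJI_BASE`, `toCodePointName`, and `toTwemojiImg` are hypothetical names, the base URL is simply copied from the issue's output, and real Twemoji file naming has extra rules (such as dropping U+FE0F in some sequences) that are not handled here.

```typescript
// Base URL copied from the <img> element quoted in this issue (assumption, not a library constant).
const TWEMOJI_BASE = "https://twemoji.maxcdn.com/2/svg/";

// Hypothetical helper: join the emoji's code points as lowercase hex, separated by "-".
// Array.from iterates by code point, so a surrogate pair stays a single element.
function toCodePointName(emoji: string): string {
  return Array.from(emoji)
    .map((ch) => (ch.codePointAt(0) as number).toString(16))
    .join("-");
}

// Hypothetical helper: build an element shaped like the one in the reported HTML output.
function toTwemojiImg(emoji: string): string {
  const src = `${TWEMOJI_BASE}${toCodePointName(emoji)}.svg`;
  return `<img class="emoji" draggable="false" alt="${emoji}" src="${src}" data-marp-twemoji="">`;
}

console.log(toTwemojiImg("\u{1F642}"));
// <img class="emoji" draggable="false" alt="🙂" src="https://twemoji.maxcdn.com/2/svg/1f642.svg" data-marp-twemoji="">
```

Because the image is fetched from a remote URL, the inline emoji does not depend on an emoji font being installed, whereas the literal character in the `<div>` does.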

Additional information

No response

yhatt commented 2 years ago

https://markdown-it.github.io/#md3=%7B%22source%22%3A%22%3Cspan%3EInline%3A%20%26%23128578%3B%3C%2Fspan%3E%5Cn%5Cn%3Cdiv%3EBlock%3A%20%26%23128578%3B%3C%2Fdiv%3E%22%2C%22defaults%22%3A%7B%22html%22%3Afalse%2C%22xhtmlOut%22%3Afalse%2C%22breaks%22%3Afalse%2C%22langPrefix%22%3A%22language-%22%2C%22linkify%22%3Atrue%2C%22typographer%22%3Atrue%2C%22_highlight%22%3Atrue%2C%22_strict%22%3Atrue%2C%22_view%22%3A%22debug%22%7D%7D

markdown-it parses the provided example into the following tokens:

[
  {
    "type": "paragraph_open",
    "tag": "p",
    "attrs": null,
    "map": [
      0,
      1
    ],
    "nesting": 1,
    "level": 0,
    "children": null,
    "content": "",
    "markup": "",
    "info": "",
    "meta": null,
    "block": true,
    "hidden": false
  },
  {
    "type": "inline",
    "tag": "",
    "attrs": null,
    "map": [
      0,
      1
    ],
    "nesting": 0,
    "level": 1,
    "children": [
      {
        "type": "html_inline",
        "tag": "",
        "attrs": null,
        "map": null,
        "nesting": 0,
        "level": 0,
        "children": null,
        "content": "<span>",
        "markup": "",
        "info": "",
        "meta": null,
        "block": false,
        "hidden": false
      },
      {
        "type": "text",
        "tag": "",
        "attrs": null,
        "map": null,
        "nesting": 0,
        "level": 0,
        "children": null,
        "content": "Inline: 🙂",
        "markup": "&#128578;",
        "info": "entity",
        "meta": null,
        "block": false,
        "hidden": false
      },
      {
        "type": "html_inline",
        "tag": "",
        "attrs": null,
        "map": null,
        "nesting": 0,
        "level": 0,
        "children": null,
        "content": "</span>",
        "markup": "",
        "info": "",
        "meta": null,
        "block": false,
        "hidden": false
      }
    ],
    "content": "<span>Inline: &#128578;</span>",
    "markup": "",
    "info": "",
    "meta": null,
    "block": true,
    "hidden": false
  },
  {
    "type": "paragraph_close",
    "tag": "p",
    "attrs": null,
    "map": null,
    "nesting": -1,
    "level": 0,
    "children": null,
    "content": "",
    "markup": "",
    "info": "",
    "meta": null,
    "block": true,
    "hidden": false
  },
  {
    "type": "html_block",
    "tag": "",
    "attrs": null,
    "map": [
      2,
      3
    ],
    "nesting": 0,
    "level": 0,
    "children": null,
    "content": "<div>Block: &#128578;</div>",
    "markup": "",
    "info": "",
    "meta": null,
    "block": true,
    "hidden": false
  }
]

Marp Core transforms emojis within the content of an `inline` markdown-it token into `marp_unicode_emoji` tokens, and renders each `marp_unicode_emoji` token as a Twemoji SVG image.

https://github.com/marp-team/marp-core/blob/5c5eda0fb7ea9a202a3b0345202272bb0d9a457f/src/emoji/emoji.ts#L76-L109
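A simplified TypeScript sketch of that idea, assuming a minimal `Token` shape that mirrors the fields in the token dump above. It is not Marp Core's actual implementation (the linked `emoji.ts` is): `splitText` and `transformEmojis` are hypothetical names, and the `\p{Extended_Pictographic}` regex matches single code points only, so ZWJ sequences and skin-tone modifiers are not grouped into one emoji.

```typescript
// Minimal subset of markdown-it's Token fields used here (assumption for illustration).
interface Token {
  type: string;
  content: string;
  children: Token[] | null;
}

const textToken = (content: string): Token => ({ type: "text", content, children: null });

// Split one text token into text and hypothetical "marp_unicode_emoji" tokens.
function splitText(token: Token): Token[] {
  const out: Token[] = [];
  const re = /\p{Extended_Pictographic}/gu; // fresh regex per call, so lastIndex never leaks
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(token.content)) !== null) {
    if (m.index > last) out.push(textToken(token.content.slice(last, m.index)));
    out.push({ type: "marp_unicode_emoji", content: m[0], children: null });
    last = m.index + m[0].length;
  }
  if (last < token.content.length) out.push(textToken(token.content.slice(last)));
  return out;
}

// Only children of "inline" tokens are visited; "html_block" tokens pass through
// untouched, which is why the emoji in the <div> stays a plain character.
function transformEmojis(tokens: Token[]): Token[] {
  return tokens.map((t) => {
    if (t.type !== "inline" || t.children === null) return t;
    const children: Token[] = [];
    for (const c of t.children) {
      if (c.type === "text") children.push(...splitText(c));
      else children.push(c);
    }
    return { ...t, children };
  });
}

// Usage with the two relevant tokens from the example (paragraph_open/close omitted).
const tokens: Token[] = [
  {
    type: "inline",
    content: "<span>Inline: &#128578;</span>",
    children: [
      { type: "html_inline", content: "<span>", children: null },
      textToken("Inline: \u{1F642}"),
      { type: "html_inline", content: "</span>", children: null },
    ],
  },
  { type: "html_block", content: "<div>Block: &#128578;</div>", children: null },
];

const result = transformEmojis(tokens);
console.log((result[0].children as Token[]).map((c) => c.type).join(", "));
// html_inline, text, marp_unicode_emoji, html_inline
```

Note that the `text` child already holds the decoded character 🙂 because markdown-it resolved `&#128578;` while parsing inline content; the `html_block` content still contains the raw entity.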

On the other hand, the block element and its children are parsed as a single `html_block` token. Marp Core does not transform emojis within `html_block` tokens, because doing so may break raw HTML elements in some cases.
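To illustrate the risk, here is a hypothetical naive approach in TypeScript (not something Marp Core does): replace emoji characters in `html_block` content with `<img>` tags. The raw content holds `&#128578;` rather than 🙂, so nothing matches until entities are decoded; once they are decoded, an emoji inside an attribute value produces broken markup. `naiveReplace` and `decodeNumericRefs` are illustrative names, and this resolver handles numeric character references only, not named ones such as `&hearts;`.

```typescript
// Hypothetical naive approach: substitute every emoji character in raw HTML with an <img>.
function naiveReplace(html: string): string {
  return html.replace(/\p{Extended_Pictographic}/gu, (e) => `<img class="emoji" alt="${e}">`);
}

// Minimal resolver for numeric character references only (&#128578; and &#x1F642;).
// Named references would need the full HTML entity table.
function decodeNumericRefs(html: string): string {
  return html.replace(/&#(?:x([0-9a-f]+)|([0-9]+));/gi, (ref: string, hex?: string, dec?: string) => {
    const cp = hex !== undefined ? parseInt(hex, 16) : parseInt(dec as string, 10);
    // Leave out-of-range references untouched instead of throwing a RangeError.
    return cp <= 0x10ffff ? String.fromCodePoint(cp) : ref;
  });
}

const block = "<div>Block: &#128578;</div>";

// 1. The raw html_block content contains only the entity, so nothing is replaced.
console.log(naiveReplace(block) === block); // true

// 2. Decoding first finds the emoji in text content...
console.log(naiveReplace(decodeNumericRefs(block)));
// <div>Block: <img class="emoji" alt="🙂"></div>

// 3. ...but the same substitution corrupts an emoji inside an attribute value.
console.log(naiveReplace(decodeNumericRefs('<div title="&#128578;">x</div>')));
// <div title="<img class="emoji" alt="🙂">">x</div>
```

The third case shows why telling text content apart from attribute values requires actually parsing the HTML rather than scanning it with regular expressions.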

To transform emojis within `html_block` tokens correctly, we would need to implement a robust HTML parser and entity resolver that work in both Node.js and the browser. Unfortunately, we have not implemented them yet because of several concerns: