Open bzaczynski opened 2 years ago
The markdown-it AST (token stream) for the provided example looks like this:
[
  {
    "type": "paragraph_open",
    "tag": "p",
    "attrs": null,
    "map": [0, 1],
    "nesting": 1,
    "level": 0,
    "children": null,
    "content": "",
    "markup": "",
    "info": "",
    "meta": null,
    "block": true,
    "hidden": false
  },
  {
    "type": "inline",
    "tag": "",
    "attrs": null,
    "map": [0, 1],
    "nesting": 0,
    "level": 1,
    "children": [
      {
        "type": "html_inline",
        "tag": "",
        "attrs": null,
        "map": null,
        "nesting": 0,
        "level": 0,
        "children": null,
        "content": "<span>",
        "markup": "",
        "info": "",
        "meta": null,
        "block": false,
        "hidden": false
      },
      {
        "type": "text",
        "tag": "",
        "attrs": null,
        "map": null,
        "nesting": 0,
        "level": 0,
        "children": null,
        "content": "Inline: 🙂",
        "markup": "🙂",
        "info": "entity",
        "meta": null,
        "block": false,
        "hidden": false
      },
      {
        "type": "html_inline",
        "tag": "",
        "attrs": null,
        "map": null,
        "nesting": 0,
        "level": 0,
        "children": null,
        "content": "</span>",
        "markup": "",
        "info": "",
        "meta": null,
        "block": false,
        "hidden": false
      }
    ],
    "content": "<span>Inline: 🙂</span>",
    "markup": "",
    "info": "",
    "meta": null,
    "block": true,
    "hidden": false
  },
  {
    "type": "paragraph_close",
    "tag": "p",
    "attrs": null,
    "map": null,
    "nesting": -1,
    "level": 0,
    "children": null,
    "content": "",
    "markup": "",
    "info": "",
    "meta": null,
    "block": true,
    "hidden": false
  },
  {
    "type": "html_block",
    "tag": "",
    "attrs": null,
    "map": [2, 3],
    "nesting": 0,
    "level": 0,
    "children": null,
    "content": "<div>Block: 🙂</div>",
    "markup": "",
    "info": "",
    "meta": null,
    "block": true,
    "hidden": false
  }
]
Marp Core transforms each emoji in the content of an inline markdown-it token into a marp_unicode_emoji token, and renders that token as a Twemoji SVG image.
On the other hand, the block element and its children are parsed as a single html_block token. Marp Core does not transform emojis within an html_block token because doing so may break raw HTML elements in some cases.
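As a rough illustration of the behavior described above, the following TypeScript sketch walks a simplified token list, splits text children of inline tokens into text and marp_unicode_emoji tokens, and leaves html_block tokens untouched. It is not Marp Core's actual implementation: the `Token` interface is a cut-down stand-in for markdown-it's token class, the helper names (`makeToken`, `splitTextToken`, `transformTokens`) are hypothetical, and only the single emoji 🙂 is recognized.

```typescript
// Cut-down token shape, modelled on a few fields from the token dump above.
// (Assumption: the real markdown-it Token class has many more fields.)
interface Token {
  type: string;
  content: string;
  children: Token[] | null;
}

// Assumption for this sketch: only this one emoji is detected.
const EMOJI = "🙂";

// Hypothetical helper that builds a leaf token.
function makeToken(type: string, content: string): Token {
  return { type: type, content: content, children: null };
}

// Split one text token into alternating text and marp_unicode_emoji tokens.
function splitTextToken(token: Token): Token[] {
  const parts = token.content.split(EMOJI);
  const result: Token[] = [];
  parts.forEach((part, index) => {
    // Every boundary between two parts is one occurrence of the emoji.
    if (index > 0) result.push(makeToken("marp_unicode_emoji", EMOJI));
    if (part.length > 0) result.push(makeToken("text", part));
  });
  return result;
}

// Rewrite text children of inline tokens only; every other token,
// including html_block and html_inline, is passed through unchanged.
function transformTokens(tokens: Token[]): Token[] {
  return tokens.map((token) => {
    if (token.type !== "inline" || token.children === null) return token;
    const children: Token[] = [];
    token.children.forEach((child) => {
      if (child.type === "text") {
        splitTextToken(child).forEach((t) => children.push(t));
      } else {
        children.push(child);
      }
    });
    return { type: token.type, content: token.content, children: children };
  });
}

// Simplified version of the token stream from this issue.
const tokens: Token[] = [
  {
    type: "inline",
    content: "<span>Inline: 🙂</span>",
    children: [
      makeToken("html_inline", "<span>"),
      makeToken("text", "Inline: 🙂"),
      makeToken("html_inline", "</span>"),
    ],
  },
  makeToken("html_block", "<div>Block: 🙂</div>"),
];

const transformed = transformTokens(tokens);
const inlineTypes = (transformed[0].children || []).map((t) => t.type);
```

In this sketch the inline emoji ends up as its own marp_unicode_emoji token, which a renderer could then turn into an image, while the html_block content still holds the raw 🙂 character. That matches the difference reported in this issue.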
To transform emojis in an html_block token correctly, we would have to implement a robust HTML parser and entity resolver that work in both Node.js and the browser. Unfortunately, we have not yet implemented them because of several concerns:
An html_block token may contain only part of a complete HTML block, so well-known HTML-compliant parsers, such as the browser's DOMParser, htmlparser2, and parse5, cannot be used in our use case.
<div class="😄">

# Markdown content 👍

</div>
In the above case, the html_block token is split into <div class="😄"> and </div>. If we tried to parse and transform these fragments with a known parser, the opening element would be closed unnecessarily because HTML-compliant parsers auto-close tags, and parsing the closing element would fail as invalid HTML.
If we applied a simple string replacement instead, the raw HTML block could break in some edge cases.
<script>document.title = "🙂";</script>
➡️ <script>document.title = "<img class="emoji" draggable="false" alt="🙂" src="https://twemoji.maxcdn.com/2/svg/1f642.svg" data-marp-twemoji="">";</script>
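The two concerns above can be reproduced with a few lines of TypeScript. This is only a sketch of a naive approach, not something Marp Core does: `naiveReplace` is a hypothetical helper, it knows a single emoji, and the opening-tag fragment uses 🙂 in place of the 😄 from the earlier example so that one replacement rule covers both cases.

```typescript
// Assumption for this sketch: only this one emoji is handled.
const SMILE = "🙂";

// The <img> markup is copied from the rendered output quoted in this issue.
const SMILE_IMG =
  '<img class="emoji" draggable="false" alt="🙂" ' +
  'src="https://twemoji.maxcdn.com/2/svg/1f642.svg" data-marp-twemoji="">';

// Hypothetical naive transformer: replaces every occurrence of the emoji,
// with no awareness of tags, attributes, or script content.
function naiveReplace(html: string): string {
  return html.split(SMILE).join(SMILE_IMG);
}

// Case 1: emoji inside a script string literal.
// The inserted markup contains unescaped double quotes, so the
// JavaScript string literal is terminated early and the script breaks.
const script = '<script>document.title = "🙂";</script>';
const brokenScript = naiveReplace(script);

// Case 2: emoji inside an attribute of an opening-tag fragment.
// The <img> markup lands inside the class attribute and corrupts the tag.
const openingFragment = '<div class="🙂">';
const brokenFragment = naiveReplace(openingFragment);
```

Avoiding both failures requires knowing whether the emoji sits in text content, an attribute value, or raw-text elements such as script, which is exactly the job of the HTML parser discussed above.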
Version of Marp Tool
v1.0.0
Operating System
Linux
Environment
Running in a Docker container. (The latest version 2.0.4 seems to suffer from the same issue.)
How to reproduce
Create the following Markdown file named slide-deck.md:
Run the following CLI command with Docker to generate HTML:
Run the following CLI command with Docker to generate PDF:
Expected behavior
Both emojis rendered the same way and visible in the resulting PDF document.
Actual behavior
The inline emoji is rendered as an image element, <img class="emoji" draggable="false" alt="🙂" src="https://twemoji.maxcdn.com/2/svg/1f642.svg" data-marp-twemoji="">, while the block element emoji is rendered literally: 🙂
This is a problem when targeting PDF as the output format.
Additional information
No response