miyuchina / mistletoe

A fast, extensible and spec-compliant Markdown parser in pure Python.
MIT License

HTML Blocks #231

Open cppgent0 opened 1 week ago

cppgent0 commented 1 week ago

I have a Markdown file with HTML in it, e.g.:

<table class="info_table">
<tr class="ja_info_tr">
<td class="ja_info_td"><strong>PyPi module</strong></td>
<td class="ja_info_td">N/A</td>
</tr>
<tr class="ja_info_tr">
<td class="ja_info_td"><strong>Version Info</strong></td>
<td class="ja_info_td"><ul><li>macOS 14.5, Python 3.10</li><li>Ubuntu 20.04 focal, Python 3.10</li>
</ul></td>
</tr>
</table>

According to CommonMark this is allowed (see https://spec.commonmark.org/0.31.2/#html-blocks). I've also tried it in the reference dingus at https://spec.commonmark.org/dingus/ and it renders correctly there.

Also see https://spec.commonmark.org/0.31.2/spec.json, which contains the spec's test cases in JSON, including this one for the same scenario:

  {
    "markdown": "<table>\n  <tr>\n    <td>\n           hi\n    </td>\n  </tr>\n</table>\n\nokay.\n",
    "html": "<table>\n  <tr>\n    <td>\n           hi\n    </td>\n  </tr>\n</table>\n<p>okay.</p>\n",
    "example": 149,
    "start_line": 2457,
    "end_line": 2476,
    "section": "HTML blocks"
  },
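A minimal sketch of how that test case could be checked against any renderer. The helper name `passes` is my own, for illustration only; the case is copied from the JSON above (without the line-number fields).

```python
import json

# Example 149 from spec.json, as quoted above. The raw string keeps the
# "\n" sequences intact so that json.loads decodes them into newlines.
CASE = json.loads(r'''
{
  "markdown": "<table>\n  <tr>\n    <td>\n           hi\n    </td>\n  </tr>\n</table>\n\nokay.\n",
  "html": "<table>\n  <tr>\n    <td>\n           hi\n    </td>\n  </tr>\n</table>\n<p>okay.</p>\n",
  "example": 149,
  "section": "HTML blocks"
}
''')


def passes(render, case):
    """Return True if render(markdown) reproduces the expected HTML exactly.

    `render` is any callable that takes Markdown text and returns HTML.
    """
    return render(case["markdown"]) == case["html"]


# Hypothetical usage, assuming mistletoe is installed:
#   import mistletoe
#   print(passes(mistletoe.markdown, CASE))
```

Passing the renderer in as a callable keeps the check independent of any one library, so the same case can be run against a modified copy of a renderer as well.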

I've taken a copy of html_renderer.py and am modifying it for my Markdown files. So far so good, except for this HTML block, which is rendered as:

<p>&lt;table class="ja_info_table"&gt;
&lt;tr class="ja_info_tr"&gt;
etc.
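That output looks like what ordinary text escaping produces, which would be consistent with the lines being handled as paragraph text rather than as an HTML block. A small standard-library illustration of the effect (this is not mistletoe's actual code path, just the same kind of escaping):

```python
import html

# First line of the table from the Markdown source above.
line = '<table class="info_table">'

# Escape angle brackets and ampersands but leave double quotes alone,
# matching the shape of the output shown above.
escaped = html.escape(line, quote=False)

# Wrap the result the way a paragraph would be rendered.
paragraph = "<p>" + escaped + "</p>"
print(paragraph)
```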

I've dumped the AST and there are no HTML block elements in it; the HTML lines all end up as RawText.

In the html_renderer.py script there is a function:

    @staticmethod
    def render_html_block(token: block_token.HtmlBlock) -> str:

But it's never called (I checked with a print()). I also looked at base_renderer, and it doesn't seem to be called from there either.

I can't tell whether the problem is in my code, whether the renderer is simply not calling render_html_block, or whether the parser is failing to detect the raw HTML in the Markdown file in the first place.
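A check like the following might narrow this down: parse the same text once outside and once inside a renderer context and compare the top-level token types. This is only a sketch. It assumes mistletoe 1.x (the `HtmlRenderer` and `HtmlBlock` names), and that where the `Document` is created relative to the renderer context matters, which I have not confirmed. The helper names `top_level_types` and `diagnose` are mine, for illustration.

```python
# Markdown from CommonMark spec example 149 (quoted above).
SOURCE = (
    "<table>\n  <tr>\n    <td>\n           hi\n    </td>\n  </tr>\n"
    "</table>\n\nokay.\n"
)


def top_level_types(doc):
    """Return the class names of a document's top-level block tokens."""
    return [type(child).__name__ for child in doc.children]


def diagnose(source):
    """Parse `source` outside and inside an HtmlRenderer context.

    Returns (types_outside, types_inside, html_inside).
    """
    # Imported here so this module still loads without mistletoe installed.
    from mistletoe.block_token import Document
    from mistletoe.html_renderer import HtmlRenderer

    # Case 1: no renderer is active while the text is parsed.
    outside = top_level_types(Document(source))

    # Case 2: the text is parsed while the renderer context is active.
    with HtmlRenderer() as renderer:
        doc = Document(source)
        inside = top_level_types(doc)
        rendered = renderer.render(doc)

    return outside, inside, rendered
```

Calling `diagnose(SOURCE)` and printing the three results should show whether an `HtmlBlock` token appears in either case. If it appears only in the second, the parsing step would be the place to look rather than render_html_block itself.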