miyuchina / mistletoe

A fast, extensible and spec-compliant Markdown parser in pure Python.
MIT License

question: parsing HTML in Markdown files #126

Closed: gvwilson closed this issue 2 years ago.

gvwilson commented 2 years ago

Is there a way to get mistletoe to parse HTML tags in Markdown documents? My test program is:

```python
#!/usr/bin/env python

from mistletoe import Document
from mistletoe.ast_renderer import get_ast

text = """\
# Title

<div class="testing">

paragraph

</div>
"""

print(get_ast(Document(text)))
```

and its output is:

```
{'type': 'Document', 'footnotes': {}, 'children': [
  {'type': 'Heading', 'level': 1, 'children': [{'type': 'RawText', 'content': 'Title'}]},
  {'type': 'Paragraph', 'children': [{'type': 'RawText', 'content': '<div class="testing">'}]},
  {'type': 'Paragraph', 'children': [{'type': 'RawText', 'content': 'paragraph'}]},
  {'type': 'Paragraph', 'children': [{'type': 'RawText', 'content': '</div>'}]}
]}
```

Based on https://spec.commonmark.org/0.30/#example-152 from the CommonMark spec, I was expecting a {'type': 'Div', ...} node in the AST. Is there a way to get that?
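For comparison, here is how I read CommonMark's HTML-block rules for this input. A line that starts with `<div` (or `</div>`) opens a raw HTML block under start condition 6, and that block ends at the next blank line. The sketch below illustrates only that splitting rule. It uses the standard library alone and is not mistletoe code. `split_blocks`, `START_6` and the abbreviated `BLOCK_TAGS` set are hypothetical names chosen for illustration.

```python
import re

# A subset of the tag names listed in CommonMark's HTML-block start
# condition 6 (the spec's full list is longer).
BLOCK_TAGS = {"div", "p", "section", "table", "ul", "ol", "li",
              "blockquote", "header", "footer", "nav"}

# Condition 6: up to three spaces, "<" or "</", a tag name, then a space,
# tab, end of line, ">" or "/>".
START_6 = re.compile(r"^ {0,3}</?(?P<tag>[A-Za-z][A-Za-z0-9]*)(?:[ \t]|/?>|$)")


def split_blocks(text):
    """Split text into ('html' | 'other', content) blocks.

    Simplified sketch: only start condition 6 is recognised, every other
    run of non-blank lines is lumped together as 'other', and HTML blocks
    interrupting a paragraph are not modelled.
    """
    blocks = []
    kind, lines = None, []
    for line in text.splitlines():
        if not line.strip():
            # A condition-6 HTML block (like a paragraph) ends at a blank line.
            if lines:
                blocks.append((kind, "\n".join(lines)))
            kind, lines = None, []
            continue
        if kind is None:
            # The first line of a block decides what kind of block it is.
            match = START_6.match(line)
            is_html = match is not None and match.group("tag").lower() in BLOCK_TAGS
            kind = "html" if is_html else "other"
        lines.append(line)
    if lines:
        blocks.append((kind, "\n".join(lines)))
    return blocks


text = '# Title\n\n<div class="testing">\n\nparagraph\n\n</div>\n'
print(split_blocks(text))
# [('other', '# Title'), ('html', '<div class="testing">'), ('other', 'paragraph'), ('html', '</div>')]
```

In this reading, the `<div>` and `</div>` lines are two separate raw HTML blocks, and `paragraph` is a sibling paragraph between them. A single node wrapping the paragraph would therefore have to come from post-processing rather than from the block rules themselves.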