miyuchina / mistletoe

A fast, extensible and spec-compliant Markdown parser in pure Python.
MIT License

HTMLBlock isn't parsed. #106

Closed Titanexx closed 3 years ago

Titanexx commented 3 years ago

HTMLBlock isn't parsed unless you add it manually (the test case comes from https://spec.commonmark.org/0.28/#example-139).

Test script:

from mistletoe import Document, HTMLRenderer, block_token

def print_type(label, tokens):
    # Print the label followed by the class name of each token.
    print(label, [t.__class__.__name__ for t in tokens])

md = """# Test

<style
  type="text/css">
h1 {color:red;}

p {color:blue;}
</style>
okay"""

doc = Document(md)
print_type("doc.children:",doc.children)
print_type("doc.children[0]:",doc.children[0].children)
print_type("doc.children[1]:",doc.children[1].children)
print_type("doc.children[2]:",doc.children[2].children)

with HTMLRenderer() as renderer:
    print(renderer.render(doc))

print('==== ADD HTMLBlock ====\n')

block_token.add_token(block_token.HTMLBlock)

doc = Document(md)
print_type("doc.children:",doc.children)
print_type("doc.children[0]:",doc.children[0].children)
print("doc.children[1]:",repr(doc.children[1].content))
print_type("doc.children[2]:",doc.children[2].children)

with HTMLRenderer() as renderer:
    print(renderer.render(doc))

Which produces:

$ python .\main.py
doc.children: ['Heading', 'Paragraph', 'Paragraph']
doc.children[0]: ['RawText']
doc.children[1]: ['RawText', 'LineBreak', 'RawText', 'LineBreak', 'RawText']
doc.children[2]: ['RawText', 'LineBreak', 'RawText', 'LineBreak', 'RawText']
<h1>Test</h1>
<p>&lt;style
type=&quot;text/css&quot;&gt;
h1 {color:red;}</p>
<p>p {color:blue;}
&lt;/style&gt;
okay</p>

==== ADD HTMLBlock ====

doc.children: ['Heading', 'HTMLBlock', 'Paragraph']
doc.children[0]: ['RawText']
doc.children[1]: '<style\n  type="text/css">\nh1 {color:red;}\n\np {color:blue;}\n</style>'
doc.children[2]: ['RawText']
<h1>Test</h1>
<style
  type="text/css">
h1 {color:red;}

p {color:blue;}
</style>
<p>okay</p>

Remediation:

Change the block_token.__all__ variable like this:

"""
Tokens to be included in the parsing process, in the order specified.
"""
__all__ = ['BlockCode', 'Heading', 'Quote', 'CodeFence', 'ThematicBreak',
           'List', 'Table', 'Footnote', 'HTMLBlock', 'Paragraph']

pbodnar commented 3 years ago

I'm closing this as a duplicate of #56: the solution is to create Document() inside a "with HTMLRenderer() as renderer:" block, or to call mistletoe.markdown(fin, HTMLRenderer), which effectively does the same for you.
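
The fix described above depends on when the document is parsed relative to the renderer's lifetime. The following is a toy model, not mistletoe's actual code: it assumes the renderer registers HTMLBlock in a shared token list while its with block is active and restores the defaults on exit. ToyHtmlRenderer, active_tokens, classify and parse are hypothetical names made up for illustration.

```python
# Toy model only: every name here is hypothetical, not mistletoe's API.
DEFAULT_TOKENS = ["Heading", "Paragraph"]
active_tokens = list(DEFAULT_TOKENS)  # shared, module-level token registry


class ToyHtmlRenderer:
    """Registers HTMLBlock while the context is active (assumed behaviour)."""

    def __enter__(self):
        active_tokens.insert(0, "HTMLBlock")
        return self

    def __exit__(self, exc_type, exc, tb):
        # Restore the default registry when the with block ends.
        active_tokens[:] = DEFAULT_TOKENS
        return False


def classify(block):
    """Pick a token name using whatever is registered right now."""
    if block.startswith("<") and "HTMLBlock" in active_tokens:
        return "HTMLBlock"
    if block.startswith("#"):
        return "Heading"
    return "Paragraph"


def parse(blocks):
    return [classify(block) for block in blocks]


blocks = ["# Test", "<style>h1 {color:red;}</style>", "okay"]

parsed_outside = parse(blocks)      # parsed before any renderer is active
with ToyHtmlRenderer():
    parsed_inside = parse(blocks)   # parsed while HTMLBlock is registered
parsed_after = parse(blocks)        # registry has been reset again

print(parsed_outside)  # ['Heading', 'Paragraph', 'Paragraph']
print(parsed_inside)   # ['Heading', 'HTMLBlock', 'Paragraph']
```

In the issue's script, the first Document(md) is built before any renderer exists, which corresponds to parsed_outside in this sketch.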

But thanks for reporting this; it at least shows that the documentation needs to be updated (before ever changing the API, as suggested in that same issue).

pbodnar commented 3 years ago

Done: f5ea6d66f7084921fd403030f7faebe312935c8d. :)

Titanexx commented 3 years ago

Thanks for the support :)