heading token content is always the last heading

miyuchina / mistletoe

A fast, extensible and spec-compliant Markdown parser in pure Python.

MIT License

    import mistletoe
    from mistletoe import Document

    markdown_text = """
    ## Heading A
    Some text under heading 1.

    ## Heading B *italic*
    Some text under heading 2.

    ## Heading C **bold**
    Some text under heading 3.
    """

    # Parse the Markdown text
    doc = Document(markdown_text)

    for token in doc.children:
        if isinstance(token, mistletoe.block_token.Heading):
            # heading_content = " ".join(str(child.content) for child in token.children if hasattr(child, 'content'))
            heading_content = token.content
            print(f'Level: {token.level}, Content: {heading_content}')

Hi @clehene, thanks for the report. For elements like BlockCode or HtmlBlock, the content property/attribute makes sense (see #163). For others, like Heading, not so much.

As you are not the first to ask this question (see #99), I think the simplest thing to do now is to rename the reported Heading's content class attribute to _content, so that it is clear that it is not part of the public API. That would be in line with, e.g., _open_info in the CodeFence token class.
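For readers wondering why a class attribute produces "always the last heading", here is a minimal sketch of the general Python pitfall. `SketchHeading` and its `start` method are hypothetical stand-ins written for illustration, not mistletoe's actual code; the assumption is only that the matched text is stored on the class rather than on each instance.

```python
class SketchHeading:
    """Hypothetical token class, NOT mistletoe's actual implementation."""

    content = ""  # class attribute: a single slot shared by every instance

    def __init__(self, level):
        self.level = level  # instance attribute: one value per heading

    @classmethod
    def start(cls, line):
        # Matching a line stores its text on the class, not on an instance.
        stripped = line.lstrip("#")
        cls.content = stripped.strip()
        return cls(level=len(line) - len(stripped))


lines = ["## Heading A", "## Heading B *italic*", "## Heading C **bold**"]
headings = [SketchHeading.start(line) for line in lines]  # parse everything first

for h in headings:
    # Every instance falls back to the shared class attribute,
    # so each line shows the text of the last heading matched.
    print(h.level, h.content)
```

Because the whole document is parsed before the loop reads `content`, every heading reports whatever was stored last, which matches the behavior described in the report.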

I can also imagine introducing a getter property called text, or similar, that would recursively concatenate the text of the child tokens, but that would be a separate feature request...
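Until such a property exists, the idea can be approximated with a small helper. This is only a sketch: `plain_text` is a hypothetical name, and it assumes tokens are duck-typed so that containers expose `children` and leaf tokens expose `content`. The demo uses stand-in objects rather than real mistletoe tokens so that it runs on its own.

```python
from types import SimpleNamespace


def plain_text(token):
    """Recursively concatenate the text held by a token and its descendants."""
    children = getattr(token, "children", None)
    if children is None:
        # Leaf token: use its own text, if it has any.
        return getattr(token, "content", "")
    return "".join(plain_text(child) for child in children)


def leaf(text):
    # Stand-in for a raw-text token (hypothetical, for the demo only).
    return SimpleNamespace(content=text)


# Stand-in shaped like a parsed "## Heading C **bold**".
heading = SimpleNamespace(
    level=2,
    children=[leaf("Heading C "), SimpleNamespace(children=[leaf("bold")])],
)

print(plain_text(heading))
```

In the loop from the report, `heading_content = plain_text(token)` would then take the place of `token.content`, provided the real tokens follow the attribute layout assumed here.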

heading token content is always the last heading #212