miyuchina / mistletoe

A fast, extensible and spec-compliant Markdown parser in pure Python.
MIT License

How to use extension tokens for parsing? #199

Closed nschloe closed 10 months ago

nschloe commented 1 year ago

I'd like to parse something like

````markdown
Some math $a+b=c$

```math
x^2 = -1
```
````

into Markdown with explicit math blocks. I see that mistletoe has the Math extension token (https://mistletoe-ebp.readthedocs.io/en/latest/api/extension_tokens.html), but I can't for the life of me figure out how to use it. The documentation (e.g., [here](https://mistletoe-ebp.readthedocs.io/en/latest/using/intro.html)) seems outdated since the `import`s in the examples don't even work anymore.

MWE:
```python
import mistletoe
from mistletoe.markdown_renderer import MarkdownRenderer

doc = mistletoe.Document("a ~~st~~ $b+c$")

print()
print(doc)
print()
print(doc.children)
print()
print(doc.children[0].children)

print()
with MarkdownRenderer() as mdr:
    print(repr(mdr.render(doc)))
```

```
<mistletoe.block_token.Document with 1 child at 0x7f51dd939a90>

[<mistletoe.block_token.Paragraph with 3 children at 0x7f51dda2a090>]

[
    <mistletoe.span_token.RawText content='a ' at 0x7f51dd72dd10>,
    <mistletoe.span_token.Strikethrough with 1 child at 0x7f51dd72ddd0>,
    <mistletoe.span_token.RawText content=' $b+c$' at 0x7f51dd72de50>
]

'a ~~st~~ $b+c$\n'
```

Any hints?

pbodnar commented 1 year ago

Hi @nschloe, I'm not sure if I get your question. In mistletoe (not mistletoe-ebp which is/was a fork which I don't know deeply), you can use e.g. MathJaxRenderer if you are interested in rendering HTML together with the MathJax JS library.

Related mistletoe documentation:

nschloe commented 1 year ago

> (not mistletoe-ebp which is/was a fork which I don't know deeply)

Ah, hadn't realized they were different. (When googling I always get to their documentation.)

> if you are interested in rendering HTML

My interest is in parsing. I'd like to parse, change some things, and render back to Markdown. For this to work, I need Strikethrough (`~~...~~`), math, tables, etc. parsed correctly.

pbodnar commented 1 year ago

OK, so what about the following code? Essentially, you need to pass the additional token class(es) to the parsing process and define the corresponding `render_...` method(s). You do both by defining your own renderer class:

````python
from typing import Iterable

import mistletoe
from mistletoe import block_token
from mistletoe.latex_token import Math
from mistletoe.markdown_renderer import Fragment, MarkdownRenderer


class MyMarkdownRenderer(MarkdownRenderer):
    def __init__(self, **kwargs):
        """
        Args:
            **kwargs: additional parameters to be passed to the ancestors'
                      constructors.
        """
        # Passing Math here makes the parser recognize math spans.
        super().__init__(Math, **kwargs)

    # Render method for the extra Math token.
    def render_math(self, token) -> Iterable[Fragment]:
        yield Fragment(token.content + " (Math rules :))")

    # @override
    def render_fenced_code_block(
        self, token: block_token.BlockCode, max_line_length: int
    ) -> Iterable[str]:
        indentation = " " * token.indentation
        # Mark the opening fence line of ```math blocks only.
        yield indentation + token.delimiter + token.info_string + (
            " (Math rules :))" if token.info_string == "math" else ""
        )
        yield from self.prefix_lines(token.content[:-1].split("\n"), indentation)
        yield indentation + token.delimiter


print(
    MyMarkdownRenderer().render(
        mistletoe.Document(
            """
a paragraph with math: $ 2^3 $

$$ c^2 = a^2 + b^2 $$

```math
x^2 = -1
```
"""
        )
    )
)
````


This outputs:

````
a paragraph with math: $ 2^3 $ (Math rules :))

$$ c^2 = a^2 + b^2 $$ (Math rules :))

```math (Math rules :))
x^2 = -1
```
````

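
To make the mechanism above more concrete, here is a minimal, standard-library-only sketch of the idea: a pattern recognizes the math spans, and a `render_<name>` method is looked up from the token name. Everything in it (`MATH_PATTERN`, `render_method_name`, `split_math`, `ToyRenderer`) is hypothetical and written for illustration only. It is not mistletoe's implementation, and the regular expression is an assumption about what an inline/display math pattern could look like.

```python
import re
from typing import List, Tuple

# Hypothetical pattern for $...$ and $$...$$ spans (an assumption for
# illustration, not taken from mistletoe's source).
MATH_PATTERN = re.compile(r"(\${1,2})([^$]+?)\1")


def render_method_name(token_class_name: str) -> str:
    """Map a CamelCase token name to a snake_case render method name."""
    words = re.findall(r"[A-Z][a-z0-9]*", token_class_name)
    return "render_" + "_".join(word.lower() for word in words)


def split_math(text: str) -> List[Tuple[str, str]]:
    """Split text into ("RawText", ...) and ("Math", ...) pieces."""
    pieces = []
    position = 0
    for match in MATH_PATTERN.finditer(text):
        if match.start() > position:
            pieces.append(("RawText", text[position:match.start()]))
        pieces.append(("Math", match.group(0)))
        position = match.end()
    if position < len(text):
        pieces.append(("RawText", text[position:]))
    return pieces


class ToyRenderer:
    """Toy dispatcher: calls render_<name> for each piece of text."""

    def render_raw_text(self, content: str) -> str:
        return content

    def render_math(self, content: str) -> str:
        return content + " (Math rules :))"

    def render(self, text: str) -> str:
        return "".join(
            getattr(self, render_method_name(kind))(content)
            for kind, content in split_math(text)
        )


print(ToyRenderer().render("a ~~st~~ $b+c$"))
# prints: a ~~st~~ $b+c$ (Math rules :))
```

Without the math pattern, `$b+c$` would simply stay inside the surrounding raw text, which is what the output in the original question shows.
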
pbodnar commented 10 months ago

Closing this as answered, feel free to "reopen" by commenting.