miyuchina / mistletoe

A fast, extensible and spec-compliant Markdown parser in pure Python.
MIT License

How to add a custom token inside a block token? #194

Open MilyMilo opened 1 year ago

MilyMilo commented 1 year ago

Hi! Sorry if this is a silly question - I've just started using mistletoe today! (I really like the interface, thank you for your work!)

I'd like to implement an [embed]: (file.txt) token, which will read the file and include it inline.

I have code like this:

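import re

from mistletoe.html_renderer import HtmlRenderer
from mistletoe.span_token import SpanToken

class Embed(SpanToken):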
    parse_inner = False
    pattern = re.compile(r"\[embed\]\s*:?\s*#?\s*\((.*)\)")

    def __init__(self, match):
        self.include_path = match.group(1)

class WriteupRenderer(HtmlRenderer):
    def __init__(self, **kwargs):
        super().__init__(Embed, **kwargs)

    def render_embed(self, token: Embed):
        # will read and output the file contents
        return "RENDERING EMBED"

The only issue is that most of my embeds live inside code blocks or code fences like:

`code with [embed]: (./shell64.s)`

This produces the markup below, which I now understand is the expected behaviour, but I'd like to change it:

<p><code>code with [embed]: (./shell64.s)</code></p>

(My embed tokens work fine outside of code blocks and fences)

From what I could see, CodeBlock does not parse inner tokens. How should I go about implementing this? Is it possible without re-creating too many tokens / blocks?

anderskaplan commented 1 year ago

Hi @MilyMilo, it is certainly possible to make custom tokens for fenced code blocks and for code spans with the behavior that you want, but I believe it's going to be a lot of work.

The embedding feature you want would probably be easier to build as a pre-processing step, working on the raw text input. That would also make sense from a conceptual point of view, imho.

Maybe you could use a customized parser which only recognizes paragraphs and links in that first step 😄
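For illustration, the pre-processing idea could look roughly like the sketch below. It uses only the standard library and runs on the raw Markdown text before mistletoe sees it. The names `expand_embeds` and `base_dir` are hypothetical, and the pattern is the one from the question with the capture group made non-greedy (an assumption, so that two embeds on one line are matched separately).

```python
import re
from pathlib import Path

# Pattern from the question; "(.*?)" instead of "(.*)" is an assumption so that
# several embeds on the same line do not merge into one match.
EMBED_PATTERN = re.compile(r"\[embed\]\s*:?\s*#?\s*\((.*?)\)")


def expand_embeds(text, base_dir="."):
    """Replace each [embed]: (path) marker in raw Markdown with the file's contents."""
    base = Path(base_dir)

    def replace(match):
        # Paths are resolved relative to base_dir (hypothetical parameter).
        return (base / match.group(1).strip()).read_text(encoding="utf-8")

    return EMBED_PATTERN.sub(replace, text)
```

The expanded text would then be passed to mistletoe as usual. Because the substitution happens before parsing, an embedded file that itself contains backticks or fence markers can change how the surrounding code span or fence is parsed.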

pbodnar commented 1 year ago

Hi @MilyMilo, as @anderskaplan writes, it would probably make sense to do the embedding as a pre-processing step - i.e. before calling mistletoe?

Alternatively, I think you could just override the corresponding render methods of HtmlRenderer, like this (schematically):

    def render_inline_code(self, token: span_token.InlineCode) -> str:
        # TODO: embed chunks inside token.children[0].content
        return super().render_inline_code(token)

    # ...

    def render_block_code(self, token: block_token.BlockCode) -> str:
        # TODO: embed chunks inside token.children[0].content (the same as above => call a shared helper method)
        return super().render_block_code(token)
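A minimal sketch of how the two TODOs might be filled in, written as a mixin so the example itself does not import mistletoe. `EmbedInCodeMixin`, `embed_base_dir` and `_expand_embeds` are hypothetical names. The sketch assumes, as in the snippet above, that `token.children[0].content` holds the raw code text and that the class it is mixed into provides `render_inline_code` and `render_block_code`.

```python
import re
from pathlib import Path

# Pattern from the question, with a non-greedy capture group (an assumption).
EMBED_PATTERN = re.compile(r"\[embed\]\s*:?\s*#?\s*\((.*?)\)")


class EmbedInCodeMixin:
    """Hypothetical mixin: expands embed markers inside code tokens before rendering."""

    embed_base_dir = Path(".")  # assumed directory that embed paths are relative to

    def _expand_embeds(self, token):
        # Shared helper: rewrite the raw text child in place.
        raw = token.children[0]
        raw.content = EMBED_PATTERN.sub(
            lambda m: (self.embed_base_dir / m.group(1).strip()).read_text(encoding="utf-8"),
            raw.content,
        )

    def render_inline_code(self, token):
        self._expand_embeds(token)
        return super().render_inline_code(token)

    def render_block_code(self, token):
        self._expand_embeds(token)
        return super().render_block_code(token)
```

It would be combined with the real renderer by listing the mixin first, e.g. `class WriteupRenderer(EmbedInCodeMixin, HtmlRenderer)`. Since the expansion runs before the parent render method is called, the embedded file contents go through whatever escaping the parent method applies to the code text.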

What do you think?

MilyMilo commented 1 year ago

Thank you @pbodnar and @anderskaplan for the pointers!

As both of you mentioned, this is currently implemented in preprocessing. We just have a regex to do that; however, it doesn't feel very clean or robust. That's why I wanted to use mistletoe to improve this.

I'll check if there's a nice way to get it working and post the code here if I end up implementing this.