miyuchina / mistletoe

A fast, extensible and spec-compliant Markdown parser in pure Python.
MIT License

Simply thank you! #217

Closed Lucas-C closed 1 month ago

Lucas-C commented 5 months ago

Hi!

I just want to report a happy user 🙂

I have switched from using markdown-it (NodeJS) to mistletoe, and I really love it.

This morning, in 30 minutes, I was able to add support for ::: block containers, similar to markdown-it-container:

from mistletoe import HtmlRenderer
from mistletoe.block_token import tokenize, BlockToken

class CustomHtmlRenderer(HtmlRenderer):
    def __init__(self):
        super().__init__(TripleCommaDiv)

    def render_triple_comma_div(self, token):
        inner = self.render_inner(token)
        return f'<div class="{token.classes}">{inner}</div>'

class TripleCommaDiv(BlockToken):
    """
    Simple <div> block. (["::: class1 class2", ..., ":::"])
    Block start is indicated by a line starting with at least three ":" characters.
    The block end is a line starting with the same number of ":" characters as the block start.
    Using more ":" characters on an outer block allows nesting.

    This aims to be compliant / cover the same functionality as markdown-it-container:
    https://www.npmjs.com/package/markdown-it-container

    Attributes:
        classes (str): CSS class names inserted in the "class" HTML attribute
    """

    @staticmethod
    def start(line):
        return line.startswith(":::")

    @classmethod
    def read(cls, lines):
        first_line = next(lines)
        classes = first_line.lstrip(":").strip()
        delimiter = cls._delimiter_from_line(first_line)
        child_lines = []
        for line in lines:
            if line.startswith(delimiter):
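                # Slice instead of indexing, so a line without a trailing newline cannot raise IndexError:
                if line[len(delimiter):len(delimiter) + 1] != ":":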
                    # End block found:
                    break
                else:
                    print(f"WARN: Unexpected longer delimiter: '{line.rstrip()}' - Expected block end delimiter: {delimiter}")
            child_lines.append(line)
        children = tokenize(child_lines)
        return classes, children

    @staticmethod
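    def _delimiter_from_line(line):
        # Count the leading ":" characters (at least three, as guaranteed by start());
        # unlike indexing, this cannot raise IndexError on a line without a trailing newline.
        return ":" * (len(line) - len(line.lstrip(":")))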

    def __init__(self, match):
        self.classes, self.children = match

Example Markdown content that can be processed by this parser:

:::: class1

## Heading

Some paragraph text.

::: class2 class3

## Sub-heading

Some other content.

:::

::::
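To make the nesting behaviour of this example easier to follow, here is a minimal stand-alone sketch of the delimiter matching, using only the standard library. The function name read_container and the shortened sample input are assumptions made for illustration; this is not part of mistletoe, and it returns raw child lines instead of tokens.

```python
def read_container(lines):
    """Hypothetical stand-alone version of the read() logic above, for illustration only."""
    it = iter(lines)
    first_line = next(it)
    # The opening delimiter is the run of leading ":" characters.
    delimiter = ":" * (len(first_line) - len(first_line.lstrip(":")))
    classes = first_line.lstrip(":").strip()
    child_lines = []
    for line in it:
        # A closing line starts with exactly as many ":" as the opening line.
        if line.startswith(delimiter) and not line.startswith(delimiter + ":"):
            break
        child_lines.append(line)
    return classes, child_lines


# Shortened version of the example above (blank lines left out).
sample = [
    ":::: class1\n",
    "## Heading\n",
    "::: class2 class3\n",
    "Some other content.\n",
    ":::\n",
    "::::\n",
]

classes, children = read_container(sample)
# The inner "::: ... :::" block stays inside the children of the outer block,
# so a second pass over the children can pick it up.
inner_classes, inner_children = read_container(children[1:])
```

Because the outer block uses four ":" characters, the inner ":::" lines never match its closing delimiter. In the real token, the second pass is what the tokenize(child_lines) call does.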

The API of your library is really well crafted, and its excellent design makes it very easy to extend!

So I just wanted to say THANK YOU

Lucas-C commented 5 months ago

Also, I have a question: how do you recommend handling malformed Markdown? Should a BlockToken subclass raise an error in the read() method? Or would it be better to try to produce a valid AST and just emit a warning, or a log line, describing the problem?

pbodnar commented 5 months ago

Hi @Lucas-C, thanks for your message, I guess it would make @miyuchina happy, provided he still watches this project. :)

To your question: I think the common approach so far is to skip the "malformed Markdown" and treat it as plain text instead, since it could actually be just text. At least that is what I see in the various examples in the CommonMark spec. But I would say it also depends on the exact case we want to handle. So your suggestions could also be valid, but I'm just not sure at the moment...
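The options discussed here (fall back to plain text, warn, or raise) could be sketched roughly as follows for an unclosed ::: block. This is a standard-library-only illustration, not mistletoe API: the names find_closing and parse_container, the strict flag, and the tuple return values are all hypothetical.

```python
import warnings


def find_closing(lines, delimiter):
    """Return the index of the first closing delimiter line, or None if there is none."""
    for index, line in enumerate(lines):
        if line.startswith(delimiter) and not line.startswith(delimiter + ":"):
            return index
    return None


def parse_container(lines, strict=False):
    """Hypothetical helper: returns ("container", classes, children) or ("text", lines)."""
    first_line = lines[0]
    delimiter = ":" * (len(first_line) - len(first_line.lstrip(":")))
    end = find_closing(lines[1:], delimiter)
    if end is None:
        message = f"Unclosed block: no closing {delimiter!r} line found"
        if strict:
            # Option 1: refuse malformed input outright.
            raise ValueError(message)
        # Option 2: warn, then treat everything as plain text.
        warnings.warn(message)
        return ("text", lines)
    # 'end' is relative to lines[1:], so the children are lines[1:end + 1].
    return ("container", first_line.lstrip(":").strip(), lines[1:end + 1])
```

With strict=False, an unclosed block degrades to plain text, which matches the approach seen in the CommonMark examples, while still reporting the problem. Whether a warning is appropriate probably depends on the use case, as noted above.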

tim-forrer commented 5 months ago

@Lucas-C thank you also for providing this example for a custom block token - I found it really helpful!

Perhaps it could be a good idea to add this example to the developer guide, so that there is one example for a span token and one for a block token?

TheCodeForge commented 1 month ago

Perhaps it could be a good idea to add this example to the developer guide, so that there is one example for a span token and one for a block token?

#229 opened

pbodnar commented 1 month ago

OK, so I think we can safely close this one now. :)