miyuchina / mistletoe

A fast, extensible and spec-compliant Markdown parser in pure Python.
MIT License

Telegram's MarkdownV2 #214

Closed by yelboudouri 1 month ago

yelboudouri commented 3 months ago

This is not an issue; it's more like a feature request. Telegram has MarkdownV2, which is a little different from standard markdown, and there isn't really a solution that offers correct conversion between the two markdown formats. I started writing a renderer but ran into some issues. Could you please help me implement this renderer?

According to Telegram's documentation, this is the MarkdownV2 syntax: MarkdownV2 Style.
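One concrete difference from standard Markdown is escaping: Telegram's formatting documentation lists a set of characters that must be preceded by a backslash when they appear in ordinary text. The snippet below is only a rough sketch of that rule, not part of mistletoe or of any existing converter. The function name `escape_markdown_v2` is made up for illustration, and the sketch deliberately ignores the separate escaping rules that apply inside code and link entities.

```python
import re

# Characters that Telegram's MarkdownV2 documentation lists as requiring a
# backslash escape in ordinary text.
SPECIAL_CHARS = "_*[]()~`>#+-=|{}.!"
SPECIAL_RE = re.compile("([" + re.escape(SPECIAL_CHARS) + "])")


def escape_markdown_v2(text: str) -> str:
    """Hypothetical helper: backslash-escape MarkdownV2 special characters."""
    # r"\\\1" emits a literal backslash followed by the matched character.
    return SPECIAL_RE.sub(r"\\\1", text)


print(escape_markdown_v2("Version 1.0 (beta)!"))  # Version 1\.0 \(beta\)\!
```

A renderer would presumably apply something like this to plain text only, and emit the formatting delimiters themselves unescaped.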

pbodnar commented 3 months ago

Hello @yelboudouri, I don't have much time for mistletoe now, but I might try to help now and then. What concrete issues did you run into?

I expect that you are basing the converter on the MarkdownRenderer and overriding the individual rendering methods so that they handle the "peculiarities" of Telegram's MarkdownV2 syntax, right?

BTW, while googling for this, I found a one-year-old project called md2tgmd. Maybe it could be used for your goal? I can see that it doesn't support strikethrough and italic yet, though (yym68686/md2tgmd/issues/2)...

yelboudouri commented 2 months ago

Thank you for getting back to me.

I gave md2tgmd a try, but it didn't quite meet my requirements, so I decided to develop my own converter. However, writing a Markdown parser proved difficult given all the edge cases I need to take into account. That's when I came across mistletoe, which already provides a solid foundation to build upon.

As you mentioned, I'm currently overriding individual methods of MarkdownRenderer. I'm facing some challenges, particularly with unordered lists. Since Telegram's MarkdownV2 doesn't support rendering lists, I want to replace the list markers (*, -, or +) with a Unicode bullet point "•". Additionally, I'm struggling to change the delimiter for italicized text in the following method:

def render_emphasis(self, token: span_token.Emphasis) -> Iterable[Fragment]:
    return self.embed_span(Fragment(token.delimiter), token.children)

Thank you for your help!
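The two substitutions described above can be sketched as plain functions, independent of mistletoe. This is only an illustration under stated assumptions: the helper names are hypothetical, the delimiters follow Telegram's MarkdownV2 documentation (`_` for italic), and how the list marker is obtained from a list-item token is left out because it depends on the renderer's internals.

```python
# Target characters for Telegram's MarkdownV2.
TELEGRAM_ITALIC = "_"
TELEGRAM_BULLET = "\u2022"  # the "•" character

# Bullet-list markers allowed by CommonMark.
BULLET_MARKERS = {"*", "-", "+"}


def telegram_emphasis_delimiter(delimiter: str) -> str:
    """Hypothetical helper: map a Markdown emphasis delimiter to Telegram's."""
    if delimiter not in ("*", "_"):
        raise ValueError(f"unexpected emphasis delimiter: {delimiter!r}")
    return TELEGRAM_ITALIC


def telegram_list_leader(leader: str) -> str:
    """Hypothetical helper: replace bullet markers with the bullet character.

    Ordered leaders such as "1." are returned unchanged here; their "."
    would still need a backslash escape before being sent to Telegram.
    """
    return TELEGRAM_BULLET if leader in BULLET_MARKERS else leader


# Inside a MarkdownRenderer subclass, the method quoted above might then
# become the following (untested, mirrors the original method body):
#
#     def render_emphasis(self, token):
#         delimiter = telegram_emphasis_delimiter(token.delimiter)
#         return self.embed_span(Fragment(delimiter), token.children)
```

Keeping the mapping in small functions makes it easy to unit-test separately from the renderer overrides.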

sudoskys commented 2 months ago

https://github.com/sudoskys/telegramify-markdown/blob/main/src/telegramify_markdown/render.py @pbodnar @yelboudouri That's a remarkable coincidence! I have implemented it based on the internal classes of mistletoe. I believe this is what you are looking for. It inherits the MarkdownRenderer class to implement custom rules.

pbodnar commented 2 months ago

@sudoskys, really, what a coincidence! :) Thanks for sharing your work. I think I like the idea of having it as a separate project. @yelboudouri, can we consider this "resolved" then? If it all checks out, I could just add a link to the telegramify-markdown project to the README of mistletoe...

yelboudouri commented 1 month ago

@sudoskys does it support multi-line code blocks?

sudoskys commented 1 month ago

@yelboudouri Of course it is supported.

https://github.com/sudoskys/telegramify-markdown/blob/main/tests/exp1.md

yelboudouri commented 1 month ago

Thank you, @sudoskys, for the help. After some consideration, I decided to go ahead with your library. It can still be improved, so I'll be opening an issue so we can continue the discussion there. Thank you, @pbodnar. You can close the issue.