miyuchina / mistletoe

A fast, extensible and spec-compliant Markdown parser in pure Python.
MIT License

parse-render loop creates newline #198

Open nschloe opened 8 months ago

nschloe commented 8 months ago

MWE:

import mistletoe
from mistletoe.markdown_renderer import MarkdownRenderer

print(repr(MarkdownRenderer().render(mistletoe.Document("a"))))

Output:

'a\n'

Expected output:

'a'

pbodnar commented 8 months ago

Hi @nschloe, the behavior you describe seems to be intentional ("by design") to me. It is the Document class constructor that appends a newline (\n) to every input line that does not already end with one:

class Document(BlockToken):
    """
    Document token.
    This is a container block token. Its children are block tokens - container or leaf ones.

    Attributes:
        footnotes (dictionary): link reference definitions.
    """

    def __init__(self, lines):
        if isinstance(lines, str):
            lines = lines.splitlines(keepends=True)
        lines = [line if line.endswith('\n') else '{}\n'.format(line) for line in lines]
        # ...

I'm not sure if this should be / can be safely changed. So closing for now, but feel free to reopen this issue if needed.

nschloe commented 8 months ago

Not sure why you put in this workaround. The only situations I can imagine where it does anything are appending a missing terminal "\n" and adding "\n" where "\r" is the line break. Don't know why you'd want either of those.
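
The two cases mentioned above can be reproduced without mistletoe installed by copying the two normalization lines from the quoted constructor into a standalone function. This is only a sketch: `normalize_lines` is a hypothetical name used for illustration and is not part of mistletoe's API.

```python
def normalize_lines(lines):
    """Mimic the line normalization quoted from Document.__init__ (sketch only)."""
    if isinstance(lines, str):
        lines = lines.splitlines(keepends=True)
    # Any line that does not already end in "\n" gets one appended.
    return [line if line.endswith("\n") else "{}\n".format(line) for line in lines]


# Case 1: a missing terminal newline is added.
print(normalize_lines("a"))        # ['a\n']
# Input whose lines already end in "\n" is left alone.
print(normalize_lines("a\nb\n"))   # ['a\n', 'b\n']
# Case 2: a bare "\r" line break becomes "\r\n".
print(normalize_lines("a\rb\r"))   # ['a\r\n', 'b\r\n']
```

Lines ending in "\r\n" already end in "\n", so they pass through unchanged; only bare "\r" breaks and an unterminated last line are affected.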

I'd always expect a parse-render loop to stay as faithful to the input as possible, unless the input is illegal, in which case it should error out. What's your take on this?
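
Until the library behavior is settled, a caller could approximate a faithful round trip on their own side. The following is a minimal sketch under stated assumptions, not a fix for mistletoe itself: `render_faithfully` and `fake_render` are hypothetical names, and `render` stands for any callable that maps a Markdown string to a rendered string (for example, a small wrapper around the expression used in the MWE). It only handles the trailing newline, not the "\r" case.

```python
def render_faithfully(text, render):
    """Render `text`, dropping a trailing newline that the input did not have.

    `render` is any callable mapping a Markdown string to a rendered string.
    """
    output = render(text)
    if not text.endswith("\n") and output.endswith("\n"):
        output = output[:-1]
    return output


# Stand-in renderer (an assumption for this sketch) that, like the behavior
# reported in this issue, always terminates its output with "\n".
def fake_render(text):
    return text if text.endswith("\n") else text + "\n"


print(repr(render_faithfully("a", fake_render)))    # 'a'
print(repr(render_faithfully("a\n", fake_render)))  # 'a\n'
```

Passing the renderer in as a callable keeps the sketch independent of any particular mistletoe version.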

pbodnar commented 8 months ago

Yeah, I'm not sure either why this was introduced by the original author 5 years ago. As you say, this may deserve some investigation, so I'm reopening the issue.