yuin / goldmark

:trophy: A markdown parser written in Go. Easy to extend, standard(CommonMark) compliant, well structured.
MIT License

Some comments on the parser #131

Closed MichaelMure closed 4 years ago

MichaelMure commented 4 years ago

This is more of an FYI than a real issue.

I'm writing a Markdown renderer specialized for the terminal and I'm in the process of migrating to goldmark. If you are curious, you can have a look at https://github.com/MichaelMure/go-term-markdown/pull/20 (not merged yet).

First things first, thanks for your work :)

I just wanted to document some of the struggles I had. Feel free to dismiss this entirely; these are just ideas thrown over the fence:

    Paragraph {
        RawText: "foo <ul><li>item1</li><li>item2</li></ul>"
        HasBlankPreviousLines: true
        Text: "foo "
        RawHTML {
            RawText: <ul>
        }
        RawHTML {
            RawText: <li>
        }
        Text: "item1"
        RawHTML {
            RawText: </li>
        }
        RawHTML {
            RawText: <li>
        }
        Text: "item2"
        RawHTML {
            RawText: </li>
        }
        RawHTML {
            RawText: </ul>
        }
    }

Thank you!

MichaelMure commented 4 years ago

On the subject of links, it seems that there is no way in the AST to distinguish a complete link ([text](/url/)) from a reference ([ref]). Also a nice to have I think.

MichaelMure commented 4 years ago

Another thing you might find interesting. I needed to visualize the possible cases in the AST, how node types relate to each other, so I generated this diagram from my test cases. It's not perfect, some cases are missing (notably how much garbage can be added in a link's text) but it's helpful. Might be handy for your documentation.

[image: diagram of AST node types and how they relate, generated from the test cases]

jschaf commented 4 years ago

> On the subject of links, it seems that there is no way in the AST to distinguish a complete link [text](/url/) from a reference ([ref]). Also a nice to have I think.

~Not the author but I'm currently digging through the source code to implement citations, e.g. [pg 3, @bibtex-key]. I think the trouble is that link references are implemented as a paragraph transformer. I'm guessing that's probably the case because goldmark can't know if there's a corresponding definition for a short reference. The commonmark demo shows that for:~

[foo]

[bar]

[bar]: example.com

[foo] is the literal text [foo] and [bar] is a LinkReference.

Edit

After a bit more digging, I think what's actually happening for links is:

  1. Goldmark parses all blocks with block parsers.
  2. Goldmark runs the link reference paragraph transformer. That transformer stores link reference definitions, e.g. [foo]: http://example.com, in the parser context.
  3. Goldmark runs the inline parsers, including the link parser. The link parser handles short reference links, e.g. [foo], by checking the parser context. If a matching definition exists, the short reference link is promoted to a full link.

So the request is that, instead of promoting a short reference link to a full link, goldmark should mark it as distinct with something like ShortRefLink?
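To make the three steps above concrete, here is a minimal toy model of the two-pass idea, using only the standard library. It is not goldmark's code and does not use goldmark's API: `Inline`, `collectRefs`, `resolve`, and the `FromReference` flag are all hypothetical names invented for this sketch, and the regular expressions cover only the simplest `[label]` and `[label]: url` forms. Labels are lowercased as a rough stand-in for CommonMark's case-insensitive label matching.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Inline is a toy inline node: either literal text or a resolved link.
// It is a hypothetical type for illustration, not a goldmark AST node.
type Inline struct {
	Text          string // literal text, or the link label
	Destination   string // empty when the node is literal text
	FromReference bool   // hypothetical flag: set when promoted from [label]
}

// refDef matches a simplified definition line such as "[bar]: example.com".
var refDef = regexp.MustCompile(`^\[([^\]]+)\]:\s*(\S+)\s*$`)

// shortRef matches a line that is exactly one short reference, e.g. "[bar]".
var shortRef = regexp.MustCompile(`^\[([^\]]+)\]$`)

// collectRefs models step 2: definition lines are removed from the input
// and remembered in a map that plays the role of the parser context.
func collectRefs(lines []string) (map[string]string, []string) {
	refs := make(map[string]string)
	var rest []string
	for _, line := range lines {
		if m := refDef.FindStringSubmatch(line); m != nil {
			refs[strings.ToLower(m[1])] = m[2]
			continue
		}
		rest = append(rest, line)
	}
	return refs, rest
}

// resolve models step 3: a short reference becomes a link only if a
// definition was collected for its label; otherwise it stays literal text.
func resolve(line string, refs map[string]string) Inline {
	if m := shortRef.FindStringSubmatch(line); m != nil {
		if dest, ok := refs[strings.ToLower(m[1])]; ok {
			return Inline{Text: m[1], Destination: dest, FromReference: true}
		}
	}
	return Inline{Text: line}
}

func main() {
	lines := []string{"[foo]", "", "[bar]", "", "[bar]: example.com"}
	refs, rest := collectRefs(lines)
	for _, line := range rest {
		if line == "" {
			continue // skip blank separator lines
		}
		fmt.Printf("%+v\n", resolve(line, refs))
	}
}
```

With the input from the example above, `[foo]` stays literal text and `[bar]` becomes a link, even though its definition appears later in the document, because all definitions are collected before any inline is resolved. The `FromReference` field shows one way a renderer could tell a promoted reference from a complete inline link; a separate node kind along the lines of ShortRefLink would be another.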

yuin commented 4 years ago

You might already know, glamour is a markdown renderer for terminals that uses goldmark. glow is driven by glamour. I think it will be helpful for you.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.