kivikakk / comrak

CommonMark + GFM compatible Markdown parser and renderer

Autolink edge cases #382

Closed digitalmoksha closed 2 months ago

digitalmoksha commented 5 months ago

Found a couple autolink edge cases:

digitalmoksha commented 5 months ago

Re: the second item

Rinku actually does balancing, like both cmark and comrak do for parentheses.

Looking at the cmark code, they don't consider a bracket as an ending delimiter - comrak does.

And it looks like I probably broke this when I added the relaxed-autolinks option - I added [ and ] to LINK_END_ASSORTMENT. https://github.com/kivikakk/comrak/pull/325/files

I can either
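To make the delimiter point concrete, here is a rough sketch of end-of-link trimming with parenthesis balancing. It is not comrak's or cmark's actual code: `trim_autolink_end`, `STRICT_END` and `RELAXED_END` are hypothetical names, and the character sets are only illustrative. The idea is that a trailing `)` is dropped only when it is unbalanced, while anything listed as an ending delimiter is always dropped, so adding `[` and `]` to that list changes where the link ends.

```rust
/// Illustrative set of trailing characters that never end up inside a link.
/// (Assumed for this sketch; not copied from comrak.)
const STRICT_END: &str = "?!.,:*_~'\"";

/// The same set with `[` and `]` added, mimicking the change described above.
const RELAXED_END: &str = "?!.,:*_~'\"[]";

/// Hypothetical helper: trims trailing delimiters from an autolink candidate.
/// A trailing `)` is removed only if the candidate has more `)` than `(`.
fn trim_autolink_end<'a>(candidate: &'a str, end_chars: &str) -> &'a str {
    let mut end = candidate.len();
    while end > 0 {
        let head = &candidate[..end];
        let last = match head.chars().next_back() {
            Some(c) => c,
            None => break,
        };
        if end_chars.contains(last) {
            // Plain ending delimiter: always strip it.
            end -= last.len_utf8();
        } else if last == ')' {
            // Balancing: keep the `)` if it has a matching `(` earlier on.
            let opening = head.matches('(').count();
            let closing = head.matches(')').count();
            if closing > opening {
                end -= 1;
            } else {
                break;
            }
        } else {
            break;
        }
    }
    &candidate[..end]
}
```

With `STRICT_END`, `http://example.com/[abc]` comes back unchanged. With `RELAXED_END`, the trailing `]` is stripped and the link ends at `http://example.com/[abc`.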

digitalmoksha commented 5 months ago

re: the first item

It looks like by the time we start trying to detect the autolink, the data has already been decoded, meaning it's <<<http://example.com/>>> - the angle brackets are no longer HTML entities. Not sure what, if anything, can be done about that.

My head officially hurts... 🤕
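A tiny sketch of why the ordering matters. `decode_basic_entities` is a hypothetical, heavily simplified stand-in for real entity decoding (three entities only, nothing like comrak's actual implementation). Once it has run, text that was written with entities is byte-for-byte identical to text written with literal angle brackets, so a later autolink pass cannot tell the two apart.

```rust
/// Hypothetical, simplified entity decoder for this sketch only.
/// `&amp;` is handled last so that `&amp;lt;` is not decoded twice.
fn decode_basic_entities(input: &str) -> String {
    input
        .replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&amp;", "&")
}

/// Returns true when two source strings look the same after decoding,
/// i.e. a later pass has no way to know which one used entities.
fn indistinguishable_after_decoding(a: &str, b: &str) -> bool {
    decode_basic_entities(a) == decode_basic_entities(b)
}
```

For example, `See &lt;&lt;&lt;http://example.com/&gt;&gt;&gt;` and `See <<<http://example.com/>>>` both decode to the same string.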

digitalmoksha commented 5 months ago

What led me to this is that I'm trying to get rid of a custom auto_link filter that mimics what Rinku does. These are the two tests that are failing.

I may decide it's good enough to switch - I think these really are edge cases, and I'm not sure how often we see them in the wild.

kivikakk commented 5 months ago

Yes, indeed; Rinku is some preeeetty antique software by this stage (with no commit from the primary author since 2016, and none from the other maintainer (me!) since 2019), and I imagine the remaining users are pretty few and far between; certainly not at GitHub since the cmark-gfm switch happened, as its own autolink was used from then, which is what Comrak aims to emulate.

Ideally we continue to match cmark-gfm in regular mode — I don't mind what the behaviour is once relaxed-autolinks is specified. Let me know if you want a hand with the former.

digitalmoksha commented 5 months ago

> Rinku is some preeeetty antique software by this stage

oh yes, very much 😄

> Ideally we continue to match cmark-gfm in regular mode

totally agree. Created https://github.com/kivikakk/comrak/pull/386 to address this.

kivikakk commented 5 months ago

Alright! So we have the second item addressed by #386 — thanks very much — which leaves us with this unpleasantness:

$ echo 'See &lt;&lt;&lt;http://example.com/&gt;&gt;&gt;' | comrak -e autolink
<p>See &lt;&lt;&lt;<a href="http://example.com/%3E%3E%3E">http://example.com/&gt;&gt;&gt;</a></p>
$ echo 'See &lt;&lt;&lt;http://example.com/&gt;&gt;&gt;' | ~/g/archive/cmark-gfm/build/src/cmark-gfm -e autolink
<p>See &lt;&lt;&lt;<a href="http://example.com/">http://example.com/</a>&gt;&gt;&gt;</p>

I might have a look into this in the next couple of days!
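For illustration only, one conceivable way to get the cmark-gfm-style split shown in the two outputs above is to end the link at the first angle bracket and leave the rest as ordinary text. This is an assumption made for the sketch, not a description of how cmark-gfm works or of how this was eventually fixed in comrak; `split_at_angle_bracket` is a hypothetical name.

```rust
/// Hypothetical helper: splits an autolink candidate into the part that
/// becomes the link and the trailing text that stays outside it, cutting
/// at the first `<` or `>`.
fn split_at_angle_bracket(candidate: &str) -> (&str, &str) {
    match candidate.find(|c: char| c == '<' || c == '>') {
        // `<` and `>` are single-byte characters, so the index is a valid
        // split point.
        Some(index) => candidate.split_at(index),
        None => (candidate, ""),
    }
}
```

Applied to `http://example.com/>>>`, this yields `http://example.com/` as the link and `>>>` as trailing text, which matches the shape of the cmark-gfm output rather than the `%3E%3E%3E` href.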

kivikakk commented 2 months ago

> I might have a look into this in the next couple of days!

Turned into a couple of months, but I got there!