kivikakk / comrak

CommonMark + GFM compatible Markdown parser and renderer

Autolink incorrectly dropping tildes #423

Open Manishearth opened 1 week ago

Manishearth commented 1 week ago

GitHub recognizes colons and tildes in URLs, which are used by the "link to text section" (#:~:text) URL fragment specifier:

https://www.unicode.org/review/pri453/feedback.html#:~:text=Fri%20Jun%2024%2009:56:01%20CDT%202022

I think the fix is just removing ~ and : from LINK_END_ASSORTMENT?

Manishearth commented 1 week ago

I think strikethrough parsing is interfering with autolinking

kivikakk commented 1 week ago

Can you give me a full repro? With autolink and strikethrough both enabled, I don't have any trouble with the example given:

$ comrak --version
comrak 0.24.1
$ cat colon-tilde.md
Here's an autolink: https://www.unicode.org/review/pri453/feedback.html#:~:text=Fri%20Jun%2024%2009:56:01%20CDT%202022.
$ comrak -e autolink -e strikethrough colon-tilde.md
<p>Here's an autolink: <a href="https://www.unicode.org/review/pri453/feedback.html#:~:text=Fri%20Jun%2024%2009:56:01%20CDT%202022">https://www.unicode.org/review/pri453/feedback.html#:~:text=Fri%20Jun%2024%2009:56:01%20CDT%202022</a>.</p>

Manishearth commented 1 week ago

Ah, I think you need two such URLs on the same line:

Here's an autolink: https://www.unicode.org/review/pri453/feedback.html#:~:text=Fri%20Jun%2024%2009:56:01%20CDT%202022 and another one https://www.unicode.org/review/pri453/feedback.html#:~:text=Fri%20Jun%2024%2009:56:01%20CDT%202022.

with comrak -e autolink -e strikethrough

kivikakk commented 1 week ago

Thanks! Confirmed. This will take some careful comparison of the behaviour of Comrak and cmark-gfm (where strikethrough is done via the extensions API). I wonder if cmark-gfm's autolink is managing to trigger first and therefore skip strikethrough entirely. Comrak does autolink in a postprocessing step, which is why there are so many issues (#58, #382, #388) with it.
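To make the suspected interaction concrete, here is a toy model in Rust of why one URL is fine but two on the same line are not. It is only a sketch under assumptions: it is not Comrak's or cmark-gfm's code, the helper names `tilde_positions` and `naive_strikethrough_pairs` are hypothetical, and it assumes a strikethrough pass that pairs single tildes left to right before any autolink pass sees the text.

```rust
// Toy model only: none of this is Comrak's or cmark-gfm's real code.

/// Byte offsets of every `~` in the line (hypothetical helper).
fn tilde_positions(line: &str) -> Vec<usize> {
    line.match_indices('~').map(|(i, _)| i).collect()
}

/// Pair tildes left to right, as an assumed naive strikethrough pass might.
/// Each pair is (opening tilde offset, closing tilde offset).
fn naive_strikethrough_pairs(line: &str) -> Vec<(usize, usize)> {
    let positions = tilde_positions(line);
    positions
        .chunks_exact(2)
        .map(|pair| (pair[0], pair[1]))
        .collect()
}

fn main() {
    let url = "https://www.unicode.org/review/pri453/feedback.html#:~:text=Fri%20Jun%2024%2009:56:01%20CDT%202022";
    let one = format!("Here's an autolink: {url}.");
    let two = format!("Here's an autolink: {url} and another one {url}.");

    // One URL: the lone tilde has nothing to pair with, so it stays literal.
    assert!(naive_strikethrough_pairs(&one).is_empty());

    // Two URLs: the two tildes pair up into a single strikethrough span.
    let pairs = naive_strikethrough_pairs(&two);
    assert_eq!(pairs.len(), 1);

    // The text left in front of the span ends right before the first tilde,
    // so an autolink pass that runs afterwards would only see a truncated URL.
    let (open, close) = pairs[0];
    let before = &two[..open];
    assert!(before.ends_with("feedback.html#:"));

    println!("text before the span: {before}");
    println!("text inside the span: {}", &two[open + 1..close]);
}
```

If this model is roughly right, it would explain why the single-URL example renders correctly and why the order in which the autolink and strikethrough passes run matters.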