jgm / commonmark-hs

Pure Haskell commonmark parsing library, designed to be flexible and extensible

gfm parsing oddity with links in link text #156

Open · TripleCamera opened this issue 1 month ago

TripleCamera commented 1 month ago

Explain the problem.

In gfm, a URL that appears in the text of a link should not itself be parsed as a link.

Input:

[https://bilibili.com/](https://bilibili.com/)

Actual output:

<p>[<a
href="https://bilibili.com/](https://bilibili.com/)">https://bilibili.com/](https://bilibili.com/)</a></p>

Expected output:

<p><a href="https://bilibili.com/">https://bilibili.com/</a></p>

Try pandoc!

The bug is in the autolink_bare_uris extension. Disabling it gives the expected output:

C:\Users\EricQiu>pandoc -f gfm-autolink_bare_uris
[https://bilibili.com/](https://bilibili.com/)
^Z
<p><a href="https://bilibili.com/">https://bilibili.com/</a></p>

C:\Users\EricQiu>pandoc -f gfm
[https://bilibili.com/](https://bilibili.com/)
^Z
<p>[<a
href="https://bilibili.com/](https://bilibili.com/)">https://bilibili.com/](https://bilibili.com/)</a></p>

Pandoc version?

pandoc 3.3 (the latest version)

jgm commented 1 month ago

Test of gfm behavior:

[[hello](url)](there)

[hello](url)

Here the inner link takes precedence. With autolink_bare_uris the same thing happens: the inner link (now an automatically created one) takes precedence. It's hard to see how to avoid this with our current modular structure, in which the core (which handles regular links) doesn't know about the autolink_bare_uris extension. I suppose we could do something ad hoc, which is probably what GFM does...
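One possible ad hoc approach, sketched here only as an illustration: instead of recognizing bare URIs while inline parsing is in progress, run bare-URI autolinking as a separate pass after the core has already built regular links, and skip any text that is already inside a link. The `Inline` type, `autolinkOutside`, `autolinkStr` and `isBareUri` below are hypothetical toy definitions, not commonmark-hs's real types or API, and this is not a claim about how GFM actually implements it. URI detection is also heavily simplified (no trailing-punctuation trimming, no `www.` or email forms).

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf)

-- Hypothetical toy inline AST; NOT commonmark-hs's actual types.
data Inline
  = Str String
  | Link String [Inline] -- destination, link text
  deriving (Show, Eq)

-- Post-pass assumed to run AFTER the core has parsed ordinary links:
-- bare URIs are linked only in text that is not already inside a link.
autolinkOutside :: [Inline] -> [Inline]
autolinkOutside = concatMap go
  where
    go (Str s) = autolinkStr s
    go l@(Link _ _) = [l] -- never descend into existing link text

-- Greatly simplified bare-URI detection: any whitespace-delimited word
-- starting with "http://" or "https://" becomes a link to itself.
autolinkStr :: String -> [Inline]
autolinkStr "" = []
autolinkStr s = [Str ws | not (null ws)] ++ linkify word ++ autolinkStr rest'
  where
    (ws, rest) = span isSpace s
    (word, rest') = break isSpace rest
    linkify "" = []
    linkify w
      | isBareUri w = [Link w [Str w]]
      | otherwise = [Str w]

isBareUri :: String -> Bool
isBareUri w = any (`isPrefixOf` w) ["http://", "https://"]
```

With this ordering, `[https://bilibili.com/](https://bilibili.com/)` would first become an ordinary `Link` via the core, and the post-pass would leave its text alone; a bare URI in plain text, e.g. `autolinkOutside [Str "see https://x.org now"]`, would still be linked. The trade-off is that autolinking no longer happens inside the modular inline parser, which is exactly the kind of ad hoc special-casing the comment above anticipates.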