mity / md4c

C Markdown parser. Fast. SAX-like interface. Compliant to CommonMark specification.
MIT License

Parsing of link text with url under permissive-www-autolinks #152

Closed rundel closed 3 years ago

rundel commented 3 years ago

Parsing with permissive-www-autolinks behaves unexpectedly when a standard Markdown link has a URL as its link text. See the example below with md2html.

tbBook [md2html]$ echo "www.google.com" | ./md2html --fpermissive-www-autolinks
<p><a href="http://www.google.com">www.google.com</a></p>

tbBook [md2html]$ echo "[www.google.com](http://www.google.com)" | ./md2html --fpermissive-www-autolinks
<p><a href="http://www.google.com"><a href="http://www.google.com">www.google.com</a></a></p>

tbBook [md2html]$ echo "[www.google.com](http://www.google.com)" | ./md2html
<p><a href="http://www.google.com">www.google.com</a></p>

Based on the CommonMark Spec:

Backtick code spans, autolinks, and raw HTML tags bind more tightly than the brackets in link text. Thus, for example, [foo`]` could not be a link text, since the second ] is part of a code span.

I would expect potentially something like

<p>[<a href="http://www.google.com">www.google.com</a>](http://www.google.com)</p>

but not the nested links that are currently produced, which would seem to violate the CommonMark Spec:

Links may not contain other links, at any level of nesting. If multiple otherwise valid link definitions appear nested inside each other, the inner-most definition is used.

mity commented 3 years ago

On one hand, I agree the behavior is not optimal.

On the other hand, it makes sense from an implementation point of view that the permissive autolinks behave exactly like the explicit standard autolinks supported by CommonMark.

Unfortunately, the specification does not prohibit them from being nested inside inline links (it only prohibits nested inline links). This was reported some time ago.

It would be a surprising amount of work to make the standard and permissive autolinks behave differently, so I would wait until the next spec version is out. If it changes the behavior of standard autolinks nested inside inline links, it would be far less work to fix it for both the standard and permissive autolinks.

(If the next specification does not address the problem, I will likely fix it for the permissive ones anyway, as I understand the problem is more likely to be encountered with them and can affect users more in the real world. So let's keep this open as a reminder.)

mity commented 3 years ago

After some more thought, we can special-case the situation where the permissive autolink forms the entire label of the outer inline link: generating the extra link in that case is simply nonsense. This should fix most of the trouble this issue causes in real life. The commit linked above does exactly that.

If, on the other hand, the link label contains anything more (even whitespace), we allow the nesting, as with the standard autolinks. This won't change unless the specification asks for it, so closing.
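
For readers who want the rule in concrete terms, here is a minimal sketch of the check described above. It is not md4c's actual code: the function names `www_autolink_len` and `should_suppress_nested_autolink` are hypothetical, and the autolink detection is deliberately simplified to "a run starting with `www.` that extends to the next whitespace", which is much cruder than the real permissive autolink rules.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for permissive www autolink detection.
 * Returns the length of a run that starts at s[0] with "www." and extends
 * to the next whitespace or to the end of the buffer; returns 0 if the
 * buffer does not start with "www." followed by at least one character. */
static size_t www_autolink_len(const char *s, size_t len)
{
    static const char prefix[] = "www.";
    const size_t prefix_len = sizeof(prefix) - 1;
    size_t i;

    if (len <= prefix_len || strncmp(s, prefix, prefix_len) != 0)
        return 0;

    i = prefix_len;
    while (i < len && !isspace((unsigned char)s[i]))
        i++;
    return i;
}

/* Hypothetical helper illustrating the rule from the comment above:
 * suppress the nested autolink only when it forms the whole link label.
 * Any extra character in the label, even whitespace, keeps the nesting. */
static bool should_suppress_nested_autolink(const char *label, size_t len)
{
    size_t n = www_autolink_len(label, len);
    return n != 0 && n == len;
}
```

Under these assumptions, the label `www.google.com` from the report would have its inner autolink suppressed, while a label such as `www.google.com ` (trailing space) or `see www.google.com` would still allow the nesting.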