mity / md4c

C Markdown parser. Fast. SAX-like interface. Compliant to CommonMark specification.
MIT License

Angle brackets in link destinations #24

Closed. mity closed this issue 7 years ago

mity commented 7 years ago

The spec recognizes two types of link destinations:

  1. a sequence of zero or more characters between an opening < and a closing > that contains no spaces, line breaks, or unescaped < or > characters, or

  2. a nonempty sequence of characters that does not include ASCII space or control characters, and includes parentheses only if (a) they are backslash-escaped or (b) they are part of a balanced pair of unescaped parentheses. (Implementations may impose limits on parentheses nesting to avoid performance issues, but at least three levels of nesting should be supported.)

Although not explicitly stated, it's clear from various discussions (e.g. https://github.com/commonmark/cmark/issues/193, https://github.com/commonmark/cmark/pull/219) that if parsing the destination as type 1 fails, the parser should retry it as type 2.

However, MD4C currently performs that retry only at the level of the link destination, not of the whole link (see the function md_is_link_destination()).

Hence we parse correctly

[a](<te<st>)

But we fail with

[a](<x>X)

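because <x> is accepted as a type 1 destination, and the unexpected character X that follows it then prevents the whole construct from being recognized as a link. Retrying the whole link with type 2 would instead accept <x>X as the destination.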
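
A rough sketch of retrying at the whole-link level rather than the destination level. This is not MD4C's actual code: the names scan_dest_angle(), scan_dest_plain(), scan_close_paren() and link_dest_type() are hypothetical. It assumes the input starts right after the opening "(" and deliberately ignores link titles, empty destinations and nesting limits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Type 1: '<' ... '>' with no spaces, line breaks, or unescaped '<' / '>'. */
static bool scan_dest_angle(const char *s, size_t beg, size_t *end)
{
    size_t i = beg;
    if (s[i] != '<')
        return false;
    i++;
    while (s[i] != '\0') {
        if (s[i] == '\\' && (s[i + 1] == '<' || s[i + 1] == '>' || s[i + 1] == '\\')) {
            i += 2;                     /* skip an escaped character */
            continue;
        }
        if (s[i] == '>') {
            *end = i + 1;               /* one past the closing '>' */
            return true;
        }
        if (s[i] == '<' || s[i] == ' ' || s[i] == '\n' || s[i] == '\r')
            return false;
        i++;
    }
    return false;
}

/* Type 2: nonempty, no ASCII space or control characters, balanced parentheses. */
static bool scan_dest_plain(const char *s, size_t beg, size_t *end)
{
    size_t i = beg;
    int depth = 0;
    while (s[i] != '\0') {
        unsigned char c = (unsigned char)s[i];
        if (c == '\\' && (s[i + 1] == '(' || s[i + 1] == ')' || s[i + 1] == '\\')) {
            i += 2;                     /* skip an escaped character */
            continue;
        }
        if (c == ' ' || c < 0x20 || c == 0x7f)
            break;
        if (c == '(') {
            depth++;
        } else if (c == ')') {
            if (depth == 0)
                break;                  /* this ')' closes the link itself */
            depth--;
        }
        i++;
    }
    if (i == beg || depth != 0)
        return false;
    *end = i;
    return true;
}

/* What must follow the destination in this simplified sketch: spaces, then ')'. */
static bool scan_close_paren(const char *s, size_t off)
{
    while (s[off] == ' ' || s[off] == '\n')
        off++;
    return s[off] == ')';
}

/* Returns 1 or 2 for the destination type that makes the WHOLE link valid,
 * or 0 if neither does. Type 2 is retried even when type 1 matched the
 * destination alone but the rest of the link did not follow. */
int link_dest_type(const char *tail, size_t *dest_beg, size_t *dest_end)
{
    size_t beg = 0, end;
    assert(tail != NULL && dest_beg != NULL && dest_end != NULL);
    while (tail[beg] == ' ' || tail[beg] == '\n')
        beg++;
    if (scan_dest_angle(tail, beg, &end) && scan_close_paren(tail, end)) {
        *dest_beg = beg; *dest_end = end;
        return 1;
    }
    if (scan_dest_plain(tail, beg, &end) && scan_close_paren(tail, end)) {
        *dest_beg = beg; *dest_end = end;
        return 2;
    }
    return 0;
}
```

With this structure, link_dest_type("<x>X)", ...) falls through to type 2 and reports the destination <x>X, while "<x>)" is still accepted as type 1. The key point is only that the success test for type 1 covers the rest of the link, not just the destination.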