jgm / djot

A light markup language
https://djot.net
MIT License

Wrong parse, link vs emph #88

Closed matklad closed 1 year ago

matklad commented 1 year ago
[find_symbols.rs](https://github.com/matklad/fall/blob/master/lang/rust/src/editor/file_symbols.rs)

This is currently parsed as emph:

doc
  para
    str s="[find"
    emph
      str s="symbols.rs](https://github.com/matklad/fall/blob/master/lang/rust/src/editor/file"
    str s="symbols.rs)"

This should be parsed as a link

jgm commented 1 year ago

Thanks. Minimal case:

[x_y](x_y)

jgm commented 1 year ago

Also a close relative of

[Link *](url)*

which is an example in our syntax description:

[Link *](url*)

jgm commented 1 year ago

The docs say

The basic principle governing “precedence” for inline containers is that the first opener that gets closed takes precedence. Containers can’t overlap, so once an opener gets closed, any potential openers between the opener and the closer get marked as regular text and can no longer open inline syntax.

I think the issue here is that the parser doesn't know we have a link, and hence doesn't count the first [ as closed, until we get to the final ). So, on that basis one could argue that the emphasis is the first container to be closed. At least, I think that's how the parser is thinking of it. I agree that another interpretation would be more sensible.

jgm commented 1 year ago

Note that this is avoided in commonmark (reference parsers) for two reasons:

1 - commonmark resolves all link containers before looking at any emphasis containers
2 - commonmark uses a scanner for the (..) part of a link that has potentially infinite lookahead, and that backtracks if a match isn't found

I'd like to avoid 2, and 1 is incompatible with the basic principle stated above. So, not sure exactly how to handle this case.

jgm commented 1 year ago

I suppose one approach could be to clear the intervening openers as soon as we match a pair [..], regardless of whether this will eventually turn into a link.

pkulchenko commented 1 year ago

Could you track both at the same time (potential link and em/strong formatting) until ]( is found and then enforce the link? This looks to be similar to the case of _some*text*more_ being valid, but _some*text_more* not. To me finding ]( after [ should reset "stack of openers" (using your terminology) that started between [ and ]( (even though ) may be missing).

jgm commented 1 year ago

Yes, I think we could consider the opener matched when ]( or ][ is found, even if these aren't followed by what would be required to form a link.

This looks to be similar to the case of _some*text*more_ being valid, but _some*text_more* not

They're both "valid," in the sense of producing a parse; in the second case the _ is the first to close, so we get <em>some*text</em>more*.
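
A rough illustration of the "first opener that gets closed wins" rule for these two inputs, as a Python sketch. This is a toy model, not djot's actual implementation: the name `match_emphasis` is made up for the example, only `_` and `*` are handled, and the whitespace checks real djot applies to delimiters are ignored.

```python
def match_emphasis(text):
    """Return matched containers as (delimiter, start, end) tuples."""
    openers = []  # stack of (delimiter, position) still waiting for a closer
    spans = []
    for i, ch in enumerate(text):
        if ch not in "_*":
            continue
        # Look for the most recent unmatched opener of the same kind.
        for k in range(len(openers) - 1, -1, -1):
            if openers[k][0] == ch:
                spans.append((ch, openers[k][1], i))
                # Containers can't overlap: the matched opener and every
                # opener after it are dropped, so the later ones stay text.
                del openers[k:]
                break
        else:
            openers.append((ch, i))
    return spans


print(match_emphasis("_some*text*more_"))  # [('*', 5, 10), ('_', 0, 15)]
print(match_emphasis("_some*text_more*"))  # [('_', 0, 10)]
```

In the second input the `_` pair closes first, so the `*` opened inside it is discarded and the trailing `*` has nothing left to match.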

pkulchenko commented 1 year ago

They're both "valid," in the sense of producing a parse; in the second case the _ is the first to close, so we get <em>some*text</em>more*.

Right; I agree with the result and was just suggesting that the same logic should/can be applied to the links (at ](/][ boundary). Thanks!
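
The two behaviours discussed in this thread can be compared with a small Python sketch. It is a simplified model under stated assumptions, not the djot parser: `match_inlines` and its `commit_at_bracket_paren` flag are hypothetical names, only `[`, `](`, `)`, `_` and `*` are handled, the `][` reference-link form is left out, and no validation of the link destination is attempted.

```python
def _last_index(openers, kind):
    """Index of the most recent opener of the given kind, or None."""
    for k in range(len(openers) - 1, -1, -1):
        if openers[k][0] == kind:
            return k
    return None


def match_inlines(text, commit_at_bracket_paren):
    """Return matched containers as (kind, start, end) tuples.

    With commit_at_bracket_paren=False, a '[' only counts as closed when
    the final ')' is reached, so emphasis inside can close first.
    With commit_at_bracket_paren=True, seeing '](' already counts as
    matching the '[', and the openers in between are cleared.
    """
    openers = []  # stack of (kind, position)
    spans = []
    for i, ch in enumerate(text):
        if ch == "[":
            openers.append(("[", i))
        elif ch in "_*":
            k = _last_index(openers, ch)
            if k is None:
                openers.append((ch, i))
            else:
                spans.append((ch, openers[k][1], i))
                del openers[k:]  # drop everything opened since
        elif ch == "]" and text[i + 1 : i + 2] == "(":
            k = _last_index(openers, "[")
            if k is not None:
                start = openers[k][1]
                if commit_at_bracket_paren:
                    del openers[k:]  # clear '[' and intervening openers
                # Marker for a pending destination; it remembers where
                # the link text started.
                openers.append(("(", start))
        elif ch == ")":
            k = _last_index(openers, "(")
            if k is not None:
                start = openers[k][1]
                spans.append(("link", start, i))
                # Remove the link's own '[' too if it is still on the stack.
                j = next(
                    (n for n, o in enumerate(openers) if o == ("[", start)), k
                )
                del openers[j:]
    return spans


print(match_inlines("[x_y](x_y)", commit_at_bracket_paren=False))  # [('_', 2, 7)]
print(match_inlines("[x_y](x_y)", commit_at_bracket_paren=True))   # [('link', 0, 9)]
```

In this model the first call reproduces the reported emphasis parse, because the `_` pair closes before the `)` is reached and takes the pending destination marker with it. The second call yields a link, because the `_` inside the brackets has already been cleared by the time the second `_` appears.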

jgm commented 1 year ago

See above on why "first to close" is a bit ambiguous in this case.