kivikakk / comrak

CommonMark + GFM compatible Markdown parser and renderer

autolink plus superscript interact weirdly #58

Closed brson closed 3 months ago

brson commented 6 years ago

^ is valid in URLs, but because autolinking runs as a postprocessing pass, strings that should probably get autolinked end up superscripted first, mangling the link.

https://www.wolframalpha.com/input/?i=x^2+(y-(x^2)^(1/3))^2=1

run through comrak with -e autolink -e superscript comes out as

<p><a href="https://www.wolframalpha.com/input/?i=x">https://www.wolframalpha.com/input/?i=x</a><sup>2+(y-(x</sup>2)^(1/3))^2=1</p>

The link gets chopped off to become superscript text.
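To make the ordering problem concrete, here is a minimal sketch of a two-pass pipeline. It is a toy, not comrak's implementation: `Node`, `parse_superscript`, `autolink` and `find_url_start` are hypothetical names, and the superscript pass naively pairs every two carets, so the tail of the result differs from comrak's real output. What it does reproduce is the truncation: by the time the autolink pass runs, the only text node it can see ends at the first `^`.

```rust
/// Toy inline node: plain text, a superscript span, or a link.
#[derive(Debug, PartialEq)]
enum Node {
    Text(String),
    Sup(String),
    Link(String),
}

/// Pass 1 (toy inline parser): pair up `^` delimiters into superscript spans.
/// Assumption: every two carets form a pair; real delimiter rules are stricter.
fn parse_superscript(input: &str) -> Vec<Node> {
    let mut nodes = Vec::new();
    let mut rest = input;
    while let Some(open) = rest.find('^') {
        match rest[open + 1..].find('^') {
            Some(len) => {
                if open > 0 {
                    nodes.push(Node::Text(rest[..open].to_string()));
                }
                nodes.push(Node::Sup(rest[open + 1..open + 1 + len].to_string()));
                // Skip past the closing caret.
                rest = &rest[open + 1 + len + 1..];
            }
            // An unmatched caret stays in the trailing text.
            None => break,
        }
    }
    if !rest.is_empty() {
        nodes.push(Node::Text(rest.to_string()));
    }
    nodes
}

/// Byte offset of the first `http://` or `https://` in `s`, if any.
fn find_url_start(s: &str) -> Option<usize> {
    match (s.find("http://"), s.find("https://")) {
        (Some(a), Some(b)) => Some(a.min(b)),
        (a, b) => a.or(b),
    }
}

/// Pass 2 (toy postprocessing autolinker): only looks inside `Text` nodes,
/// so it can never extend a URL across a node boundary made by pass 1.
fn autolink(nodes: Vec<Node>) -> Vec<Node> {
    let mut out = Vec::new();
    for node in nodes {
        match node {
            Node::Text(text) => {
                let mut rest = text.as_str();
                while let Some(start) = find_url_start(rest) {
                    // The URL runs to the next whitespace or the end of the node.
                    let end = rest[start..]
                        .find(char::is_whitespace)
                        .map_or(rest.len(), |i| start + i);
                    if start > 0 {
                        out.push(Node::Text(rest[..start].to_string()));
                    }
                    out.push(Node::Link(rest[start..end].to_string()));
                    rest = &rest[end..];
                }
                if !rest.is_empty() {
                    out.push(Node::Text(rest.to_string()));
                }
            }
            other => out.push(other),
        }
    }
    out
}
```

Calling `autolink(parse_superscript(url))` on the URL above yields a `Link` that stops at `?i=x`, followed by a `Sup` holding `2+(y-(x`, which matches the first two pieces of the mangled HTML.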

There may be other syntax that interacts similarly weirdly with the autolinker. I don't have any ideas for how to fix this beyond integrating the autolinker into the inline parser.
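As a rough illustration of what "integrating the autolinker into the inline parser" could mean, the sketch below tries a URL rule before a superscript rule at each position of a single scan. Everything here is an assumption made for illustration: `render_link_first` and `push_escaped` are hypothetical helpers, the URL rule simply runs to the next whitespace, and none of it reflects comrak's actual code or the GFM autolink rules.

```rust
/// Append `text` to `html`, escaping the characters that are special in HTML.
fn push_escaped(html: &mut String, text: &str) {
    for ch in text.chars() {
        match ch {
            '&' => html.push_str("&amp;"),
            '<' => html.push_str("&lt;"),
            '>' => html.push_str("&gt;"),
            '"' => html.push_str("&quot;"),
            _ => html.push(ch),
        }
    }
}

/// Render `input` in a single pass, trying the URL rule before the
/// superscript rule at every position (toy rules, see the assumptions above).
fn render_link_first(input: &str) -> String {
    let mut html = String::new();
    let mut rest = input;
    while !rest.is_empty() {
        if rest.starts_with("http://") || rest.starts_with("https://") {
            // URL rule: consume up to the next whitespace, carets included.
            let end = rest.find(char::is_whitespace).unwrap_or(rest.len());
            let mut url = String::new();
            push_escaped(&mut url, &rest[..end]);
            html.push_str(&format!("<a href=\"{0}\">{0}</a>", url));
            rest = &rest[end..];
        } else if let Some(body) = rest.strip_prefix('^') {
            // Superscript rule: only reached outside of URLs.
            match body.find('^') {
                Some(len) => {
                    html.push_str("<sup>");
                    push_escaped(&mut html, &body[..len]);
                    html.push_str("</sup>");
                    rest = &body[len + 1..];
                }
                None => {
                    // Unmatched caret: emit it literally.
                    html.push('^');
                    rest = body;
                }
            }
        } else {
            // Plain text: copy one character and move on.
            let ch = rest.chars().next().unwrap();
            push_escaped(&mut html, &rest[..ch.len_utf8()]);
            rest = &rest[ch.len_utf8()..];
        }
    }
    html
}
```

With this ordering the carets inside the URL never reach the superscript rule, while a `^2^` elsewhere in the same line still becomes `<sup>2</sup>`. The cost is that the URL rule now lives inside the scanner and has to be kept consistent with every other inline rule.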

FWIW snudown miraculously autolinks that URL as one might expect.

kivikakk commented 5 years ago

This is awkward, yeah :( I'm a bit of a fan of the autolinker as it stands since its implementation is so straightforward, being separated out entirely from the reasonably ugly inline parser. (Of course, this also makes it easiest to conform with the GFM autolinking examples, since it works the same way there. Trying to reproduce the same results via an algorithm that was integrated with the inline parser might be difficult.)

I don't think it's the worst to ask users to e.g. use <https://www.wolframalpha.com/input/?i=x^2+(y-(x^2)^(1/3))^2=1> in this case, though the result it does give as-is is quite ugly.

brson commented 5 years ago

It's a bit hard to recall exactly what I was thinking here, but I suspect the wording of the GFM spec suggests the OP's example should parse as a link (and old-Reddit of course does).

I agree that, architecturally, keeping the autolinker separate from parsing is much preferable to intertwining its logic with the parser. Changing both comrak and upstream cmark to handle these cases would be quite hard, so it's probably best to just leave things as-is, and maybe word the spec more precisely and add more test cases like the one in the OP, if it even comes to that.

Edit: oh, GFM spec doesn't even define superscript, so it's hard to say anything about the spec, though it's reasonable to assume that extensions are additive and don't break other features. I'm not sure offhand if any other characters break URLs like ^.

kivikakk commented 3 months ago

We got there. :)