jgm closed this 7 months ago
As noted in the linked discussion, this only affects parsing with CR+LF line endings.
The issue may be related to https://github.com/jgm/commonmark-hs/issues/136
Observations:
This bug appears with `-f gfm` but NOT with `-f commonmark`. So it has to do with an extension. Need to isolate which extension with further testing.
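One way to isolate the extension is to render the same input once per extension and flag the runs where the raw `<span>` comes back escaped. The following is only a rough sketch: it assumes a `commonmark` executable on `PATH` that accepts `-x<extension>` (as in the commands used in this thread), and it assumes the escaped output contains the literal text `&lt;span&gt;`. The helper names `looks_escaped`, `run_commonmark_cli` and `find_culprits` are made up for illustration.

```python
import subprocess
from typing import Callable, Iterable, List

# Input taken from this thread: a link followed by text and inline raw HTML.
SAMPLE = "[link](https://baidu.com)aaa<span></span>bbb\n"


def looks_escaped(html: str) -> bool:
    """Return True if the raw <span> was emitted as escaped entities.

    Assumption: escaping shows up as the literal text '&lt;span&gt;'.
    """
    return "&lt;span&gt;" in html


def run_commonmark_cli(extension: str, text: str) -> str:
    """Render `text` with the `commonmark` CLI and a single -x extension.

    Assumption: a `commonmark` executable is on PATH and reads from stdin.
    """
    result = subprocess.run(
        ["commonmark", f"-x{extension}"],
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def find_culprits(
    extensions: Iterable[str],
    render: Callable[[str, str], str] = run_commonmark_cli,
    text: str = SAMPLE,
) -> List[str]:
    """Return the extensions whose output escapes the raw HTML in `text`."""
    return [ext for ext in extensions if looks_escaped(render(ext, text))]
```

A call such as `find_culprits(["autolinks", "pipe_tables", "strikethrough"])` would then list the offending extensions. Only `gfm` and `autolinks` are extension names confirmed in this thread; the others are assumptions and should be checked against the CLI. The `render` argument is injectable so the logic can be exercised without the binary installed.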
The bug is in commonmark-hs, not pandoc itself.
```
% echo -e "[link](https://baidu.com)aaa<span></span>bbb\n" | commonmark -xgfm
<p><a href="https://baidu.com">link</a>aaa<span></span>bbb</p>
```
I can reproduce it even with LF line endings using commonmark-cli, so I'm not sure why things seem different with pandoc.
I will transfer this to commonmark-hs.
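Since the line-ending question came up, here is a small sketch for producing byte-exact LF and CR+LF variants of the same input, so both can be fed to whichever renderer is being tested. The helper name `with_line_endings` is hypothetical; the sample text is the one from this thread.

```python
def with_line_endings(text: str, newline: str) -> bytes:
    """Normalise every line ending in `text` to `newline`, encoded as UTF-8."""
    # Collapse CR+LF and bare CR to LF first, then expand to the target ending.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", newline).encode("utf-8")


SAMPLE = "[link](https://baidu.com)aaa<span></span>bbb\n"

lf_input = with_line_endings(SAMPLE, "\n")      # Unix-style endings
crlf_input = with_line_endings(SAMPLE, "\r\n")  # Windows-style endings
```

Working with bytes rather than text avoids any newline translation by the shell or by the language runtime when the input is piped to a process.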
Using `-xautolinks` instead of `-xgfm` produces the issue. So it can be attributed to the autolinks extension.
The code for the autolinks extension is quite bad and needs work! There is an extensive set of tests here that we might attend to. And here is a description of the syntax: https://unifiedjs.com/explore/package/micromark-extension-gfm-autolink-literal/#syntax
Some work in issue147 branch.
Thank you. :smiling_face_with_three_hearts:
Note: this may only affect platforms with CR+LF line endings.
Discussed in https://github.com/jgm/pandoc/discussions/9406
```
aaabbb
aaabbb
linkaaabbb
linkaaabbb
```

However, when the source language is [`gfm`](https://pandoc.org/try/?params=%7B%22text%22%3A%22aaabbb%5Cn%5Cnaaa%3Cspan%3E%3C%2Fspan%3Ebbb%5Cn%5Cn%5Blink%5D%28https%3A%2F%2Fbaidu.com%29aaabbb%5Cn%5Cn%5Blink%5D%28https%3A%2F%2Fbaidu.com%29aaa%3Cspan%3E%3C%2Fspan%3Ebbb%5Cn%22%2C%22to%22%3A%22html%22%2C%22from%22%3A%22gfm%22%2C%22standalone%22%3Afalse%2C%22embed-resources%22%3Afalse%2C%22table-of-contents%22%3Afalse%2C%22number-sections%22%3Afalse%2C%22citeproc%22%3Afalse%2C%22html-math-method%22%3A%22plain%22%2C%22wrap%22%3A%22auto%22%2C%22highlight-style%22%3Anull%2C%22files%22%3A%7B%7D%2C%22template%22%3Anull%7D), they are escaped:

```html
aaabbb
aaabbb
linkaaabbb
linkaaa<span></span>bbb
```

I have read the specs and couldn't find any difference for links & raw HTML. Is this a bug in Pandoc?