Open Anders-E opened 5 years ago
Well, in commonmark and pandoc markdown, the list item needs to be preceded by an empty line. So I think that's not a bug.
You can use the --wrap=none option to get your expected result.
It seems that's correct, thank you for pointing it out.
However, if you replace the 1. in my input example with -, a non-numbered list will be output, as CommonMark does not require a newline before regular lists.
Would this constitute a bug or should one use --wrap=none to avoid these lists from popping up?
As of now, in pandoc markdown you need the newline even for bullet lists (this will change at some point in the future).
But indeed, this is even a bug in current commonmark output:
echo '<p>aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa - aaaaaaaaaaaaaaaaaaaaaaa</p>' | pandoc -f html -t commonmark
I tried it with * and + as well. * gets escaped correctly but + leads to the same bug as -.
Also thank you for the very quick replies!
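The wrapping problem discussed above can be reproduced outside pandoc. The following is a minimal Python sketch, not pandoc's or cmark's code: it re-wraps a paragraph at 72 columns with the standard textwrap module and reports continuation lines that a CommonMark parser would read as list items. The names wrap_paragraph, accidental_list_lines and LIST_START are hypothetical, and the regular expression is a simplification of the CommonMark rules (a bullet, or the ordered marker 1. or 1), followed by whitespace and some content).

```python
import re
import textwrap

# Simplified assumption: inside a paragraph, a line starting with a bullet
# (-, +, *) or with "1." / "1)" followed by whitespace and content is parsed
# as the start of a list.
LIST_START = re.compile(r"^ {0,3}(?:[-+*]|1[.)])[ \t]+\S")


def wrap_paragraph(text, width=72):
    """Naively re-wrap a paragraph without any escaping."""
    return textwrap.wrap(text, width=width, break_on_hyphens=False)


def accidental_list_lines(lines):
    """Return the continuation lines that would be parsed as list items."""
    return [line for line in lines[1:] if LIST_START.match(line)]


# Same shape as the HTML paragraph in the example: 72 characters, " - ", more text.
paragraph = "a" * 72 + " - " + "a" * 23
wrapped = wrap_paragraph(paragraph)
print(accidental_list_lines(wrapped))
```

With a very large width (the analogue of --wrap=none) the paragraph stays on one line, so no marker ends up at the start of a line.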
As a workaround you can do --wrap=none
I just tried latest cmark and its commonmark renderer properly escapes these cases. This was unexpected, because pandoc uses libcmark (or rather the amplified version maintained by GitHub) to render commonmark! It should behave the same.
Probably upstream cmark has some changes that aren't yet in GitHub's cmark fork, or perhaps they are but the cmark-gfm package doesn't contain the latest?
I see this commit which is part of the 0.29 release of cmark:
commit 6122d5cc3c5e5e8f94f203daddfd38a36be7aed4
Author: John MacFarlane <jgm@berkeley.edu>
Date: Sat Apr 6 10:20:02 2019 -0700
commonmark renderer: improve escaping.
URL-escape special characters when escape mode is URL,
and not otherwise.
Entity-escape control characters (< 0x20) in non-literal
escape modes.
Looks like these changes are in cmark-gfm 0.2, though, so I'm still not understanding why pandoc isn't working... (EDIT: These changes don't seem relevant to list bullets.)
Hm.
*Text.Pandoc.CSV CMarkGFM> nodeToCommonmark [] (Just 72) $ commonmarkToNode [] [] "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n1\\. aaaaaa\n"
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n1. aaaaaa\n"
So the escaping isn't done properly in the cmark-gfm Haskell library. Yet if I compile cmark-gfm C library and run the executable, it is done properly.
Interesting.
% pandoc -f commonmark -t commonmark --wrap=preserve
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
1\. aaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
1\. aaaaaaaaaa
% pandoc -f commonmark -t commonmark --wrap=auto
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
1\. aaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
1. aaaaaaaaaa
So, with --wrap=preserve it works fine but with --wrap=auto it fails to escape properly.
I can duplicate this using the cmark executable from the C library:
% ./build/src/cmark-gfm -t commonmark --width 0
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
1\. aaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
1\. aaaaaaaaaa
% ./build/src/cmark-gfm -t commonmark --width 72
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
1\. aaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
1. aaaaaaaaaa
So this really is a problem in the cmark library, not pandoc itself.
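To make the expected behaviour concrete, here is a small Python sketch of an escaping pass that runs after wrapping, so the result does not depend on the chosen width. It is an illustration under simplifying assumptions, not the logic used by cmark or cmark-gfm; escape_line_start, escape_wrapped, BULLET and ORDERED are hypothetical names, and the patterns only approximate CommonMark's list marker rules.

```python
import re

# Simplified list markers at the start of a line (hypothetical patterns).
BULLET = re.compile(r"^( {0,3})([-+*])([ \t])")
ORDERED = re.compile(r"^( {0,3})(\d{1,9})([.)])([ \t])")


def escape_line_start(line):
    """Backslash-escape a leading list marker so the line stays paragraph text."""
    line = BULLET.sub(r"\1\\\2\3", line, count=1)
    line = ORDERED.sub(r"\1\2\\\3\4", line, count=1)
    return line


def escape_wrapped(lines):
    """Escape every continuation line of an already wrapped paragraph."""
    return lines[:1] + [escape_line_start(line) for line in lines[1:]]


for line in escape_wrapped(["a" * 72, "1. aaaaaaaaaa"]):
    print(line)
```

Running the pass on the final line breaks is the point of the sketch: a marker is only dangerous once it sits at the start of a line, and that is known only after wrapping.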
I am not sure it's the same issue, but I've noticed that escaped characters disappear after line breaks:
% printf "foo \n\- bar" | pandoc -f commonmark_x -t commonmark_x
foo
- bar
% printf "foo \n1\. bar" | pandoc -f commonmark_x -t commonmark_x
foo
1. bar
which is annoying because it creates a list.
Instead I would expect:
% printf "foo \n\- bar" | pandoc -f commonmark_x -t commonmark_x
foo
\- bar
Escapes are not represented in the AST, so they will not round-trip.
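A toy Python model of that statement, assuming nothing about pandoc's actual data types: the parse step resolves backslash escapes and stores only the literal characters, so a writer that emits the stored text verbatim cannot know an escape was ever there. parse_text and render_text are hypothetical stand-ins for a reader and a writer.

```python
import re
import string

# CommonMark allows a backslash escape before any ASCII punctuation character.
_ESCAPE = re.compile(r"\\([" + re.escape(string.punctuation) + r"])")


def parse_text(source):
    """Toy reader: resolve backslash escapes, keeping only the literal text."""
    return _ESCAPE.sub(r"\1", source)


def render_text(node_text):
    """Toy writer that emits the stored text without re-escaping anything."""
    return node_text


source = "foo\n\\- bar"
tree = parse_text(source)
print(render_text(tree))
```

Because the stored text no longer records the backslash, any escaping in the output has to be decided again by the writer, based on where each character lands.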
Overview
Stumbled upon this while converting HTML to Markdown using pandoc. Basically, when pandoc breaks up long lines of text using newlines, it might lead to a line starting with a number followed by a period.
This in turn means that the output contains a list element where the input does not.
Reproduction
Pandoc Version
Command Line Used
(tried it with all available Markdown formats and they all produce the same error)
Input used (doc.html)
Output received (doc.md)
(Notice the numbered list element)
Expected output
(or the same containing a new line which does not result in a numbered list)