Closed lostenderman closed 1 year ago
The corresponding unit test is testfiles/CommonMark_0.30/entity_and_numeric_character_references/004.test:
% ---RESULT--- "example": 28,
%
% <p><em>&nbsp &x; &#; &#x;</em>
% <em>&#87654321;</em>
% <em>&#abcdef0;</em>
% <em>&ThisIsNotDefined; &hi?;</em></p>
%
% ---\RESULT---
<<<
*&nbsp &x; &#; &#x;*
*&#87654321;*
*&#abcdef0;*
*&ThisIsNotDefined; &hi?;*
>>>
documentBegin
emphasis: (ampersand)nbsp (ampersand)x; (ampersand)(hash); (ampersand)(hash)x;
emphasis: (ampersand)(hash)87654321;
emphasis: (ampersand)(hash)abcdef0;
emphasis: (ampersand)ThisIsNotDefined; (ampersand)hi?;
documentEnd
Here is the result of running git checkout commonmark; cd tests; ./test.sh "testfiles/CommonMark_0.30/entity_and_numeric_character_references/004.test":
Testfile testfiles/CommonMark_0.30/entity_and_numeric_character_references/004.test
Format templates/plain/
Template templates/plain/input.tex.m4
Command luatex --interaction=nonstopmode test.tex
Template templates/plain/verbatim.tex.m4
Command luatex --interaction=nonstopmode test.tex
I cannot reproduce the error that you report.
The markdown input
*&nbsp &x; &#; &#x;*
*&#87654321;*
*&#abcdef0;*
*&ThisIsNotDefined; &hi?;*
a
fails with
...
{path to markdown.lua}:2359: bad argument #1 to 'char' (invalid value)
stack traceback:
[C]: in field 'char'
{path to markdown.lua}:2359: in function </{path to markdown.lua}:2358>
[C]: in function 'lpeg.match'
{path to markdown.lua}:3239: in field 'parse_blocks'
{path to markdown.lua}:3925: in local 'transform'
{path to markdown.lua}:184: in field 'cache'
{path to markdown.lua}:3929: in local 'convert'
[\directlua]:1: in main chunk.
\lua_now:e #1->\__lua_now:n {#1}
l.23 \end{markdown}
@lostenderman I can reproduce that:
$ git clone --single-branch --branch main https://github.com/witiko/markdown.git
$ cd markdown/
$ git remote add lostenderman https://github.com/lostenderman/markdown.git
$ git fetch lostenderman
$ git merge lostenderman/commonmark
$ make TEXLIVE_TAG=latest docker-image
$ rm -rf tests/templates/{latex,context}
$ docker run --rm -it -v "$PWD"/tests:/mnt -w /mnt witiko/markdown:latest
# ./test.sh testfiles/CommonMark_0.30/entity_and_numeric_character_references/004.test
Testfile testfiles/CommonMark_0.30/entity_and_numeric_character_references/004.test
Format templates/plain/
Template templates/plain/input.tex.m4
Command pdftex --shell-escape --interaction=nonstopmode test.tex
Command terminated with exit code 1.
*** test-expected.log 2022-12-21 14:16:46.763767021 +0000
--- test-actual.log 2022-12-21 14:16:50.043896037 +0000
***************
*** 1,6 ****
- documentBegin
- emphasis: (ampersand)nbsp (ampersand)x; (ampersand)(hash); (ampersand)(hash)x;
- emphasis: (ampersand)(hash)87654321;
- emphasis: (ampersand)(hash)abcdef0;
- emphasis: (ampersand)ThisIsNotDefined; (ampersand)hi?;
- documentEnd
--- 0 ----
This seems to be an issue of missing sanity checks in function entities.dec_entity() (and likely also entities.hex_entity()), which we use to convert HTML entities to Unicode characters:
# cat /tmp/*/test.markdown.err
...cal/texlive/texmf-local/tex/luatex/markdown/markdown.lua:2359: bad argument #1 to 'char' (invalid value)
# kpsewhich markdown.lua
/usr/local/texlive/texmf-local/tex/luatex/markdown/markdown.lua
# head -n 2363 `kpsewhich markdown.lua` | tail -n 6
function entities.dec_entity(s)
  return unicode.utf8.char(tonumber(s))
end
function entities.hex_entity(s)
  return unicode.utf8.char(tonumber("0x"..s))
end
We should check that tonumber(s) is not nil. However, we will also need a higher-level fix, so that the parser doesn't even try to convert the non-entity to a Unicode character to begin with. Here are the relevant PEG patterns.
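A minimal sketch of what such a sanity check could look like. This is not the actual patch: safe_char is a hypothetical helper, the functions are written as locals rather than as members of the entities table, and the standard Lua 5.3 utf8.char is used in place of unicode.utf8.char so that the snippet runs outside LuaTeX. Returning nil for anything that is not a valid code point is also an assumption; the caller would then have to decide to leave the source text untouched.

```lua
-- Hypothetical helper (not part of markdown.lua): encode a code point as
-- UTF-8, or return nil if the value cannot be a Unicode scalar value.
local function safe_char(n)
  if n == nil then return nil end                      -- input was not a number
  if n < 0 or n > 0x10FFFF then return nil end         -- outside the Unicode range
  if n >= 0xD800 and n <= 0xDFFF then return nil end   -- surrogate code points
  return utf8.char(n)
end

-- Guarded variant of the decimal conversion; base 10 rejects strings
-- such as "abcdef0".
local function dec_entity(s)
  return safe_char(tonumber(s, 10))
end

-- Guarded variant of the hexadecimal conversion.
local function hex_entity(s)
  return safe_char(tonumber(s, 16))
end
```

With this guard, the value 87654321 from the failing test is rejected before it reaches the char function, because it is larger than 0x10FFFF.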
See https://spec.commonmark.org/0.30/#example-28
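For the higher-level fix, CommonMark 0.30 limits decimal numeric character references to 1 to 7 digits and hexadecimal ones to 1 to 6 hexadecimal digits, which is why none of the lines in example 28 count as references. The sketch below illustrates that rule with plain Lua string patterns rather than the PEG patterns used in markdown.lua; is_numeric_reference is a hypothetical name chosen for this illustration.

```lua
-- Hypothetical check (not part of markdown.lua): does the string s form a
-- numeric character reference according to CommonMark 0.30?
local function is_numeric_reference(s)
  -- Decimal form: "&#", 1 to 7 decimal digits, ";"
  local dec = s:match("^&#(%d+);$")
  if dec then return #dec <= 7 end
  -- Hexadecimal form: "&#x" or "&#X", 1 to 6 hexadecimal digits, ";"
  local hex = s:match("^&#[Xx](%x+);$")
  if hex then return #hex <= 6 end
  return false
end
```

Applied to the inputs from the test file, &#87654321; is rejected because it has eight digits, and &#;, &#x; and &#abcdef0; are rejected because they contain no valid digit string.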
No output is produced when the test file is run just as is. When modified, the input sometimes fails to be parsed with: markdown.lua:2359: bad argument #1 to 'char' (invalid value)