Open SOF3 opened 2 months ago
src/lexer.l is the jq lexer, not the JSON lexer.
jq 1.6 is an old version; I tried your example and I get a parse error:
$ printf '1\r\x00\n\x00\n1\n\x00 \x00' | jq
1
jq: parse error: Invalid numeric literal at line 2, column 0
So, if NUL is supposed to be whitespace as you are saying (I have not checked), then this behavior is wrong; but jq does not return 0 for the NULs.
Meanwhile, \x22\x00\x22 (a literal NUL byte between double quotes) reports the following error, which appears to suggest that null bytes in general should not be allowed:
parse error: Unfinished string at EOF at line 1, column 1
@SOF3 That is just standard JSON as specified in https://json.org
You cannot have literal ASCII control characters in JSON strings (the exception is DEL, U+007F, as mentioned in the RFC).
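As a cross-check that does not involve jq at all, the same rule can be observed in another strict JSON parser. The following is a minimal sketch using Python's standard json module; the helper name parses is made up for illustration, and the result only illustrates the JSON grammar, not how jq's own parser behaves.

```python
import json


def parses(text: str) -> bool:
    """Return True if Python's (strict) json parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# A literal NUL between the quotes is a raw control character.
literal_nul = '"\x00"'
# The escaped form \u0000 is the valid way to put NUL in a JSON string.
escaped_nul = '"\\u0000"'
# DEL (U+007F) lies outside U+0000..U+001F and may appear unescaped.
literal_del = '"\x7f"'

print(parses(literal_nul), parses(escaped_nul), parses(literal_del))
# prints: False True True
```

So a quoted literal NUL being rejected is expected; only the wording of jq's error message ("Unfinished string at EOF") is specific to jq.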
But the parser does seem to get confused by NUL when it is used as whitespace in the input:
$ printf '1\0 2 ' | jq # stops parsing after NUL
1
$ printf '1\0 2\n' | jq # treats NUL as whitespace
1
2
$ printf '1\r\x00\n\x00\n1\n\x00 \x00' | jq
1
jq: parse error: Invalid numeric literal at line 2, column 0
$ printf '1\x00\n\x00\n1\n\x00 \x00' | jq
1
jq: parse error: Invalid numeric literal at line 3, column 0
$ printf '1\x00\x00\n1\n\x00 \x00' | jq
1
1
Describe the bug
Whitespace-delimited NUL bytes are sometimes parsed as zero values but sometimes not.
To Reproduce
(Btw, U+000D is a valid whitespace character according to RFC 8259, but does not seem to be included in the lexer. I am not familiar with flex so I don't know if there's some magic going on there)
https://github.com/jqlang/jq/blob/ed8f7154f4e3e0a8b01e6778de2633aabbb623f8/src/lexer.l#L133
Expected behavior
To be honest, I don't know what to expect for null bytes, but I would expect them to be something more consistent.
RFC 8259 does not permit NUL bytes as input, so it is reasonable (although probably unnecessary) to treat them, when outside string literals, either as invalid characters or whitespace. But magically creating a Number(0) value does not look right.
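Until the behavior is made consistent, one workaround is to check the input for NUL bytes before handing it to a parser. The sketch below is only an illustration in Python, not anything jq does: the names JSON_WHITESPACE and nul_offsets are hypothetical. It relies on the fact that RFC 8259 allows exactly four whitespace characters between tokens (space, tab, line feed, carriage return) and never allows a literal NUL, inside or outside a string, so every reported offset marks invalid input.

```python
from typing import List

# The four whitespace bytes RFC 8259 allows between tokens.
# Iterating over bytes yields ints, so this is {0x20, 0x09, 0x0A, 0x0D}.
JSON_WHITESPACE = frozenset(b" \t\n\r")


def nul_offsets(data: bytes) -> List[int]:
    """Return the byte offsets of every NUL in the raw input."""
    return [i for i, byte in enumerate(data) if byte == 0]


sample = b"1\x00 2\n"
print(nul_offsets(sample))        # [1]
print(0x0D in JSON_WHITESPACE)    # True: CR is valid JSON whitespace
print(0x00 in JSON_WHITESPACE)    # False: NUL is not
```

Rejecting the input when nul_offsets returns a non-empty list corresponds to the "invalid character" option; stripping those bytes first would correspond to the "whitespace" option.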