jqlang / jq

Command-line JSON processor
https://jqlang.github.io/jq/

Null bytes are handled inconsistently #3110

Open SOF3 opened 2 months ago

SOF3 commented 2 months ago

Describe the bug

Whitespace-delimited NUL bytes are sometimes parsed as zero values but sometimes not.

To Reproduce

$ for cmd in xxd jq; do printf '1\r\x00\n\x00\n1\n\x00 \x00' | $cmd; done
00000000: 310d 000a 000a 310a 0020 00              1.....1.. .
1
0
0
1

(Btw, U+000D is a valid whitespace character according to RFC 8259, but does not seem to be included in the lexer. I am not familiar with flex so I don't know if there's some magic going on there)

https://github.com/jqlang/jq/blob/ed8f7154f4e3e0a8b01e6778de2633aabbb623f8/src/lexer.l#L133

Expected behavior

To be honest, I don't know what to expect for null bytes, but I would expect them to be handled more consistently.

RFC 8259 does not permit NUL bytes as input, so it is reasonable (although probably unnecessary) to treat them, when outside string literals, either as invalid characters or whitespace. But magically creating a Number(0) value does not look right.
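As a rough illustration of the two options mentioned above (detecting NUL bytes as invalid, or treating them as whitespace), here is a minimal shell sketch. The helper names `count_nul` and `nul_to_space` are hypothetical and not part of jq; they only pre-process the byte stream before it reaches a JSON parser.

```shell
# Hypothetical helpers; neither is part of jq.

# Count the NUL bytes on stdin.
count_nul() {
  # tr -cd '\000' deletes every byte except NUL; wc -c counts what remains.
  n=$(tr -cd '\000' | wc -c)
  # Arithmetic expansion drops any padding that wc adds on some systems.
  echo "$((n))"
}

# Treat NUL as whitespace by translating each NUL byte to a space.
nul_to_space() {
  tr '\000' ' '
}

# The input from the report, written with octal escapes for portability.
printf '1\r\000\n\000\n1\n\000 \000' | count_nul   # prints 4
```

Something like `printf '1\000 2 ' | nul_to_space | jq` would then hand jq an input without NUL bytes; whether that is the right policy for a given data source is a separate decision.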

Environment

$ jq --version
jq-1.6

SOF3 commented 2 months ago

Meanwhile, the input \x22\x00\x22 (a NUL byte between two double quotes) reports the following error, which appears to suggest that null bytes in general should not be allowed:

parse error: Unfinished string at EOF at line 1, column 1

emanuele6 commented 2 months ago

src/lexer.l is the lexer for the jq language, not the JSON lexer.

emanuele6 commented 2 months ago

jq 1.6 is an old version; I tried your example and I get a parse error:

$ printf '1\r\x00\n\x00\n1\n\x00 \x00' | jq
1
jq: parse error: Invalid numeric literal at line 2, column 0

So, if NUL is supposed to be treated as whitespace, as you are saying (I have not checked), then this behavior is wrong; but jq does not return 0 for the NULs.

emanuele6 commented 2 months ago

> Meanwhile, \x22\x00\x22 (" ") reports the following error, which appears to suggest that null bytes in general should not be allowed:
>
> parse error: Unfinished string at EOF at line 1, column 1

@SOF3 That is just standard JSON as specified in https://json.org

You cannot have literal ASCII control characters in JSON strings (with the exception of DEL, U+007F, as mentioned in the RFC).
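To make the rule concrete: a NUL character inside a JSON string has to be written with the `\u0000` escape rather than as a raw byte. The sketch below only builds such an input with `printf`; the function name `escaped_nul_string` is a hypothetical helper, not something from jq.

```shell
# Hypothetical helper: emit a JSON string whose content is one NUL
# character, written with the \u0000 escape so that no raw control
# byte appears in the input.
escaped_nul_string() {
  # '\\' in the printf format produces a single literal backslash.
  printf '"\\u0000"\n'
}

escaped_nul_string   # prints "\u0000"
```

Piping this into `jq .` should be accepted as an ordinary string, in contrast to the raw `\x22\x00\x22` input from the earlier comment.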

emanuele6 commented 2 months ago

But the parser does seem to get confused by NUL when it is used as whitespace in the input:

$ printf '1\0 2 ' | jq      # stops parsing after NUL
1
$ printf '1\0 2\n' | jq     # treats NUL as whitespace
1
2
$ printf '1\r\x00\n\x00\n1\n\x00 \x00' | jq
1
jq: parse error: Invalid numeric literal at line 2, column 0
$ printf '1\x00\n\x00\n1\n\x00 \x00' | jq
1
jq: parse error: Invalid numeric literal at line 3, column 0
$ printf '1\x00\x00\n1\n\x00 \x00' | jq
1
1
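Since the results above differ between jq versions, it may help to print the exact input bytes next to whatever the installed jq does with them. The following is a small sketch under that assumption; `hex_bytes` and `show_case` are hypothetical helper names, and the jq step is skipped when jq is not installed.

```shell
# Hypothetical helper: print a hex dump of stdin on a single line.
hex_bytes() {
  # od prints the bytes; tr and sed normalize the spacing.
  od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# Hypothetical helper: show the bytes produced by a printf format string
# and, if jq is available, what jq prints for them.
show_case() {
  printf 'input bytes: %s\n' "$(printf "$1" | hex_bytes)"
  if command -v jq >/dev/null 2>&1; then
    # Keep going even when jq reports a parse error.
    printf "$1" | jq . 2>&1 || true
  fi
  return 0
}

show_case '1\000 2 '
show_case '1\000 2\n'
```

Running the same script against different jq builds gives a side-by-side record of how each one handles NUL between values.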