jpmens / jo

JSON output from a shell
http://jpmens.net/2016/03/05/a-shell-command-to-create-json-jo/

Null character (i.e. "\0") terminates string, but should actually be escaped instead #212

Open nicholaides opened 2 months ago

nicholaides commented 2 months ago

My understanding of the JSON spec is that the null character (\0) is a perfectly cromulent character in a JSON string (escaped as \u0000), because JSON strings are UTF-8.

A null character in a string apparently terminates the string in jo:

% jo greeting=$'hello \0 world'
{"greeting":"hello "}

Other control characters get escaped correctly:

% jo greeting=$'hello \1 world'
{"greeting":"hello \u0001 world"}
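
For comparison, a general-purpose JSON encoder escapes U+0000 the same way as any other control character. Here is a minimal Python sketch; the standard json module is used only as a reference point, not as a statement about how jo works internally:

```python
import json

# A string with an embedded NUL (U+0000) and one with U+0001.
with_nul = "hello \0 world"
with_soh = "hello \1 world"

# json.dumps escapes every control character below U+0020,
# so the NUL survives as the six-character sequence \u0000.
encoded_nul = json.dumps({"greeting": with_nul})
encoded_soh = json.dumps({"greeting": with_soh})

print(encoded_nul)  # {"greeting": "hello \u0000 world"}
print(encoded_soh)  # {"greeting": "hello \u0001 world"}

# Decoding restores the original string, NUL included.
assert json.loads(encoded_nul)["greeting"] == with_nul
```

This is the output one would hope for from the first jo command, apart from spacing.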

It's not a problem with the shell handling \0 because this works as expected:

% echo $'hello \0 world'
hello  world

jpmens commented 2 months ago

Which shell are you using? (I don't think it's csh)

I see:

$ echo $'hello \0 world' | od -cb
0000000    h   e   l   l   o      \n
          150 145 154 154 157 040 012
0000007

In both bash and sh.

Be that as it may, the json.[ch] we use doesn't handle zero bytes.

nicholaides commented 2 months ago

Oh, I forgot: IIRC, command-line arguments on macOS/Linux are C strings, so a \0 will terminate the argument anyway.
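
The same limitation shows up outside the shell. As a rough illustration in Python (not part of jo), CPython on a POSIX system is expected to refuse to start a child process when an argument contains a NUL, because the argument cannot be represented as a NUL-terminated C string:

```python
import subprocess
import sys

def try_spawn_with_nul():
    """Attempt to pass an argument containing NUL to a child process."""
    try:
        # The child would just print its first argument, if it ever started.
        subprocess.run(
            [sys.executable, "-c", "import sys; print(sys.argv[1])",
             "hello \0 world"],
            check=True,
        )
    except ValueError as exc:
        # Raised before any process is created.
        return f"rejected: {exc}"
    return "spawned"

result = try_spawn_with_nul()
print(result)
```

Whether a given tool rejects the argument or silently truncates it differs; either way, the bytes after the NUL never reach the child.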

But jo still has this problem when the value is read from a file instead of being passed as an argument. See:

% echo $'hello \0 world' > hw.txt
% od -cb hw.txt      
0000000    h   e   l   l   o      \0       w   o   r   l   d  \n        
          150 145 154 154 157 040 000 040 167 157 162 154 144 012        
0000016
% jo greeting=@hw.txt
{"greeting":"hello "}
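
Not every shell passes \0 through $'...', so a shell-independent way to create the test file may help when reproducing this. A small Python sketch follows; the temporary directory is an arbitrary choice for the example:

```python
import os
import tempfile

# The 14 bytes from the od output: "hello ", NUL, " world", newline.
payload = b"hello \0 world\n"

# Written to a temporary directory here; the file name matches the thread.
path = os.path.join(tempfile.mkdtemp(), "hw.txt")
with open(path, "wb") as fh:
    fh.write(payload)

# Read it back in binary mode to confirm that the NUL is really on disk.
with open(path, "rb") as fh:
    data = fh.read()

print(len(data))          # 14
print(data.index(b"\0"))  # 6
```

Running jo greeting=@hw.txt against the resulting file should then exercise the same code path regardless of which shell is in use.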

In any case, if zero bytes aren't being handled correctly internally, then there's nothing that can be done, I guess.

It's a shame to not support UTF-8 correctly, though.

jpmens commented 2 months ago

I'd still like to know which shell you're using:

$ echo $'hello \0 world' > hw.txt
$ od -cb hw.txt
0000000    h   e   l   l   o      \n
          150 145 154 154 157 040 012
0000007

nicholaides commented 2 months ago

zsh on macOS Sonoma

% $SHELL --version
zsh 5.9 (x86_64-apple-darwin23.0)

gromgit commented 1 month ago

> zsh

That explains it. From the zshoptions man page:

POSIX_STRINGS <K> <S>

This option affects processing of quoted strings. Currently it only affects the behaviour of null characters, i.e. character 0 in the portable character set corresponding to US ASCII.

When this option is not set, null characters embedded within strings of the form $'...' are treated as ordinary characters. The entire string is maintained within the shell and output to files where necessary, although owing to restrictions of the library interface the string is truncated at the null character in file names, environment variables, or in arguments to external programs.

When this option is set, the $'...' expression is truncated at the null character. Note that remaining parts of the same string beyond the termination of the quotes are not truncated.

For example, the command line argument a$'b\0c'd is treated with the option off as the characters a, b, null, c, d, and with the option on as the characters a, b, d.

As for dealing with embedded null characters, that's a massive change, involving adding the concept of counted strings all through the current C codebase. Given that zsh's default treatment of null characters seems to be an outlier rather than the norm among command-line shells, making the huge effort to support embedded nulls will inevitably lead to a different set of issues raised here, viz. "you can't get there (embedded nulls) from here (your current shell, except zsh)".
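
To make the distinction concrete, here is a small Python sketch that uses ctypes to view the same bytes both ways. It only illustrates NUL-terminated versus counted strings in general; it is not a model of jo's json.[ch]:

```python
import ctypes

raw = b"hello \0 world"

# A ctypes buffer holds all the bytes plus a trailing NUL terminator.
buf = ctypes.create_string_buffer(raw)

# .value follows C string rules and stops at the first NUL ...
as_c_string = buf.value
# ... while .raw is the full buffer; slicing by the known length
# recovers the original bytes (the "counted string" view).
as_counted = buf.raw[:len(raw)]

print(as_c_string)  # b'hello '
print(as_counted)   # b'hello \x00 world'
```

Every function that currently takes only a pointer would need the length carried alongside it, which is why the change touches so much code.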

nicholaides commented 1 month ago

> will inevitably lead to a different set of issues raised here, viz. "you can't get there (embedded nulls) from here (your current shell, except zsh)".

I get that.

I would make a few points, however:

  1. At the very least, jo should complain instead of failing silently if a file included via @ contains a null byte. I assume implementing this wouldn't require reworking all of the internal string representations.

  2. It would be nice if the following error message could specify that the error stems from a limitation of jo rather than from the JSON being invalid:

% cat nb.json
{"nb": "\u0000"}

% jo fileContents=:nb.json
jo: Cannot decode JSON in file nb.json
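
Both points can be sketched in a few lines of Python. The function names and the diagnostic wording below are hypothetical and are not taken from jo; the sketch only shows that detecting a NUL needs nothing more than a scan of the bytes, and that the contents of nb.json are valid JSON:

```python
import json

def find_nul(data: bytes) -> int:
    """Return the offset of the first NUL byte in data, or -1 if none."""
    return data.find(b"\0")

def describe(name: str, data: bytes) -> str:
    """Hypothetical diagnostic; the wording is not jo's."""
    offset = find_nul(data)
    if offset < 0:
        return f"{name}: ok"
    return f"{name}: NUL byte at offset {offset}; value would be truncated"

# Point 1: a file read via @ can be checked before it is used.
message = describe("hw.txt", b"hello \0 world\n")
print(message)

# Point 2: the contents of nb.json are valid JSON; the escape decodes
# to a one-character string holding U+0000.
decoded = json.loads('{"nb": "\\u0000"}')
print(decoded == {"nb": "\0"})  # True
```

In C the equivalent check would need the byte count from the read call, since strlen would stop at the NUL being looked for.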