Open nicholaides opened 2 months ago
Which shell are you using? (I don't think it's csh)
I see:
$ echo $'hello \0 world' | od -cb
0000000 h e l l o \n
150 145 154 154 157 040 012
0000007
That's in both bash and sh.
Be that as it may, the json.[ch] we use doesn't handle zero bytes.
Oh, I forgot-- IIRC command line arguments on Mac/Linux are C-strings, so a \0
will terminate the string anyway.
jo still has this problem when a shell isn't involved, though. See:
% echo $'hello \0 world' > hw.txt
% od -cb hw.txt
0000000 h e l l o \0 w o r l d \n
150 145 154 154 157 040 000 040 167 157 162 154 144 012
0000016
% jo greeting=@hw.txt
{"greeting":"hello "}
In any case, if zero bytes aren't being handled correctly internally, then there's nothing that can be done, I guess.
It's a shame to not support UTF-8 correctly, though.
I'd still like to know which shell you're using:
$ echo $'hello \0 world' > hw.txt
$ od -cb hw.txt
0000000 h e l l o \n
150 145 154 154 157 040 012
0000007
zsh on MacOS Sonoma
% $SHELL --version
zsh 5.9 (x86_64-apple-darwin23.0)
zsh
That explains it. From the zshoptions man page:
POSIX_STRINGS <K> <S>
This option affects processing of quoted strings. Currently it only affects the behaviour of null characters, i.e. character 0 in the portable character set corresponding to US ASCII.
When this option is not set, null characters embedded within strings of the form $'...' are treated as ordinary characters. The entire string is maintained within the shell and output to files where necessary, although owing to restrictions of the library interface the string is truncated at the null character in file names, environment variables, or in arguments to external programs.
When this option is set, the $'...' expression is truncated at the null character. Note that remaining parts of the same string beyond the termination of the quotes are not truncated.
For example, the command line argument a$'b\0c'd is treated with the option off as the characters a, b, null, c, d, and with the option on as the characters a, b, d.
As for dealing with embedded null characters, that's a massive change, involving adding the concept of counted strings all through the current C codebase. Given that zsh's default treatment of null characters seems to be an outlier rather than the norm among command-line shells, making the huge effort to support embedded nulls will inevitably lead to a different set of issues raised here, viz. "you can't get there (embedded nulls) from here (your current shell, except zsh)".
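To illustrate what "counted strings" would mean here, a minimal sketch follows. The `counted_str` type and `c_string_length` helper are hypothetical names made up for this example; nothing like them exists in jo or json.[ch] as far as this thread shows. The point is only that a pointer-plus-length pair keeps bytes after a null, while anything that treats the data as a plain C string stops at the first null.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string type; not part of jo or json.[ch]. */
struct counted_str {
    const char *data; /* bytes, which may include '\0' */
    size_t len;       /* number of valid bytes in data */
};

/*
 * Length as code expecting a plain C string would see it:
 * everything from the first '\0' onward is invisible.
 */
static size_t c_string_length(const struct counted_str *s)
{
    const char *nul = memchr(s->data, '\0', s->len);
    return nul ? (size_t)(nul - s->data) : s->len;
}
```

For the 13 bytes of `"hello \0 world"`, `len` is 13 while `c_string_length` reports 6, which matches the truncated `"hello "` seen above. Every function that currently takes a `char *` would need to carry the length as well, which is why the change touches the whole codebase.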
I get that.
I would make a few points, however:
At the very least, jo should complain instead of failing silently if a file included via @ contains a null byte. I assume implementing this wouldn't require reworking all of the internal string representations.
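A rough sketch of the kind of check I mean, assuming the file is read through a `FILE *`. Both `has_embedded_nul` and `stream_has_nul` are hypothetical helpers written for this comment, not functions from jo; I haven't looked at how jo actually reads `@` files, so treat this as an illustration rather than a patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not jo's actual code: returns 1 if the first
 * `len` bytes of `buf` contain a null byte, 0 otherwise.
 */
static int has_embedded_nul(const char *buf, size_t len)
{
    return memchr(buf, '\0', len) != NULL;
}

/*
 * Scans a stream in chunks, using the byte count returned by fread
 * rather than strlen. Returns 1 if a null byte is found, 0 if none
 * is found, -1 on a read error.
 */
static int stream_has_nul(FILE *fp)
{
    char chunk[4096];
    size_t n;

    while ((n = fread(chunk, 1, sizeof chunk, fp)) > 0) {
        if (has_embedded_nul(chunk, n))
            return 1;
    }
    return ferror(fp) ? -1 : 0;
}
```

The only requirement is that the number of bytes read is known at the point of the check, so the rest of the code could keep using ordinary C strings and simply refuse the input with an error.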
It would be nice if the following error message could specify that the error is because of a limitation of jo rather than the JSON being invalid.
% cat nb.json
{"nb": "\u0000"}
% jo fileContents=:nb.json
jo: Cannot decode JSON in file nb.json
My understanding of the JSON spec is that the null character (\0) is a perfectly cromulent character in a JSON string because JSON strings are UTF-8.
A null character in a string apparently terminates the string in jo:
Other control characters get escaped correctly:
It's not a problem with the shell handling \0 because this works as expected: