jqlang / jq

Command-line JSON processor
https://jqlang.github.io/jq/

jq --raw modes, --binary and newlines #3132

Open calestyo opened 4 months ago

calestyo commented 4 months ago

Describe the bug

Hey.

I think the following could/should be better documented. When using --raw-output and selecting e.g. a string, people might expect to get just the value of that string, but in fact they get the value plus a trailing newline.

This kind of makes some sense, of course, as one might return multiple values like in '.["foo"],.["bra"]'.
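To make the expectation concrete, here is a rough Python model of what --raw-output appears to do with string results. It is only a sketch of the observed behaviour, not jq's code; `raw_output` is a hypothetical helper name.

```python
def raw_output(values):
    """Model of --raw-output for string results (hypothetical helper):
    each value is written unquoted and followed by exactly one newline."""
    return "".join(value + "\n" for value in values)


# A single selected string still gets a trailing newline.
single = raw_output(["hello"])

# With several results, the newline is what keeps them apart.
multiple = raw_output(["foo-value", "bar-value"])

print(repr(single))
print(repr(multiple))
```

The trailing newline after a single value is the part that might surprise people who read "raw" as "exactly the bytes of the string".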

Now what happens if the values themselves contain escaped newlines, \n or even \r\n?

It seems that without --binary, and when built for Windows, jq converts \n in the values to CRLF and also separates the entries with CRLF instead of LF.
I haven't checked whether it's smart enough to not add another CR, if it's already encoded as \r\n.
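The effect described above can be modelled in a few lines of Python. This is a sketch under one assumption: that the conversion comes from the output stream being in the C runtime's text mode, which replaces every LF with CRLF without looking at the preceding byte. Whether jq actually behaves this way for values that already contain \r\n is not verified here; `raw_output` and `windows_text_mode` are hypothetical helpers.

```python
def raw_output(values):
    """Model of --raw-output: each string followed by one LF, as bytes."""
    return "".join(value + "\n" for value in values).encode("utf-8")


def windows_text_mode(data):
    """Assumed model of text-mode output on Windows:
    every LF becomes CRLF, with no check for an existing CR."""
    return data.replace(b"\n", b"\r\n")


values = ["line1\nline2", "crlf\r\nvalue"]

binary = raw_output(values)               # what --binary would presumably emit
translated = windows_text_mode(binary)    # what text mode would turn it into

print(binary)
print(translated)
```

Under this model the newline inside the first value and the separators are translated alike, and the second value's \r\n ends up as \r\r\n, because the translation cannot tell the two kinds of newline apart.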

One can debate whether it should also touch the values, since a value might be something like a binary string, but OK.

At least, the documentation of --binary doesn't make it clear that it affects both kinds of newlines: those that are part of a JSON value and those that jq uses for separation purposes.

So if that's even the desired behaviour, i.e. that values also have their line endings translated, it would be nice if that could be explained somewhere: at least with the --binary option, but perhaps also with the --raw options, because these sound as if the value would be emitted raw, whereas on Windows it isn't.

Similarly, for --raw-output0, it isn't clear whether only the separator newlines (those that separate multiple values) are transformed, or also the newlines that are encoded within values (e.g. strings).
I'd say that especially for --raw-output0 it would probably make sense to have it affect only the newlines that separate multiple values and not the ones within a value.
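For reference, the jq 1.7 manual describes --raw-output0 as printing a NUL after each output instead of a newline, and as failing when an output value itself contains NUL. Under that description, the suggested behaviour (value bytes passed through untouched) could be sketched as follows. `raw_output0` is a hypothetical helper written for illustration, not jq code.

```python
def raw_output0(values):
    """Model of the suggested --raw-output0 behaviour (hypothetical helper):
    value bytes are passed through unchanged, each followed by a NUL."""
    out = bytearray()
    for value in values:
        data = value.encode("utf-8")
        if b"\0" in data:
            # The manual says jq exits with an error in this case.
            raise ValueError("output value contains NUL")
        out += data + b"\0"
    return bytes(out)


stream = raw_output0(["line1\nline2", "crlf\r\nvalue"])

# A consumer splits on NUL and gets each value back byte for byte.
recovered = [chunk.decode("utf-8") for chunk in stream.split(b"\0")[:-1]]

print(stream)
print(recovered)
```

In this model the only newlines left in the stream are those inside values, so any newline translation applied afterwards would alter value content only.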

Thanks, Chris.