jqlang / jq

Command-line JSON processor
https://jqlang.github.io/jq/

No way to slurp multiple raw inputs without joining them #2415

Open lilyball opened 2 years ago

lilyball commented 2 years ago

Describe the bug

jq can read raw input as strings with --raw-input, but that splits the input into lines. Combining it with --slurp (--slurp --raw-input) reads the entire input as one string instead.

It also offers a way to work with multiple inputs via the input and inputs builtins.

Unfortunately there's no way to combine the two such that I can slurp raw input for each input separately. If I use just -R I get separate inputs but they're split on lines (and trailing newlines are ignored). If I use -Rs I get the entire input as a string, but multiple inputs are joined together into the same string.
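To make the two behaviours concrete, here is a minimal sketch. The file names and contents are made up for illustration, and it assumes a jq new enough to have `inputs` (1.5 or later).

```shell
# Two small sample files (hypothetical names), two lines each.
tmp=$(mktemp -d)
printf 'a1\na2\n' > "$tmp/foo"
printf 'b1\nb2\n' > "$tmp/bar"

# -R alone: one string per line, so file boundaries and newlines are lost.
lines=$(jq -cnR '[inputs]' "$tmp/foo" "$tmp/bar")
echo "$lines"    # ["a1","a2","b1","b2"]

# -Rs: a single string with both files concatenated.
joined=$(jq -cRs '.' "$tmp/foo" "$tmp/bar")
echo "$joined"   # "a1\na2\nb1\nb2\n"
```

Neither form yields one string per file, which is what the request is about.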

To Reproduce

A trivial example of what I'd like to do is take multiple inputs and produce a JSON object of each filename mapped to its base64'd contents. My best attempt looks like

jq -nR 'reduce inputs as $i ({}; .[input_filename] = ($i | @base64))' foo bar baz

This works if each file has no newline, but the moment I have newlines in a file, this breaks down. And adding the -s switch breaks the reduce such that inputs just yields a single string with the combined contents of all files (and input_filename returns the final filename).
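A small sketch of that failure mode, using a single made-up two-line file named foo: each line arrives as its own input, so every line overwrites the same key and only the last one survives.

```shell
tmp=$(mktemp -d)
printf 'a1\na2\n' > "$tmp/foo"

# Run from inside the directory so input_filename is just "foo".
broken=$(cd "$tmp" && jq -cnR 'reduce inputs as $i ({}; .[input_filename] = ($i | @base64))' foo)
echo "$broken"   # {"foo":"YTI="}  (base64 of "a2" only; "a1" and both newlines are gone)
```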

Expected behavior

I initially expected -nRs to still process each named input separately. Since it doesn't, and it would be backwards-incompatible to change that, I'd like to see a new flag that says "slurp but still keep each input separate".

Alternatively, I'd like an equivalent to the --slurpfile flag that adds the behavior of --raw-input (since the --raw-input flag does not currently affect --slurpfile and again it would be backwards-incompatible to change that). Maybe call it --rawfile? That would be the easiest to work with because then I don't need to worry about the behavior of inputs and I can handle input from files identically to passing non-file values with --arg.

Additional context

I can't just use --arg for this because that will fail if the file contains NUL bytes.

wader commented 2 years ago

Nice bug report 👍 There is a --rawfile argument in master, and it seems to be available in 1.6 as well. One issue with using it is that you will probably have to list all files explicitly; I don't see how it can be used with simple globbing etc.

$ echo -e '{"a":\n"aaa"\n}' > a
$ echo -e '{"b":\n"bbb"\n}' > b
$ echo -e 'c\x00c' > c
$ jq -n --rawfile a a --rawfile b b --rawfile c c '{$a, $b, $c} | with_entries(.value |= @base64)' | tee /dev/stderr | jq 'with_entries(.value |= @base64d)'
{
  "a": "eyJhIjoKImFhYSIKfQo=",
  "b": "eyJiIjoKImJiYiIKfQo=",
  "c": "YwBjCg=="
}
{
  "a": "{\"a\":\n\"aaa\"\n}\n",
  "b": "{\"b\":\n\"bbb\"\n}\n",
  "c": "c\u0000c\n"
}

Otherwise I'm out of ideas. I'll think about it; always nice with some jq-golf :)

lilyball commented 2 years ago

Oh you're right, jq -h lists --rawfile but the manual doesn't. This perfectly solves my current problem as I'm happy to list them explicitly (I want to basename the keys anyway and this way I don't have to define that in jq itself), though I can still imagine where a new flag that slurps input files independently would be useful.

wader commented 2 years ago

Aha, good! Yeah, agreed, I've been bitten by the same behaviour too. I also agree that some kind of --separate-slurps or --no-inputs-concat flag is probably the least confusing way to achieve it.
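Absent such a flag, one possible way to approximate per-file slurping while still using a glob is a shell loop that runs jq -Rs once per file and merges the resulting objects. This is only a sketch: the file names are hypothetical, it starts one jq process per file, and it has not been checked against binary content such as NUL bytes.

```shell
tmp=$(mktemp -d)
printf 'line1\nline2\n' > "$tmp/foo"
printf 'hello' > "$tmp/bar"

result=$(
  for f in "$tmp"/*; do
    # -Rs reads this one file as a single string; --arg carries only the file name.
    jq -Rs --arg name "$(basename "$f")" '{($name): @base64}' "$f"
  done | jq -cs 'add'   # merge the per-file objects into one
)
echo "$result"
```

Only the file name goes through --arg here, so the file contents never pass through the shell as an argument.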

pkoppstein commented 2 years ago

@lilyball - For the record, the online manual for v1.6 (https://stedolan.github.io/jq/manual/v1.6/) mentions --rawfile.

lilyball commented 2 years ago

Huh, now I wonder what other differences there are between the online manual and my locally-installed version?

T3sT3ro commented 7 months ago

As far as I can tell, the only simple workaround for the original problem requires 2 invocations:

... | jq -R | jq -s

A pity that there is no R+s mode :/

wader commented 7 months ago

@T3sT3ro Not ideal, but you can use jq -nR '[inputs] | ...' if you want to avoid running jq twice
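A quick check, on made-up two-line input, that the two forms give the same array of lines:

```shell
# Two invocations: split into line strings, then slurp them into an array.
two=$(printf 'x\ny\n' | jq -R . | jq -cs .)

# One invocation: -n suppresses the automatic read so that inputs sees every line.
one=$(printf 'x\ny\n' | jq -cnR '[inputs]')

echo "$two"   # ["x","y"]
echo "$one"   # ["x","y"]
```

Both still split on newlines, so neither addresses the original per-file request.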