jqlang / jq

Command-line JSON processor
https://jqlang.github.io/jq/

Using unique or unique_by in a filter causes subsequent (after comma ,) filter inputs to be replaced with its output #3139

Closed tonymitchell closed 3 weeks ago

tonymitchell commented 4 weeks ago

Describe the bug

Using unique in a filter appears to cause its output to replace the input of any subsequent filters that follow it (after a comma), rather than affecting only that filter.

The documentation for unique makes no mention of it affecting the input of subsequent filters.

To Reproduce

Given this sample input:

{
    "List1": [
        { "value": "1.2"}
    ],
    "List2": [
        { "name": "serviceA" },
        { "name": "serviceA" },
        { "name": "serviceB" },
        { "name": "serviceB" }
    ]
}

Using the unique function in the first filter, followed by a filter that expects the original input, results in an error.

For example, the following command:

jq '[.List2[].name]|unique, .List1' input.json

fails with the error:

jq: error (at input.json:10): Cannot index array with string "List1"

Without the unique function, the command executes without error:

jq '[.List2[].name], .List1' input.json

Changing the order of the filters also results in a successful execution of both filters:

jq '.List1, [.List2[].name]|unique' input.json

Expected behavior

Filters separated by a comma should each receive the original input, and neither should have its input affected by the other.

This is supported by the documentation on the comma (,), which states: "If two filters are separated by a comma, then the same input will be fed into both and the two filters' output value streams will be concatenated in order"

Environment

Ubuntu 22.04.04 LTS under WSL2 on Windows 11, jq version 1.6.

itchyny commented 4 weeks ago

Since the precedence of | is lower than ,, your query is parsed as [.List2[].name] | (unique, .List1). So use ([.List2[].name] | unique), .List1.
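
To make the parse concrete, here is a minimal sketch that runs both forms against the sample input from the report. The input is inlined in a shell variable rather than read from input.json, and the variable names are only for illustration; it assumes jq is available on PATH.

```shell
# Sample input from the report, inlined so the snippet is self-contained.
input='{"List1":[{"value":"1.2"}],"List2":[{"name":"serviceA"},{"name":"serviceA"},{"name":"serviceB"},{"name":"serviceB"}]}'

# Parenthesized form: unique applies only to the first filter,
# and .List1 still receives the original object.
fixed=$(printf '%s' "$input" | jq -c '([.List2[].name] | unique), .List1')
printf '%s\n' "$fixed"

# Unparenthesized form parses as [.List2[].name] | (unique, .List1),
# so .List1 is applied to an array and jq exits with a non-zero status.
if printf '%s' "$input" | jq -c '[.List2[].name] | unique, .List1' >/dev/null 2>&1; then
  broken_status=ok
else
  broken_status=failed
fi
printf '%s\n' "$broken_status"
```

The first command should print the deduplicated names followed by the List1 array; the second should report a failure, matching the error in the report.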

tonymitchell commented 3 weeks ago

Thank you, that helps me understand why it isn't working.

As someone new to the language, that does leave me with a few additional questions that I hope you'll indulge:

  1. Is that precedence behaviour documented somewhere? I can't see anything in the documentation about operators having different precedence (or the ordering among them) other than a side comment about it affecting the // operator.
  2. Given that the comma operator's function is to separate filters, wouldn't it make sense for it to have one of the lowest precedences, lower than |? Is there a reason for that choice that only becomes apparent later? I'm assuming it's too late to adjust that now.
  3. Does that mean any time you use the comma operator with non-trivial filters, you'll always have to wrap them in parentheses ()? If that is so, it might be helpful to add a comment to that effect in the documentation for the comma operator to help new learners.

wader commented 3 weeks ago

  1. Is that precedence behaviour documented somewhere? I can't see anything in the documentation about operators having different precedence (or the ordering among them) other than a side comment about it affecting the // operator.

Only in the wiki at the moment: https://github.com/jqlang/jq/wiki/jq-Language-Description#operators-priority. But I agree that it could be expanded on a bit in the documentation.

  2. Given that the comma operator's function is to separate filters, wouldn't it make sense for it to have one of the lowest precedences, lower than |? Is there a reason for that choice that only becomes apparent later? I'm assuming it's too late to adjust that now.
  3. Does that mean any time you use the comma operator with non-trivial filters, you'll always have to wrap them in parentheses ()? If that is so, it might be helpful to add a comment to that effect in the documentation for the comma operator to help new learners.

I think the current precedence ordering makes sense myself, but maybe I'm just used to it :) For example:

# with current order
$ jq -cn '1, 2 | .*2'
2
4
# if , and | order were swapped
$ jq -cn '1, (2 | .*2)'
1
4

Or do I misunderstand? Could you give some examples? ...and yep, it's probably way too late to change things now.

itchyny commented 3 weeks ago

Another common use case that relies on the current precedence: .articles[] | .id, .name, .description.
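
As a rough illustration of that pattern, the sketch below runs the query against a small made-up document. The article data and the shell variable names are hypothetical; only the query itself comes from the comment above. It assumes jq is available on PATH.

```shell
# Hypothetical input; only the field names are taken from the query above.
articles='{"articles":[{"id":1,"name":"a","description":"first"},{"id":2,"name":"b","description":"second"}]}'

# Parsed as .articles[] | (.id, .name, .description):
# each article is fed to all three filters in turn.
fields=$(printf '%s' "$articles" | jq -c '.articles[] | .id, .name, .description')
printf '%s\n' "$fields"
```

Because the comma binds tighter than the pipe, no parentheses are needed here: the three field accessors all receive each article, not the top-level object.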

tonymitchell commented 3 weeks ago

Yes, I can see the usefulness of the current precedence for those inline "join" and "split" type scenarios, which I wasn't considering with my focus on my specific use case that required two independent filters from the same source input (headers and rows for a CSV output).

Thanks for your help.
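
For the headers-and-rows case mentioned above, one possible shape is sketched below. It is an assumption about what such a query might look like, not the reporter's actual command: the "name" header and the variable names are made up, and the input is the sample from the report. It assumes jq is available on PATH.

```shell
# Sample input from the report, inlined so the snippet is self-contained.
input='{"List1":[{"value":"1.2"}],"List2":[{"name":"serviceA"},{"name":"serviceA"},{"name":"serviceB"},{"name":"serviceB"}]}'

# The parentheses keep the row pipeline independent of the header array.
# The comma then concatenates header and rows, and the trailing | @csv
# applies to both, since | has lower precedence than the comma.
csv=$(printf '%s' "$input" | jq -r '["name"], ([.List2[].name] | unique[] | [.]) | @csv')
printf '%s\n' "$csv"
```

This uses both behaviours discussed in the thread: parentheses to isolate one side of the comma, and the low precedence of | to format every row with a single @csv.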