jqlang / jq

Command-line JSON processor
https://jqlang.github.io/jq/

jq should have an option not to sort keys #1364

Closed davidfetter closed 7 years ago

davidfetter commented 7 years ago

OK, so not sorting by default (#169) may not have legs on performance grounds or something.

How about an option to preserve order instead?

mariotti commented 7 years ago

I fully agree about the problem, but only halfway with the approach.

I just had a very simple discussion here: http://stackoverflow.com/questions/43148797/jq-how-to-add-an-object-key-value-in-a-nested-json-tree-with-arrays/43149353#43149353

I think the question is: is jq a pipe command, or a programming tool for massaging JSON? Or: is "diff" the correct command to check whether 2 JSON files are the same? [Is anybody planning a jdiff tool?]

In theory, key order is not guaranteed. In practice, code often (wrongly) relies on an assumed order.

Before jq I had this: https://github.com/mariotti/technical_interview_questions/blob/master/bin/generateQmd.sh

Now it is:

# JQ Alternative: testing
cat QUESTIONS.json | jq '.' | jq -r \
    '.TechQuestions.category[] |
    ( (.catname | "\n#Category: \(.)"),
      (.question[] | ( 
        (.title  | "\n## \(.)" ) , 
        (.ID    | "\n    ID: \(.)"),
        (.idCQ  | "    idCQ: \(.)"),
        (.idQ   | "    idQ: \(.)"),
        (.idC   | "    idC: \(.)"),
        (.notes | "\n notes: \(.)")
        )
      )
    )'

BUT: order is important. (BTW, the code above assumes a strict order for the markdown.)

For example, it would mean that += has to behave as a kind of "append" and not as a generic addition to JSON data.

Is jq ready for these distinctions?
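As a small illustration of the "append" question above, here is a sketch of how `+=` with an object on the right-hand side behaves in jq 1.5 and later. The input objects are made up for the example, and jq is assumed to be on the PATH.

```shell
# New keys added with += land after the existing ones.
appended=$(jq -n -c '{title:"t",ID:1} | . += {codefile:"to configure"}')
printf '%s\n' "$appended"      # {"title":"t","ID":1,"codefile":"to configure"}

# Updating an existing key keeps that key in its original position.
overwritten=$(jq -n -c '{a:1,b:2} | . += {a:9}')
printf '%s\n' "$overwritten"   # {"a":9,"b":2}
```

So in this simple case `+=` does act like an append for new keys, although that is observed behavior rather than something the JSON RFC promises.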

nicowilliams commented 7 years ago

  -S               sort keys of objects on output;

nicowilliams commented 7 years ago

By default jq preserves key insertion order in objects. If you use the -S command-line option it will sort object keys.
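A minimal sketch of the two behaviors described above, assuming jq 1.5 or later is on the PATH. The sample input is hypothetical; its keys are deliberately written as "b" then "a".

```shell
# Hypothetical sample input; the key order "b" then "a" is deliberate.
input='{"b":2,"a":1}'

# Default: keys come out in the order they were inserted.
default_out=$(printf '%s' "$input" | jq -c '.')

# -S: keys are sorted on output.
sorted_out=$(printf '%s' "$input" | jq -S -c '.')

printf '%s\n' "$default_out"   # {"b":2,"a":1}
printf '%s\n' "$sorted_out"    # {"a":1,"b":2}
```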

nicowilliams commented 7 years ago

If I misunderstood the question feel free to re-open this issue and clarify.

nicowilliams commented 7 years ago

I should also add that JSON (RFC 7159) does not specify a sort order for keys.

There are multiple possible choices of collations, and jq does not know anything about them, nor does it know about Unicode normalization forms. So it is bound to be the case that what jq does for sorting objects (when requested) is not appropriate in some context.
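To make the collation point concrete: the jq manual describes string ordering as being by Unicode codepoint, so with `-S` uppercase ASCII keys sort before lowercase ones. A locale-aware collation would often interleave them instead. The sketch below uses a made-up object and assumes jq is on the PATH.

```shell
# Codepoint order: "B" (U+0042) comes before "a" (U+0061) and "b" (U+0062).
mixed_case=$(jq -n -S -c '{"b":1,"B":2,"a":3}')
printf '%s\n' "$mixed_case"   # {"B":2,"a":3,"b":1}
```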

pkoppstein commented 7 years ago

@nicowilliams - One of the issues here is illustrated by the following:

$ jq --version
jq-1.5rc2-228-g18753cb

$ jq -n '{b:2,a:1} | walk(if type == "object" then .a += 2 else . end)'
{
  "a": 3,
  "b": 2
}

This issue could be resolved by tweaking def walk to use keys_unsorted
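A sketch of what that tweak could look like, written as a separate function so it does not depend on how any particular jq release defines `walk`. The name `walk_keep_order` is hypothetical, not a jq builtin; the body is the `walk` definition quoted later in this thread with `keys` swapped for `keys_unsorted`.

```shell
ordered=$(jq -n -c '
  # walk_keep_order is a hypothetical name, not a jq builtin.
  # It rebuilds each object from keys_unsorted, so insertion order survives.
  def walk_keep_order(f):
    . as $in
    | if type == "object" then
        reduce keys_unsorted[] as $key
          ( {}; . + { ($key): ($in[$key] | walk_keep_order(f)) } ) | f
      elif type == "array" then map( walk_keep_order(f) ) | f
      else f
      end;
  {b:2,a:1} | walk_keep_order(if type == "object" then .a += 2 else . end)
')
printf '%s\n' "$ordered"   # {"b":2,"a":3}
```

With this variant "b" stays first, unlike the sorted output shown above.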

nicowilliams commented 7 years ago

@pkoppstein Yes.

I do think that we should change keys to be the same as keys_unsorted. If you want sorted keys then use keys|sort, no?
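For reference, a short sketch of how the two builtins differ as of jq 1.5, and of the explicit-sort spelling suggested above. The object literal is made up for the example, and jq is assumed to be on the PATH.

```shell
# keys sorts the key names; keys_unsorted returns them in insertion order.
sorted_keys=$(jq -n -c '{b:2,a:1} | keys')
insertion_keys=$(jq -n -c '{b:2,a:1} | keys_unsorted')

# Asking for sorted keys explicitly.
explicit_sort=$(jq -n -c '{b:2,a:1} | keys_unsorted | sort')

printf '%s\n' "$sorted_keys"      # ["a","b"]
printf '%s\n' "$insertion_keys"   # ["b","a"]
printf '%s\n' "$explicit_sort"    # ["a","b"]
```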

mariotti commented 7 years ago

Yes, I agree with the RFC (the theory). But in practice it is useful from time to time to preserve the original order. For example, '-S' will indeed sort (re-sort) the original file.

To be more specific, consider this.

(you can get the file here: https://github.com/mariotti/technical_interview_questions/blob/master/QUESTIONS.json)

Sol1:

cat QUESTIONS.json | jq '.TechQuestions.category[].question[] += {"codefile" : "to configure"}' > x.1.lhs

Sol2:

cat QUESTIONS.json | jq '
# Apply f to composite entities recursively, and to atoms
def walk(f):
 . as $in
 | if type == "object" then
      reduce keys[] as $key
        ( {}; . + { ($key):  ($in[$key] | walk(f)) } ) | f
  elif type == "array" then map( walk(f) ) | f
  else f
  end;
(. |= walk( if type == "object" and has("question")
      then .question[] += {"codefile" : "to configure"}
      else .
     end))' > x.2.walk

They solve the same problem but the 2 final files are different:

diff x.1.lhs x.2.walk | head -4
32d31
<             "title": "Given 2 integer arrays, determine if the 2nd array is a rotated version of the 1st array.",
33a33
>             "title": "Given 2 integer arrays, determine if the 2nd array is a rotated version of the 1st array.",

This is indeed perfectly fine according to the JSON RFC.

Doing the same with the '-S' option produces, as expected, no diff between the two outputs, but both then differ considerably from the "original" file.

A feature for preserving the order would only be useful for "some" debugging; keeping in mind that JSON objects are unordered is probably the best solution.

For example a quick search gave me this:

https://github.com/andreyvit/json-diff

This is indeed a much better tool for JSON than a line-by-line diff of pretty-printed output.

From my side, I am happy with the discussion and no need to reopen.

Thanks a lot!

nicowilliams commented 7 years ago

@mariotti You're welcome!

Preserving order will never be off internally because to turn it off internally would add branches or indirections we don't yet care for. Sorting keys is just about a) how keys are listed by the keys builtin, and b) how objects are printed on output.

BTW, diff'ing trees is not easy. XML has it easier because you could help by assigning IDs to nodes that make it easier to detect semantic moves.

mariotti commented 7 years ago

Just to complete these notes, and also because the tool I suggested (json-diff) doesn't really work OOTB (at least for me, currently on OS X): about diffing files, I found this discussion: http://stackoverflow.com/questions/31930041/using-jq-or-alternative-command-line-tools-to-diff-json-files

It has a few diff proposals that actually use jq and the -S option.
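A minimal sketch of that kind of comparison, assuming jq 1.5 or later on the PATH. The two documents are hypothetical and differ only in key order; this is an illustration, not a copy of any answer from the linked discussion.

```shell
# Two hypothetical documents with the same content but different key order.
left='{"title":"t","ID":1}'
right='{"ID":1,"title":"t"}'

# A plain text comparison sees them as different.
if [ "$left" = "$right" ]; then raw_equal=yes; else raw_equal=no; fi

# Normalizing both sides with -S removes the key-order difference.
left_sorted=$(printf '%s' "$left" | jq -S -c '.')
right_sorted=$(printf '%s' "$right" | jq -S -c '.')
if [ "$left_sorted" = "$right_sorted" ]; then normalized_equal=yes; else normalized_equal=no; fi

# jq can also compare the parsed values directly; object equality ignores key order.
semantic_equal=$(jq -n --argjson a "$left" --argjson b "$right" '$a == $b')

printf '%s %s %s\n' "$raw_equal" "$normalized_equal" "$semantic_equal"   # no yes true
```

For real files, the normalized outputs can be fed to an ordinary diff to see where the values differ.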

Also these two packages:

http://json-delta.readthedocs.io/en/latest/#downloads

It has Python, JavaScript and other implementations (the Python implementation seems kind of slow).

And a roughly equivalent tool in Go:

https://github.com/josephburnett/jd

I only did quick tests with the data above, and at least they run ;)

Note that the last 2 also offer a patching tool.

Hope it helps if people come across this issue ;)

...and an open suggestion: a fully, nicely working jqdiff?