jqlang / jq

Command-line JSON processor
https://jqlang.github.io/jq/

Change how adding objects works #515

Open stedolan opened 9 years ago

stedolan commented 9 years ago

jq's + operator adds two items of the same type together, by addition (numbers), concatenation (strings and arrays) or merging (objects). Null is an identity (that is, x + null is always x).

The current behaviour of merging objects is to take the key-value pairs of both objects and keep the right in case of collisions. See, for instance, Python's dict.update method.
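For reference, the current behaviour described above can be checked directly. This is a small illustration rather than part of the original post; the `-c` and `-S` flags are only there to make the output compact and key-sorted.

```shell
# One result per type: numbers, strings, arrays, objects (right side wins), null identity.
plus=$(jq -n -c -S '[ 1 + 2, "a" + "b", [1] + [2], {"a": 1, "b": 2} + {"b": 3}, {"a": 1} + null ]')
echo "$plus"
# prints [3,"ab",[1,2],{"a":1,"b":3},{"a":1}]
```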

I'm not sure how useful this operation turned out to be. Maybe it should be in the standard library somewhere, but it's perhaps not the ideal definition of object + object.

Consider this merge function instead: merging objects takes the key-value pairs of both objects, and in case of collisions recursively adds the values using the same algorithm.

Some examples of the proposed change:

jq -n '{"a": 10, "b": 5} + {"b": 1, "c": 7}'
old: {"a": 10, "b": 1, "c": 7}
new: {"a": 10, "b": 6, "c": 7}

jq -n '{"a": [1], "b": [2]} + {"b": [3], "c":[4]}'
old: {"a": [1], "b": [3], "c": [4]}
new: {"a": [1], "b": [2,3], "c": [4]}

jq -n '{"a": [1]} + {"a": true}'
old: {"a": true}
new: error: can't add array and boolean

jq -n '{"a": {"b": 1, "c": 2}} + {"a": {"c": 3}}'
old: {"a": {"c": 3}}
new: {"a": {"b": 1, "c": 5}}

This is similar to the * operator implemented by #320, #321. However, it's a little ugly to have two very similar operators, and I don't think that * is an intuitive operator on objects (if I didn't know jq, I reckon it would be less of a leap to imagine adding objects than multiplying them).
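To make the comparison concrete, here is the last example run through the existing `*` and `+` operators side by side. This is an illustration added for clarity: `*` recurses into nested objects but, at a colliding non-object value, keeps the right-hand side rather than adding.

```shell
# First element: recursive merge with *; second element: shallow merge with +.
star=$(jq -n -c -S '[ {"a": {"b": 1, "c": 2}} * {"a": {"c": 3}}, {"a": {"b": 1, "c": 2}} + {"a": {"c": 3}} ]')
echo "$star"
# prints [{"a":{"b":1,"c":3}},{"a":{"c":3}}]
```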

Issues like #274 would be solved by this new addition operator, but it would break backwards compatibility with programs relying on the current object + object.

Thoughts / rants / flamewars?

wtlangford commented 9 years ago

Speaking of arcane syntax, that * is definitely unexpected. I'd not noticed it before. As much as I like having the ability to merge objects without having to manually specify the paths to be merged through some clever usage of reduce, I agree that breaking the old object + object behavior is a concern. Perhaps this, as a merge/1 builtin?

# Recursively merge obj2 into the input; colliding non-object values are added with +.
def merge(obj2):
  # Bind obj2 once, so it is not re-evaluated against nested values during recursion.
  obj2 as $o
  | if (type == "object" and ($o | type) == "object") then
      reduce ((keys + ($o | keys)) | unique)[] as $key (.;
        if (has($key) and ($o | has($key))) then
          .[$key] |= merge($o[$key])
        elif ($o | has($key)) then
          .[$key] = $o[$key]
        else
          .
        end
      )
    elif (type == ($o | type)) then
      . + $o
    else
      error("Cannot merge " + type + " and " + ($o | type))
    end
;

The difference between merge/1 and * is that merge/1 adds non-object entries. Arrays and strings append obj2 to the input. Numbers are added.
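The difference can be seen on a small input. The sketch below uses a compact, hypothetical variant of merge/1 named `radd` (not a jq builtin). It assumes jq 1.5 or later for the `$o` parameter syntax, and it lets `+` itself raise the error on mismatched types.

```shell
merged=$(jq -n -c -S '
  # radd: hypothetical compact variant of merge/1; colliding leaves are added with +.
  def radd($o):
    if type == "object" and ($o | type) == "object" then
      reduce ($o | keys[]) as $k (.;
        if has($k) then .[$k] |= radd($o[$k]) else .[$k] = $o[$k] end)
    else . + $o
    end;
  {"a": {"b": 1, "c": 2}, "n": 10, "xs": [1]} as $l
  | {"a": {"c": 3}, "n": 5, "xs": [2]} as $r
  | [ ($l | radd($r)), ($l * $r) ]
')
echo "$merged"
# prints [{"a":{"b":1,"c":5},"n":15,"xs":[1,2]},{"a":{"b":1,"c":3},"n":5,"xs":[2]}]
```

The first element is the additive merge, the second is what `*` gives for the same pair.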

I've not heavily tested this, but it seems to work okay. I can't figure out how to handle booleans, though. We can't add them, so one's got to overwrite the other.

Thoughts?

pkoppstein commented 9 years ago

@stedolan wrote:

perhaps not the ideal definition of object + object.

There are many ways in which JSON objects can be combined/merged/melded. Changing jq's + to another particular variant won't change that fact, and I think jq's current definition is just fine as a building block. Stability is of course another reason to leave it alone.

One task which I think is common enough to be worthy of jq's support is migrating from a "relational" view of things to an "object" view.

For example, suppose we have two JSON objects as follows, each corresponding to a row in a relational table:

{ "id": 123, "surname": "S", "son": "Son1" }
{ "id": 123, "son": "Son2" }

In this case, we want to merge the objects with the result:

{ "id": 123, "surname": "S", "son": ["Son1", "Son2"]}

Also, if the second incoming row gave the same information about "surname", we would want the same result, assuming that the "surname" is just an "attribute" of the person with the given "id".

The following definitions achieve these goals and are perhaps worthy of further consideration:

# Combine two entities in an array-oriented fashion.
# If uniq is true, then pass the results of the following through unique:
# if both are arrays:  a + b 
# else if a is an array: a + [b]
# else if b is an array: [a] + b
# else [a, b]
def aggregate(a;b;uniq):
  if uniq then aggregate(a; b; false) | unique
  elif (a|type) == "array" then
    if (b|type) == "array" then a + b
    else a + [b]
    end
  else
    if (b|type) == "array" then [a] + b
    else [a, b]
    end
  end;

# Combine . with obj using aggregate/3 for shared keys whose values differ
def combine(obj):
  . as $in
  | reduce (obj|keys|.[]) as $key
      ($in;
       if .[$key] == obj[$key] then . 
       else setpath([$key];
                            aggregate( $in|getpath([$key]); obj|getpath([$key]); true ) )
       end ) ;

def demo:
    { "id": 123, "son": "Son1", "surname": "S"} as $row1
  | { "id": 123, "son": "Son2", "surname": "S"} as $row2
  |  $row1 | combine($row2 )
;

[EDIT: combine now uses aggregate/3 to make it commutative and associative.]

nicowilliams commented 9 years ago

I agree with @pkoppstein that there are many ways that one might want to combine/merge/add objects. We're not going to find a one-size-fits-all operator. We might find a one-size-fits-most operator.

I don't mind breaking backwards compatibility at some point, especially with import. We could make it so you import jq version 1.4; and so on to get the semantics (or error) that you want.

stedolan commented 9 years ago

If we were to break backwards compatibility, I'd do it slowly (detect uses of the behaviour we're going to change and warn for a version or so, then change in the next version).

@pkoppstein I see what you're going for there, but your code does implicit conversion of a to [a], something jq tries hard to avoid. This causes problems such as a lack of associativity (for instance, combining {"a": 1}, {"a": 1} and {"a": 2} gives different results, {"a": [1,2]} or {"a": [1,1,2]}, depending on bracketing).

If someone wanted to do this kind of thing, a better approach would be first to map the values to one-element arrays, then use add/merge.
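A minimal sketch of that approach, assuming jq 1.5 or later. The helper names `wrap` and `add_arrays` are hypothetical, and `add_arrays` stands in for the add/merge step, since the proposed `+` semantics do not exist yet. Both bracketings give the same result because every value is already an array before anything is combined.

```shell
wrapped=$(jq -n -c -S '
  # wrap: turn every value into a one-element array, explicitly.
  def wrap: map_values([.]);
  # add_arrays: hypothetical helper; concatenates array values key by key.
  def add_arrays($o): reduce ($o | keys[]) as $k (.; .[$k] += $o[$k]);
  [{"a": 1}, {"a": 1}, {"a": 2}] | map(wrap) as [$x, $y, $z]
  | [ ($x | add_arrays($y) | add_arrays($z)),
      ($x | add_arrays($y | add_arrays($z))) ]
')
echo "$wrapped"
# prints [{"a":[1,1,2]},{"a":[1,1,2]}]
```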

pkoppstein commented 9 years ago

@stedolan - Yes, the problem domain (relational tables) calls out for associativity and commutativity. I've amended combine and aggregate so that unique is used, though perhaps sort would be better? Thanks!

stedolan commented 9 years ago

I think your new version is in fact commutative and associative (but you forgot a |unique in aggregate), but the approach of silently converting non-arrays to arrays is still a problem.

For instance, the type of the result of combining elements of a list now depends on the length of the list, which seems dubious. Also, applying combine to {"a": 1} and {"a": 1} gives {"a": 1}, while applying aggregate to 1 and 1 gives [1].
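These cases can be reproduced with the definitions of aggregate/3 and combine/1 posted earlier in this thread, lightly reflowed. The value under "a" is a number in one result and an array in the next, depending only on the data.

```shell
types=$(jq -n -c -S '
  # aggregate/3 and combine/1 as posted earlier in this thread.
  def aggregate(a; b; uniq):
    if uniq then aggregate(a; b; false) | unique
    elif (a|type) == "array" then
      if (b|type) == "array" then a + b else a + [b] end
    else
      if (b|type) == "array" then [a] + b else [a, b] end
    end;
  def combine(obj):
    . as $in
    | reduce (obj|keys|.[]) as $key ($in;
        if .[$key] == obj[$key] then .
        else setpath([$key]; aggregate($in|getpath([$key]); obj|getpath([$key]); true))
        end);
  [ ({"a": 1} | combine({"a": 1})),
    ({"a": 1} | combine({"a": 2})),
    aggregate(1; 1; true) ]
')
echo "$types"
# prints [{"a":1},{"a":[1,2]},[1]]
```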

I understand the motivation for this function, but it conflicts with jq's approach of having predictable output types and never implicitly converting types.

pkoppstein commented 9 years ago

@stedolan - Thanks for pointing out the missing | unique. It's now there.

The test for equality in combine could obviously be moved to aggregate (so far as combine is concerned) but I thought that a function called "aggregate" should "aggregate". If both versions of aggregate are deemed unfit as a "builtin" then it could be made into a subfunction of "combine".

To address your point about implicit conversion of types, combine could be modified so that the list of keys that are to be "aggregated" must be provided. Maybe that is what is really needed anyway, but I was interested in investigating whether automating the inference of such a list might be useful.

If the list of fields to aggregate were to be made a required argument, would that make a difference in your assessment of the potential suitability of this kind of functionality for jq's collection of builtins?
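One way such a variant might look is sketched below. The names `as_list` and `combine_keys` are hypothetical, and this is only one possible reading of the idea: keys named in the list always end up as arrays, while every other key keeps the plain `+` behaviour. It assumes jq 1.5 or later.

```shell
combined=$(jq -n -c -S '
  # as_list: hypothetical helper; null becomes [], arrays pass through, anything else is wrapped.
  def as_list: if type == "array" then . elif . == null then [] else [.] end;
  # combine_keys: hypothetical; keys named in $ks always become arrays, other keys use plain +.
  def combine_keys($o; $ks):
    . as $a
    | reduce $ks[] as $k ($a + $o;
        .[$k] = (($a[$k] | as_list) + ($o[$k] | as_list) | unique));
  {"id": 123, "surname": "S", "son": "Son1"}
  | combine_keys({"id": 123, "son": "Son2"}; ["son"])
')
echo "$combined"
# prints {"id":123,"son":["Son1","Son2"],"surname":"S"}
```

Because the caller names the keys, the output type of each key no longer depends on the data being combined.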

nicowilliams commented 9 years ago

@stedolan Oddly enough I rely on current behavior. I'm willing to change the code in question.

joelpurra commented 9 years ago

@stedolan:

The current behaviour of merging objects is to take the key-value pairs of both objects and keep the right in case of collisions.

I like it as it is. My code rarely uses + for objects with collisions, though. I find that doing anything other than an "if both has() the key, use the other" check in case of a collision would add confusion.

There are other/more advanced ways to merge/add/verb two objects, but I'd rather see them as functions.