Open stedolan opened 9 years ago
Speaking of arcane syntax, that `*` is definitely unexpected. I'd not noticed it before.
As much as I like being able to merge objects without manually specifying the paths to be merged through some clever usage of `reduce`, I agree that breaking the old `object + object` behavior is a concern.
Perhaps this for a `merge/1` builtin?
    def merge(obj2):
      if (type == "object" and (obj2 | type) == "object") then
        reduce ((keys + (obj2 | keys)) | unique)[] as $key (.;
          if (has($key) and (obj2 | has($key))) then
            .[$key] |= (. | merge(obj2[$key]))
          elif (obj2 | has($key)) then
            .[$key] |= obj2[$key]
          else
            .
          end
        )
      elif (type == (obj2 | type)) then
        . + obj2
      else
        error("Cannot merge " + type + " and " + (obj2 | type))
      end
    ;
The difference between `merge/1` and `*` is that `merge/1` adds non-object entries. Arrays and strings append `obj2` to the input. Numbers are added.
I've not heavily tested this, but it seems to work okay. I can't figure out how to handle booleans, though. We can't add them, so one's got to overwrite the other.
Thoughts?
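To make the intended semantics easier to discuss, here is a rough Python model of the `merge/1` proposal above. It is only a sketch: the Python function `merge` is a stand-in written for illustration, not jq code, and it assumes that booleans should raise an error because jq's `+` is not defined for them.

```python
def merge(a, b):
    """Rough Python model of the merge/1 proposal (not jq itself)."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, b_val in b.items():
            # Shared keys recurse; keys found only in b are copied over.
            out[key] = merge(a[key], b_val) if key in a else b_val
        return out
    if a is None and b is None:
        return None  # jq: null + null is null
    # Python treats bool as an int subtype, so rule booleans out explicitly.
    if isinstance(a, bool) or isinstance(b, bool):
        raise TypeError("booleans cannot be added")
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a + b  # numbers are added
    if isinstance(a, str) and isinstance(b, str):
        return a + b  # strings are concatenated
    if isinstance(a, list) and isinstance(b, list):
        return a + b  # arrays are concatenated
    raise TypeError(f"Cannot merge {type(a).__name__} and {type(b).__name__}")


left = {"a": {"x": 1, "s": "ab"}, "l": [1]}
right = {"a": {"x": 2, "s": "cd", "y": True}, "l": [2], "n": 5}
merged = merge(left, right)
print(merged)
```

In this model the boolean `"y": True` survives only because it appears on one side; a boolean on both sides would raise, which is the open question raised above.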
@stedolan wrote:
perhaps not the ideal definition of object + object.
There are many ways in which JSON objects can be combined/merged/melded. Changing jq's `+` to another particular variant won't change that fact, and I think jq's current definition is just fine as a building block. Stability is of course another reason to leave it alone.
One task which I think is common enough to be worthy of jq's support is migrating from a "relational" view of things to an "object" view.
For example, suppose we have two JSON objects as follows, each corresponding to a row in a relational table:
    { "id": 123, "surname": "S", "son": "Son1" }
    { "id": 123, "son": "Son2" }
In this case, we want to merge the objects with the result:
    { "id": 123, "surname": "S", "son": ["Son1", "Son2"] }
Also, if the second incoming row gave the same information about "surname", we would want the same result, assuming that "surname" is just an "attribute" of the person with the given "id".
The following definitions achieve these goals and are perhaps worthy of further consideration:
    # Combine two entities in an array-oriented fashion.
    # If uniq is true, then pass the results of the following through unique:
    #   if both are arrays: a + b
    #   else if a is an array: a + [b]
    #   else if b is an array: [a] + b
    #   else [a, b]
    def aggregate(a;b;uniq):
      if uniq then aggregate(a; b; false) | unique
      elif (a|type) == "array" then
        if (b|type) == "array" then a + b
        else a + [b]
        end
      else
        if (b|type) == "array" then [a] + b
        else [a, b]
        end
      end;

    # Combine . with obj using aggregate/3 for shared keys whose values differ
    def combine(obj):
      . as $in
      | reduce (obj|keys|.[]) as $key
          ($in;
           if .[$key] == obj[$key] then .
           else setpath([$key];
                  aggregate( $in|getpath([$key]); obj|getpath([$key]); true ) )
           end ) ;

    def demo:
      { "id": 123, "son": "Son1", "surname": "S"} as $row1
      | { "id": 123, "son": "Son2", "surname": "S"} as $row2
      | $row1 | combine($row2 )
    ;
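For readers who want to trace the behaviour without running jq, the definitions above can be approximated in Python as follows. This is a sketch under stated assumptions: `jq_unique` is a hypothetical helper that only roughly imitates jq's `unique` (sort, then drop duplicates), and its ordering of nested arrays and objects is an approximation.

```python
import json


def _sort_key(v):
    # Approximates jq's ordering: null < false < true < numbers < strings
    # < arrays < objects. Nested arrays/objects are compared via their
    # JSON text, which is only an approximation of jq's rules.
    if v is None:
        return (0, 0)
    if isinstance(v, bool):
        return (1 + int(v), 0)
    if isinstance(v, (int, float)):
        return (3, v)
    if isinstance(v, str):
        return (4, v)
    rank = 5 if isinstance(v, list) else 6
    return (rank, json.dumps(v, sort_keys=True))


def jq_unique(values):
    # Hypothetical stand-in for jq's `unique`: sort, then drop duplicates.
    out = []
    for v in sorted(values, key=_sort_key):
        if not out or _sort_key(out[-1]) != _sort_key(v):
            out.append(v)
    return out


def aggregate(a, b, uniq):
    # Non-arrays are wrapped so that the result is always an array.
    left = a if isinstance(a, list) else [a]
    right = b if isinstance(b, list) else [b]
    combined = left + right
    return jq_unique(combined) if uniq else combined


def combine(obj, other):
    out = dict(obj)
    for key in other:
        # A missing key reads as None, mirroring jq's null.
        if out.get(key) != other[key]:
            out[key] = aggregate(obj.get(key), other[key], True)
    return out


row1 = {"id": 123, "son": "Son1", "surname": "S"}
row2 = {"id": 123, "son": "Son2", "surname": "S"}
print(combine(row1, row2))
```

Note that in this model a key present only in `other` is aggregated with `None`, which mirrors what `getpath` returns for a missing key in the jq version.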
[EDIT: `combine` now uses `aggregate/3` to make it commutative and associative.]
I agree with @pkoppstein that there are many ways that one might want to combine/merge/add objects. We're not going to find a one-size-fits-all operator. We might find a one-size-fits-most operator.
I don't mind breaking backwards compatibility at some point, especially with `import`. We could make it so you `import jq version 1.4;` and so on to get the semantics (or error) that you want.
If we were to break backwards compatibility, I'd do it slowly (detect uses of the behaviour we're going to change and warn for a version or so, then change in the next version).
@pkoppstein I see what you're going for there, but your code does implicit conversion of `a` to `[a]`, something jq tries hard to avoid. This causes problems such as a lack of associativity: for instance, combining `{"a": 1}`, `{"a": 1}` and `{"a": 2}` gives different results (`{"a": [1,2]}` or `{"a": [1,1,2]}`) depending on bracketing.
If someone wanted to do this kind of thing, a better approach would be first to map the values to one-element arrays, then use add/merge.
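The "wrap first, then merge" approach can be sketched in Python as below. The helper names `wrap_values` and `concat_merge` are invented for this illustration and do not correspond to jq builtins. Because every value is already an array before merging, the merge step is plain concatenation, which is associative, and the result type does not depend on how many objects were combined.

```python
def wrap_values(obj):
    # Explicit, up-front conversion: every value becomes a one-element list.
    return {key: [value] for key, value in obj.items()}


def concat_merge(a, b):
    # Merge two objects whose values are all lists by concatenating
    # the lists found under shared keys.
    out = {key: list(value) for key, value in a.items()}
    for key, value in b.items():
        out[key] = out.get(key, []) + value
    return out


x, y, z = (wrap_values(o) for o in ({"a": 1}, {"a": 1}, {"a": 2}))

left_first = concat_merge(concat_merge(x, y), z)
right_first = concat_merge(x, concat_merge(y, z))
print(left_first == right_first)
```

Both bracketings yield the same object here, in contrast to the implicit-conversion example above; duplicates are kept, so a deduplication step would have to be applied explicitly if it is wanted.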
@stedolan - Yes, the problem domain (relational tables) calls out for associativity and commutativity. I've amended `combine` and `aggregate` so that `unique` is used, though perhaps `sort` would be better? Thanks!
I think your new version is in fact commutative and associative (but you forgot a `|unique` in `aggregate`), but the approach of silently converting non-arrays to arrays is still a problem.
For instance, the type of the result of combining elements of a list now depends on the length of the list, which seems dubious. Also, `combine`ing `{"a": 1}` and `{"a": 1}` gives `{"a": 1}`, while `aggregate`ing `1` and `1` gives `[1]`.
I understand the motivation for this function, but it conflicts with jq's approach of having predictable output types and never implicitly converting types.
@stedolan - Thanks for pointing out the missing `| unique`. It's now there.
The test for equality in `combine` could obviously be moved to `aggregate` (so far as `combine` is concerned) but I thought that a function called "aggregate" should "aggregate". If both versions of `aggregate` are deemed unfit as a "builtin" then it could be made into a subfunction of "combine".
To address your point about implicit conversion of types, `combine` could be modified so that the list of keys that are to be "aggregated" must be provided. Maybe that is what is really needed anyway, but I was interested in investigating whether automating the inference of such a list might be useful.
If the list of fields to aggregate were to be made a required argument, would that make a difference in your assessment of the potential suitability of this kind of functionality for jq's collection of builtins?
@stedolan Oddly enough I rely on current behavior. I'm willing to change the code in question.
@stedolan:
The current behaviour of merging objects is to take the key-value pairs of both objects and keep the right in case of collisions.
I like it as it is. My code rarely uses `+` for objects with collisions though. I find that doing anything other than an `if both has() use other` check in case of a collision would add confusion.
There are other/more advanced ways to merge/add/verb two objects, but I'd rather see them as functions.
jq's `+` operator adds two items of the same type together, by addition (numbers), concatenation (strings and arrays) or merging (objects). Null is an identity (that is, `x + null` is always `x`).
The current behaviour of merging objects is to take the key-value pairs of both objects and keep the right in case of collisions. See, for instance, Python's `dict.update` method.
I'm not sure how useful this operation turned out to be. Maybe it should be in the standard library somewhere, but it's perhaps not the ideal definition of object + object.
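As a point of reference for the discussion, the current behaviour of `+` described above can be modelled in a few lines of Python. This is an illustrative sketch only: `jq_add` is a name made up here, and the model covers just the cases listed in this post (numbers, strings, arrays, objects, and null as an identity).

```python
def jq_add(a, b):
    """Rough Python model of jq's current `+` (shallow for objects)."""
    if a is None:
        return b  # null is an identity
    if b is None:
        return a
    # Python treats bool as an int subtype, so rule booleans out explicitly.
    if isinstance(a, bool) or isinstance(b, bool):
        raise TypeError("booleans cannot be added")
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a + b
    if isinstance(a, str) and isinstance(b, str):
        return a + b
    if isinstance(a, list) and isinstance(b, list):
        return a + b
    if isinstance(a, dict) and isinstance(b, dict):
        # Same effect as dict.update on a copy: the right side wins.
        return {**a, **b}
    raise TypeError(f"cannot add {type(a).__name__} and {type(b).__name__}")


result = jq_add({"a": 1, "b": {"x": 1}}, {"b": {"y": 2}})
print(result)
```

The nested object under `"b"` is replaced wholesale rather than merged, which is exactly the collision behaviour under discussion.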
Consider this merge function instead: merging objects takes the key-value pairs of both objects, and in case of collisions recursively adds the values using the same algorithm.
Some examples of the proposed change:
This is similar to the `*` operator implemented by #320, #321. However, it's a little ugly to have two very similar operators, and I don't think that `*` is an intuitive operator on objects (if I didn't know jq, I reckon it would be less of a leap to imagine adding objects than multiplying them).
Issues like #274 would be solved by this new addition operator, but it would break backwards compatibility with programs relying on the current `object + object`.
Thoughts / rants / flamewars?