jqlang / jq

Command-line JSON processor
https://jqlang.github.io/jq/

Outer join-like behavior when merging arrays of objects with same key values #1090

Closed: marzolfb closed this issue 8 years ago

marzolfb commented 8 years ago

I have two asks here:

  1. Is there a "simpler" way to do what I detail below?
  2. Is this an example that would be helpful to include on the cookbook wiki page?

The problem: I have two arrays of objects, both with the same identifying key values within their objects. I want a result that includes all the objects from the "driving array", enhanced with the detail from the matching objects in the other array. (This feels very much like an outer join in SQL parlance.)

This is a contrived example that attempts to illustrate the problem:

items.json:

[
  {"id": "A", "name": "Ant"},
  {"id": "B", "name": "Butterfly"},
  {"id": "C", "name": "Caterpillar"}
]

item_details.json:

[
  {"id": "A", "speciesCount": 12000},
  {"id": "C", "speciesCount": 20000}
]

I want a result that looks like this:

[
  {
    "id": "A",
    "name": "Ant",
    "speciesCount": 12000
  },
  {
    "id": "B",
    "name": "Butterfly"
  },
  {
    "id": "C",
    "name": "Caterpillar",
    "speciesCount": 20000
  }
]

My first attempt at solving this was to use select, but I had no luck with that. After trying a number of things, I finally wound up with this:

filter:

.[0] as $items | .[1] as $itemDetails
|
$items
| map ( . as $item |
  {
    id: .id,
    name: .name,
  }
  +
  ($itemDetails
    | map(if .id == $item.id then {speciesCount: .speciesCount} else empty end)
    | if length == 0 then {} else .[0] end
  )
)

And I executed this with this command:

jq -s -r -f filter items.json item_details.json
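For comparison, here is the same logic sketched in Python (not jq; the function name left_join_scan is hypothetical). With -s, jq reads both files into a single array, so .[0] is the items and .[1] the details. The filter then scans every detail once per item, which is fine for small inputs but grows as len(items) * len(details).

```python
from typing import Any

Obj = dict[str, Any]


def left_join_scan(slurped: list[list[Obj]]) -> list[Obj]:
    """Mirror of the jq filter above: slurped[0] holds the items, slurped[1] the details."""
    items, item_details = slurped[0], slurped[1]
    result: list[Obj] = []
    for item in items:
        # Like {id: .id, name: .name}: copy only the named fields.
        merged: Obj = {"id": item["id"], "name": item["name"]}
        # Like map(if .id == $item.id then {speciesCount: .speciesCount} else empty end).
        matches = [{"speciesCount": d["speciesCount"]}
                   for d in item_details if d["id"] == item["id"]]
        # Like `if length == 0 then {} else .[0] end`: first match, or nothing.
        if matches:
            merged.update(matches[0])
        result.append(merged)
    return result


items = [
    {"id": "A", "name": "Ant"},
    {"id": "B", "name": "Butterfly"},
    {"id": "C", "name": "Caterpillar"},
]
item_details = [
    {"id": "A", "speciesCount": 12000},
    {"id": "C", "speciesCount": 20000},
]
print(left_join_scan([items, item_details]))
```

Note that, like the jq filter, this copies only id and name from each item, so any other item fields would be silently dropped.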
pkoppstein commented 8 years ago

Here is a simple definition of "outerjoin" that can be used to accomplish the task. This definition, however, trades robustness for efficiency. That is, the results may be unexpected if the values of id are flaky. Specifically, it is assumed that the value at the given keyname ("id" in your case), converted to a string with tostring, should be taken as the object's identity.

# Create the outer-join (an array) of the objects in the input array;
# keyname must be a string.
def outerjoin(keyname):
  [reduce .[] as $o ({}; .[$o[keyname] | tostring] += $o ) | .[]];

Here's your example:

def itemsjson:
[
  {"id": "A", "name": "Ant"},
  {"id": "B", "name": "Butterfly"},
  {"id": "C", "name": "Caterpillar"}
];

def item_detailsjson:
[
  {"id": "A", "speciesCount": 12000},
  {"id": "C", "speciesCount": 20000}
];

itemsjson + item_detailsjson | outerjoin("id")

Transcript:

$ jq -n -f outerjoin.jq
[
  {
    "id": "A",
    "name": "Ant",
    "speciesCount": 12000
  },
  {
    "id": "B",
    "name": "Butterfly"
  },
  {
    "id": "C",
    "name": "Caterpillar",
    "speciesCount": 20000
  }
]
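To make the robustness caveat concrete, here is a rough Python model of outerjoin (the names outerjoin and jq_tostring are hypothetical, and jq_tostring only approximates jq's tostring by JSON-encoding non-strings). Objects are merged whenever their stringified keys agree, so an id of 1 and an id of "1" collide, objects lacking the key all merge under "null", and objects that appear only in the second array are kept, which makes this a full outer join.

```python
import json
from typing import Any

Obj = dict[str, Any]


def jq_tostring(v: Any) -> str:
    # Approximation of jq's tostring: strings pass through, anything else is JSON-encoded.
    return v if isinstance(v, str) else json.dumps(v, separators=(",", ":"))


def outerjoin(objs: list[Obj], keyname: str) -> list[Obj]:
    """Model of: [reduce .[] as $o ({}; .[$o[keyname] | tostring] += $o) | .[]]"""
    acc: dict[str, Obj] = {}
    for o in objs:
        # A missing key reads as null in jq, which tostring turns into "null".
        k = jq_tostring(o.get(keyname))
        # null + $o is $o; otherwise object addition, where the right-hand side wins.
        acc[k] = {**acc.get(k, {}), **o}
    return list(acc.values())


items = [
    {"id": "A", "name": "Ant"},
    {"id": "B", "name": "Butterfly"},
    {"id": "C", "name": "Caterpillar"},
]
item_details = [
    {"id": "A", "speciesCount": 12000},
    {"id": "C", "speciesCount": 20000},
]
print(outerjoin(items + item_details, "id"))
```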
marzolfb commented 8 years ago

Thank you! I like that much better. There are some bits of syntax in your example that I didn't realize you could use.

pkoppstein commented 8 years ago

@marzolfb - A safer and more generic "outer join" can be implemented like so:

def add_by(f): group_by(f) | map(add);

In your case, you'd write: itemsjson + item_detailsjson | add_by(.id)

One nice thing about this approach is that the argument (f) can be any filter, and thus compound keys can easily be handled; e.g., if the fields "k1" and "k2" are a compound key, one would write:

add_by( [.k1, .k2] )

Since add_by as defined here uses group_by, a sort is entailed, so the results come out ordered by key. It's probably not already in jq because it's so trivial.
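A Python sketch of add_by, for readers who want to see the mechanics (hypothetical names; itertools.groupby only groups adjacent equal keys, hence the sort, mirroring the sort that group_by entails). One difference worth noting: jq defines an ordering across all JSON types, whereas Python's sorted() raises TypeError on keys that are not mutually comparable, such as a mix of strings and numbers.

```python
from itertools import groupby
from typing import Any, Callable

Obj = dict[str, Any]


def add_by(objs: list[Obj], f: Callable[[Obj], Any]) -> list[Obj]:
    """Model of group_by(f) | map(add): sort by key, group equal keys, merge each group."""
    merged: list[Obj] = []
    # sorted() is stable, so objects with equal keys keep their input order within a group.
    for _, group in groupby(sorted(objs, key=f), key=f):
        acc: Obj = {}
        for o in group:
            acc.update(o)  # like jq's object addition: later objects win on shared keys
        merged.append(acc)
    return merged


items = [
    {"id": "C", "name": "Caterpillar"},
    {"id": "A", "name": "Ant"},
    {"id": "B", "name": "Butterfly"},
]
item_details = [
    {"id": "A", "speciesCount": 12000},
    {"id": "C", "speciesCount": 20000},
]
# Because of the sort, the result is ordered by id rather than by input position.
print(add_by(items + item_details, lambda o: o["id"]))

# A compound key, analogous to add_by([.k1, .k2]): use a tuple.
rows = [
    {"k1": "x", "k2": 2, "a": 1},
    {"k1": "x", "k2": 1, "b": 2},
    {"k1": "x", "k2": 2, "c": 3},
]
print(add_by(rows, lambda o: (o["k1"], o["k2"])))
```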

marzolfb commented 8 years ago

Cool! Even better! I like the flexibility to allow for compound keys. Thanks again!

marzolfb commented 8 years ago

@pkoppstein - On second thought, I don't think my contrived example reflects what I'm trying to do. Let's adjust my original scenario and say that item_details.json has an additional item keyed with an .id value of "D", but that the items.json file doesn't contain this key. Like this:

def itemsjson:
[
  {"id": "A", "name": "Ant"},
  {"id": "B", "name": "Butterfly"},
  {"id": "C", "name": "Caterpillar"}
];

def item_detailsjson:
[
  {"id": "A", "speciesCount": 12000},
  {"id": "C", "speciesCount": 20000},
  {"id": "D", "speciesCount": 400}
];

I still want a result that contains all the items from itemsjson regardless of whether they are in item_detailsjson, BUT I don't want to include things that are in item_detailsjson and not in itemsjson. So, given the input above, I still want to see a result like this:

[
  {
    "id": "A",
    "name": "Ant",
    "speciesCount": 12000
  },
  {
    "id": "B",
    "name": "Butterfly"
  },
  {
    "id": "C",
    "name": "Caterpillar",
    "speciesCount": 20000
  }
]

The two solutions you provide don't address this scenario. Sorry for not getting the use case right the first time.

pkoppstein commented 8 years ago

Once again, for efficiency, the following assumes that the ids are well-behaved:

def elaborate( f; details ):
  def dict(f):
    reduce .[] as $o ({}; .[$o | f | tostring] = $o ) ;
  dict(f) as $ids
  | . + [ details[] | select( f | tostring | in($ids) ) ]
  | group_by(f)
  | map(add);

Example usage:

itemsjson | elaborate( .id; item_detailsjson )

Output:

[
  {
    "id": "A",
    "name": "Ant",
    "speciesCount": 12000
  },
  {
    "id": "B",
    "name": "Butterfly"
  },
  {
    "id": "C",
    "name": "Caterpillar",
    "speciesCount": 20000
  }
]
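The key step in elaborate is that details whose key does not occur among the items (here "D") are discarded before grouping, which turns the full outer join into a left outer join. A Python sketch of those semantics (the name elaborate_grouped is hypothetical, and str() stands in loosely for jq's tostring):

```python
from itertools import groupby
from typing import Any, Callable

Obj = dict[str, Any]


def elaborate_grouped(items: list[Obj], f: Callable[[Obj], Any], details: list[Obj]) -> list[Obj]:
    """Model of: keep details whose key is among the items, then group_by(f) | map(add)."""
    # Like dict(f): the set of (stringified) keys present in the items.
    ids = {str(f(o)) for o in items}
    # Keep only details whose key occurs among the items.
    kept = [d for d in details if str(f(d)) in ids]
    merged: list[Obj] = []
    # Sort, group equal keys, and merge each group (later objects win on shared fields).
    for _, group in groupby(sorted(items + kept, key=f), key=f):
        acc: Obj = {}
        for o in group:
            acc.update(o)
        merged.append(acc)
    return merged


items = [
    {"id": "A", "name": "Ant"},
    {"id": "B", "name": "Butterfly"},
    {"id": "C", "name": "Caterpillar"},
]
item_details = [
    {"id": "A", "speciesCount": 12000},
    {"id": "C", "speciesCount": 20000},
    {"id": "D", "speciesCount": 400},
]
print(elaborate_grouped(items, lambda o: o["id"], item_details))
```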
marzolfb commented 8 years ago

Very nice. Thanks again!

DanielChristow commented 6 years ago

This last use case by marzolfb is exactly what I'm up against right now. I've looked at the script from pkoppstein and the output, but I'm running into some issues putting it all together.

This is most likely 100% me, but could someone help me get this going? I'm very new to jq, and I'm having trouble running jq with a filter from a file on Ubuntu.

Also, is this filter still the best way of achieving the goal?

def elaborate( f; details ):
  def dict(f):
    reduce .[] as $o ({}; .[$o | f | tostring] = $o ) ;
  dict(f) as $ids
  | . + [ details[] | select( f in($ids)) ] 
  | group_by(f)
  | map(add);
pkoppstein commented 6 years ago

@DanielChristow - It's hard to know how to help since you haven't described the problem.

You might like to use the following variant of elaborate: it has slightly different semantics but is simpler and does not use group_by:

# input: an array of items to be elaborated
# f: a key into these items
# details: additional details, also keyed by f
# output: an array of the elaborated input items
def elaborate( f; details ):
  def dict(f):
    reduce .[] as $o ({}; .[$o | f | tostring] = $o ) ;
  (details|dict(f)) as $dict
  | map(. + $dict[f | tostring]);
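A Python sketch of this lookup-based variant (hypothetical name elaborate_lookup). Unlike the group_by version, it keeps the items in their input order and needs no sort; an item with no matching detail is returned unchanged, much as . + null is . in jq. The key is stringified on both sides, matching the tostring used to build the dictionary. In jq, indexing an object with a raw number (such as a numeric deviceId) is an error, so a lookup that skips tostring only works for string keys.

```python
from typing import Any, Callable

Obj = dict[str, Any]


def elaborate_lookup(items: list[Obj], f: Callable[[Obj], Any], details: list[Obj]) -> list[Obj]:
    """Model of the dictionary-lookup elaborate: index details by key, merge into each item."""
    # Index details by stringified key; a later detail with the same key replaces an earlier one.
    lookup = {str(f(d)): d for d in details}
    # Merge each item with its detail, if any; detail fields win on conflicts, as with jq's +.
    return [{**item, **lookup.get(str(f(item)), {})} for item in items]


items = [
    {"id": "C", "name": "Caterpillar"},
    {"id": "A", "name": "Ant"},
    {"id": "B", "name": "Butterfly"},
]
item_details = [
    {"id": "A", "speciesCount": 12000},
    {"id": "C", "speciesCount": 20000},
    {"id": "D", "speciesCount": 400},
]
print(elaborate_lookup(items, lambda o: o["id"], item_details))
# Numeric keys work too, because both sides are stringified.
print(elaborate_lookup([{"deviceId": 123}], lambda o: o["deviceId"], [{"deviceId": 123, "x": 1}]))
```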
DanielChristow commented 6 years ago

You're totally right, I was being super vague. Here's what I'm trying: Bash, ubuntu:

$ jq -s 'include "elaborate"; itemsjson | elaborate(.id; item_detailsjson)'
jq: error: syntax error, unexpected IDENT, expecting ';' or ')' (Unix shell quoting issues?) at /opt/GLAS_STB_util_report/elaborate.jq, line 5:
  | . + [ details[] | select( f in($ids)) ]                                 
jq: 1 compile error

Here's what elaborate.jq looks like:

def elaborate( f; details ):
  def dict(f):
    reduce .[] as $o ({}; .[$o | f | tostring] = $o ) ;
  dict(f) as $ids
  | . + [ details[] | select( f in($ids)) ] 
  | group_by(f)
  | map(add);

Additional: itemsjson:

[
  {"id": "A", "name": "Ant"},
  {"id": "B", "name": "Butterfly"},
  {"id": "C", "name": "Caterpillar"}
]

item_detailsjson:

[
  {"id": "A", "speciesCount": 12000},
  {"id": "C", "speciesCount": 20000},
  {"id": "D", "speciesCount": 400}
]
$ jq --version
jq-1.5-1-a5b5cbe
pkoppstein commented 6 years ago

Your jq filter mentions itemsjson and item_detailsjson, but you don't seem to have told jq how to associate these with the corresponding JSON. There are many valid approaches, but to keep things simple, I suggest you skip the complexities of "include" for now and start with program.jq as follows:

# input: an array of items to be elaborated
# f: a key into these items
# details: additional details, also keyed by f
# output: an array of the elaborated input items
def elaborate( f; details ):
  def dict(f):
    reduce .[] as $o ({}; .[$o | f | tostring] = $o ) ;
  (details|dict(f)) as $dict
  | map(. + $dict[f | tostring]);

def item_details:
[
  {"id": "A", "speciesCount": 12000},
  {"id": "C", "speciesCount": 20000},
  {"id": "D", "speciesCount": 400}
]
;

elaborate(.id; item_details)

Then assuming the file items.json contains the JSON you have under the heading "itemsjson", the following would be the appropriate invocation:

jq -f program.jq items.json
DanielChristow commented 6 years ago

That worked! Thank you!!! This just clicked for me.

Would there be a way to modify the elaborate filter/modifier to accept two JSON inputs? Or more? This way one would not have to be embedded in the program.jq file itself.

My thought would be something like: elaborate(key, add_to_this.json, strip_key_value_pairs_from_this.json)

My data is a bit nested, too. Will this filter work for that? Here's a blurb of data:

{
    "Schedule": [
        {
            "deviceId": 123,
            "reservationId": 123456,
            "username": "jdoe"
        },
        {
            "deviceId": 456,
            "reservationId": 589114,
            "username": "jsmith"
        }
    ],
    "serverTime": 1522863125.019958
}
[
    {
        "a": {
            "b": "10.0.0.1",
            "c": "hostname1"
        },
        "deviceId": 123
    },
    {
        "a": {
            "b": "10.0.0.2",
            "c": "hostname2"
        },
        "deviceId": 456
    }
]

With the goal being to end up with:

{
    "Schedule": [
        {
            "deviceId": 123,
            "reservationId": 123456,
            "username": "jdoe",
            "a": {
                "b": "10.0.0.1",
                "c": "hostname1"
            }
        },
        {
            "deviceId": 456,
            "reservationId": 589114,
            "username": "jsmith",
            "a": {
                "b": "10.0.0.2",
                "c": "hostname2"
            }
        }
    ],
    "serverTime": 1522863125.019958
}

In case I haven't made it clear enough: I am extremely grateful for your assistance. Thank you so much for your help.
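The nested layout mainly changes where the join is applied: only the Schedule array is elaborated, keyed by deviceId, and other top-level fields such as serverTime pass through unchanged. In jq this would presumably mean applying elaborate to .Schedule with the update-assignment operator |=. A Python sketch of the intended transformation, using hypothetical function names and assuming reservationId is meant to carry through unchanged from the input:

```python
from typing import Any, Callable

Obj = dict[str, Any]


def elaborate_lookup(items: list[Obj], f: Callable[[Obj], Any], details: list[Obj]) -> list[Obj]:
    # Index details by stringified key, then merge each item with its match (if any).
    lookup = {str(f(d)): d for d in details}
    return [{**item, **lookup.get(str(f(item)), {})} for item in items]


def elaborate_schedule(doc: Obj, devices: list[Obj]) -> Obj:
    # Rebuild only "Schedule"; every other top-level key is copied as-is.
    return {**doc, "Schedule": elaborate_lookup(doc["Schedule"], lambda o: o["deviceId"], devices)}


schedule = {
    "Schedule": [
        {"deviceId": 123, "reservationId": 123456, "username": "jdoe"},
        {"deviceId": 456, "reservationId": 589114, "username": "jsmith"},
    ],
    "serverTime": 1522863125.019958,
}
devices = [
    {"a": {"b": "10.0.0.1", "c": "hostname1"}, "deviceId": 123},
    {"a": {"b": "10.0.0.2", "c": "hostname2"}, "deviceId": 456},
]
print(elaborate_schedule(schedule, devices))
```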

pkoppstein commented 6 years ago

Here's one way to avoid having any data in your program file. Let's also use jq's support for file inclusion this time.

The invocation:

jq -f program.jq --argfile item_details item_details.json items.json

program.jq

include "elaborate";
elaborate(.id; $item_details )

Now that everything is nicely modularized, you should be able to tweak the program to do what you really want.

If you need further help, I'd suggest reading the online tutorial (if you haven't already done so) and familiarizing yourself with the jq manual and FAQ.

In future, please ask usage questions on stackoverflow.com using the jq tag (stackoverflow.com/questions/tagged/jq). That way, you're likely to get the perfect answer, or at least a nice selection, and others can benefit as well.