jqlang / jq

Command-line JSON processor
https://jqlang.github.io/jq/

Merging three JSON files with two different keys #2445

Closed: gvp-devops closed this issue 1 year ago

gvp-devops commented 2 years ago

I have been perplexed trying to get my head around merging three JSON files by two different keys in order to produce one complete JSON file without removing duplicate records. My thought process is that this needs to be done with two different jq commands.

  1. The initial run processes successfully because the ID is unique.
    jq -s 'flatten | group_by(.id) | map(reduce .[] as $x ({}; . * $x))' File1 File2 > /tmp/File_temp.json

    However, when I run the same command with project_id in place of id, it almost works: when a tenant owner owns multiple instance IDs, the final result removes the duplicate ID and project_id records and leaves only one result per tenant owner (see the short illustration after the second command below). I know this comes from the map(reduce ...) filter. The end result must not remove any duplicate values, since a single tenant can own multiple instance IDs.

I have scrubbed multiple jq docs and StackOverflow, but found no reference for merging three JSON files, or for which options of the map filter are preferable here.

  2. Second run to merge the tenant owner names from File3 into the complete master JSON.
    jq -s 'flatten | group_by(.project_id) | map(reduce .[] as $x ({}; . * $x))' File_temp File3 > /tmp/master_inventory.json
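
For reference, a minimal sketch (with made-up values) of how that collapse happens: group_by(.project_id) puts every record sharing a project_id into one group, and the reduce then folds the whole group into a single object, so only the last instance per project survives.

    $ echo '[{"id":"a","project_id":"p1"},{"id":"b","project_id":"p1"}]' |
        jq 'group_by(.project_id) | map(reduce .[] as $x ({}; . * $x))'
    [
      {
        "id": "b",
        "project_id": "p1"
      }
    ]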

NOTE: Regarding the JSON format, this is the output format produced by the OpenStack CLI.

Instance File 1

{
  "ID": "74a7cb34-1510-4cb6-aaa5-3e97ef13d89e",    <<<<<<<<< ID key to reference File2 id
  "Name": "bgf01_vnf_stack-0",
  "Status": "ACTIVE",
  "Task State": null,
  "Power State": "Running",
  "Networks": "bgf01_vnf_stack_internal=xxx.xxx.xxx.xxx; vbgf-v4-green-dynamic-1=xxx.xxx.xxx",
  "Image Name": "vbgf01_28",
  "Image ID": "54c9fc86-3331-4e45-972b-67fe3e0fba3c",
  "Flavor Name": "d1.vbgf01_flavor",
  "Flavor ID": "1aaed288-7867-4880-b4b2-7c0f8038ad86",
  "Availability Zone": "nova",
  "Host": "ospcomputed2-ctc-a610d",
  "Properties": "ha-policy='ha-offline'"
}

Project ID File 2

  {
    "id": "74a7cb34-1510-4cb6-aaa5-3e97ef13d89e", <<<<<<<<< ID Key to reference Project_ID Key to File3
    "project_id": "116408287003490c87817885841753c3"
  }

Tenant Owner File 3

  {
    "project_id": "116408287003490c87817885841753c3", <<<<<<<< Project_ID Key reference to obtain tenant/instance Owner
    "tenant": "ims-vbgf-dev"
  }

Desired final merged JSON

{
  "ID": "74a7cb34-1510-4cb6-aaa5-3e97ef13d89e",
  "Name": "bgf01_vnf_stack-0",
  "Status": "ACTIVE",
  "Task State": null,
  "Power State": "Running",
  "Networks": "bgf01_vnf_stack_internal=xxx.xxx.xxx.xxx; vbgf-v4-green-dynamic-1=xxx.xxx.xxx.xxx",
  "Image Name": "vbgf01_28",
  "Image ID": "54c9fc86-3331-4e45-972b-67fe3e0fba3c",
  "Flavor Name": "d1.vbgf01_flavor",
  "Flavor ID": "1aaed288-7867-4880-b4b2-7c0f8038ad86",
  "Availability Zone": "nova",
  "Host": "ospcomputed2-ctc-a610d",
  "Properties": "ha-policy='ha-offline'",
  "project_id": "116408287003490c87817885841753c3",   <<<<< Merged from File 3 into final JSON
  "tenant": "ims-vbgf-dev"                             <<<<< Merged from File 3 into final JSON
}
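
For comparison, one sketch of the desired join that avoids group_by entirely (and therefore never collapses duplicate project_ids) builds lookup tables with jq's INDEX builtin (jq 1.6+). It assumes each file is a single JSON array and that the instance records use the capitalized ID key shown above; the file name lookup.jq is only illustrative, not something from this thread.

    # lookup.jq -- sketch: join the three arrays via lookup tables instead of group_by
    (input | INDEX(.id))           as $projects   # File2: instance id -> {id, project_id}
    | (input | INDEX(.project_id)) as $tenants    # File3: project_id -> {project_id, tenant}
    | input                                       # File1: the instance records
    | map( . + ($projects[.ID]        | {project_id})
         | . + ($tenants[.project_id] | {tenant}) )

It could be run as, say, jq -n -f lookup.jq File2 File3 File1 > /tmp/master_inventory.json (note the input order: the two lookup files first, then the instance file); appending | .[] to the filter would emit bare objects instead of an array.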
vintnes commented 2 years ago

Presuming you have multiple objects per file. Otherwise you could just smash everything together:

jq -n 'reduce inputs as $i ({}; . += $i)' File[123]

To cross-reference multiple objects across all your files, you might use an index object where the index is "renamed" (copied and deleted) during the second migration. The disadvantage here is memory footprint.

jq --null-input '
    reduce inputs as $i
    ( {}
    ; if $i.ID then                     # form 1
        .[$i.ID] = $i
    elif $i.id and $i.project_id then   # form 2
        .[$i.project_id] = .[$i.id] + ($i | {project_id})
        | del(.[$i.id])
    elif $i.tenant then                 # form 3
        .[$i.project_id] += ($i | {tenant})
    else . end
    )
    | .[]   # break down index object into stream of desired objects
' File[123]
gvp-devops commented 2 years ago

@vintnes Thank you for this, it is much appreciated. I think I can see where you're going with these. Testing the first one, to just merge it all together, I receive the following error (which has been common throughout all jq runs). One thing I did not think to mention is the format of each JSON file: each one is formatted as an array, with brackets at the beginning and end of the file.

$ jq -n 'reduce inputs as $i ({}; . += $i)' ctc_osp_vmmaster_lc.json (file 1) ctc_osp_projectIDs.json (file 2) ctc_osp_nameid.json (file 3)
jq: error (at ctc_osp_vmmaster_lc.json:7202): object ({}) and array ([{"id":"9a1...) cannot be added

Test run after second migration

$ jq --null-input 'reduce inputs as $i ({}; if $i.id then .[$i.id] = $i elif $i.id and $i.project_id then .[$i.project_id] = .[$i.id] + ($i | {project_id}) | del(.[$i.id]) elif $i.tenant then .[$i.project_id] += ($i | {tenant}) else . end) | .[]' master1.json (temp file) ctc_osp_nameid.json (file 3)
jq: error (at master1.json:886): Cannot index array with string "id"

Second Migration Run

Cross-referencing with the second jq code sample almost produced the result. It merges the two files together, but it does not merge the tenant key into the actual identity record, the way $ jq -s 'flatten | group_by(.project_id) | map(reduce .[] as $x ({}; . * $x))' File_temp File3 did.

I believe what I really need to know is which map options will not remove duplicate key values, in order to maintain the inventory's integrity.

(...)
{
  "id": "ffb801a4-93d9-4df7-82e6-5341f995e113",
  "name": "echoserver-workflow-test-b-4",
  "status": "active",
  "task state": null,
  "power state": "running",
  "networks": "green-dynamic=2001:558:1047:9d8:f816:3eff:fe0f:515b, xx.xxx.xxx.xxx",
  "image name": "centos-7-mvp_echoserver_v1.0.7_2022-04-21 03:08:35",
  "image id": "bb27e36a-3461-450f-af83-2bbe4531930d",
  "flavor name": "m1.small",
  "flavor id": "f874ca8b-c4fa-4dde-9fb6-3b22b3a2b0b2",
  "availability zone": "nova",
  "host": "ospcomputem-ctc-a311d",
  "properties": "group='echoservice-workflow-test'",
  "project_id": "21d74337771a43e2b0ac5a91db828898"
}
{
  "tenant": "wsv-gcs-qa"
}
(...)
vintnes commented 2 years ago
inputs[]
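
Each of the three files is a single JSON array, so inputs hands the reduce whole arrays; that is exactly what produces the "cannot be added" and "Cannot index array with string" errors above. inputs[] splats every array into its individual objects. As a sketch with the same file names as above, the simple merge becomes:

    jq -n 'reduce inputs[] as $i ({}; . += $i)' ctc_osp_vmmaster_lc.json ctc_osp_projectIDs.json ctc_osp_nameid.json

and the same inputs to inputs[] substitution applies to the index-object filter.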
gvp-devops commented 2 years ago

@vintnes I spent some time working on this and could not get jq to merge the three JSON file arrays without using the unique feature, which removes all duplicate project_id records and, as a result, drops several inventory records. Each iteration of the suggested commands is peculiar to work with, as they produce two different results.

$ jq -n 'reduce inputs[] as $i ({}; . += $i)' ctc_osp_vmmaster_lc.json (file 1) ctc_osp_projectIDs.json (file 2) ctc_osp_nameid.json (file 3)
{
  "id": "74a7cb34-1510-4cb6-aaa5-3e97ef13d89e",
  "name": "bgf01_vnf_stack-0",
  "status": "active",
  "task state": null,
  "power state": "running",
  "networks": "bgf01_vnf_stack_internal=xxx.xxx.0.10,
  "image name": "vbgf01_28",
  "image id": "54c9fc86-3331-4e45-972b-67fe3e0fba3c",
  "flavor name": "d1.vbgf01_flavor",
  "flavor id": "1aaed288-7867-4880-b4b2-7c0f8038ad86",
  "availability zone": "nova",
  "host": "ospcomputed2-ctc-a610d",
  "properties": "ha-policy='ha-offline'",
  "project_id": "116408287003490c87817885841753c3",
  "tenant": "ims-vbgf-dev"
}
vintnes commented 2 years ago

Wow.

Your initial description of your source data was inadequate. The first solution (the simple reduce) was, as I explicitly stated, only for a situation with one object per file.

Your demo of the second solution (the conditional index object) is not the code I proposed. It looks like you tried to rewrite my solution by hand, but you lost some case sensitivity (.ID vs .id) and the del() statement.
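
If the instance export sometimes comes with lowercase keys, one hedged variant of the index-object filter (combined with the inputs[] fix for array inputs) could key form 1 on whichever of ID or id is present and lacks a project_id. This is only a sketch under that assumption, not the code from the earlier comment:

    jq --null-input '
        reduce inputs[] as $i
        ( {}
        ; if ($i.ID // $i.id) and ($i.project_id | not) then   # form 1, either key case
            .[$i.ID // $i.id] = $i
        elif $i.id and $i.project_id then                       # form 2
            .[$i.project_id] = .[$i.id] + ($i | {project_id})
            | del(.[$i.id])
        elif $i.tenant then                                     # form 3
            .[$i.project_id] += ($i | {tenant})
        else . end
        )
        | .[]   # break down index object into stream of desired objects
    ' File[123]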

gvp-devops commented 2 years ago

You're correct, and I apologize for the rewrite. I did use the code as originally provided; however, there was one file where the keys were all lowercase, and I thought that might have been the issue based on one "comment" I was given elsewhere.