jqlang / jq

Command-line JSON processor
https://jqlang.github.io/jq/

Array element is missing with more array elements but not with fewer?! #2349

Closed Fribb closed 3 years ago

Fribb commented 3 years ago

I am seeing weird behaviour that I can't wrap my head around: the result differs depending on the size of the JSON arrays.

I have two JSON arrays, each in its own file, and my goal is to merge them into an output file so that array elements are merged when they share a specific key/value pair.

Here are the two files I am talking about:

  1. anime-list-reduced
  2. anime-offline-database-reduced

The jq command I currently use is:

jq -s 'flatten | group_by(.anidb_id) | map(reduce .[] as $x ({}; . * $x))' anime-offline-database-reduced.json anime-lists-reduced.json > anime-list-full.json

Unfortunately, some elements, for example those with "mal_id": 34777 or "mal_id": 9062, are missing from the output file anime-list-full.json.

However, when I reduce the arrays to a few elements to try the command on a more manageable set of data, the behaviour is different. For example, I shorten the anime-offline-database-reduced.json to

[
    {
        "livechart_id": 2099,
        "anime-planet_id": "91-days",
        "anisearch_id": 11240,
        "anidb_id": 12014,
        "kitsu_id": 11957,
        "mal_id": 32998,
        "type": "TV",
        "notify.moe_id": "RjtepKmig",
        "anilist_id": 21711
    },
    {
        "anime-planet_id": "91-days-special",
        "kitsu_id": 13598,
        "mal_id": 34777,
        "type": "SPECIAL",
        "notify.moe_id": "NThVhKmmR",
        "anilist_id": 98778
    }
]

and the anime-lists-reduced.json to

[
    {
        "thetvdb_id": 309530,
        "themoviedb_id": 67043,
        "anidb_id": 12014
    }
]

When I run the command above I get exactly what I want: two elements, both with a "mal_id" key/value pair, while only one has an "anidb_id" key/value pair, as it should be.

[
  {
    "anime-planet_id": "91-days-special",
    "kitsu_id": 13598,
    "mal_id": 34777,
    "type": "SPECIAL",
    "notify.moe_id": "NThVhKmmR",
    "anilist_id": 98778
  },
  {
    "livechart_id": 2099,
    "anime-planet_id": "91-days",
    "anisearch_id": 11240,
    "anidb_id": 12014,
    "kitsu_id": 11957,
    "mal_id": 32998,
    "type": "TV",
    "notify.moe_id": "RjtepKmig",
    "anilist_id": 21711,
    "thetvdb_id": 309530,
    "themoviedb_id": 67043
  }
]

It doesn't make sense to me why the jq command seems to ignore or discard those elements when the arrays contain many elements, while it keeps them when the arrays contain only a couple.

itchyny commented 3 years ago

All the objects without an "anidb_id" field fall into the same group (their group key is null) and are merged into one object. When reducing with the * operator, fields with the same name are overwritten by the right-hand side object.
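The explanation above can be sketched on a tiny input. The sample objects and their ids below are made up for illustration and are not taken from the real files. The sketch assumes `jq` is installed. It also suggests why the shortened test worked: that input had only one object without "anidb_id", so there was nothing for it to be merged with.

```shell
# Hypothetical sample data: two objects lack "anidb_id", one has it.
input='[{"mal_id": 1, "type": "TV"}, {"mal_id": 2, "type": "SPECIAL"}, {"anidb_id": 10, "mal_id": 3}]'

# group_by(.anidb_id) evaluates the missing key as null, so both objects
# without "anidb_id" land in the same group.
groups=$(printf '%s' "$input" | jq -c 'group_by(.anidb_id)')
echo "$groups"

# Reducing each group with * keeps the right-most value for every key,
# so the object with "mal_id": 1 is overwritten by the one with "mal_id": 2.
merged=$(printf '%s' "$input" | jq -c 'group_by(.anidb_id) | map(reduce .[] as $x ({}; . * $x))')
echo "$merged"
```

With three input objects, only two remain after the merge, and "mal_id": 1 is no longer present.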

Fribb commented 3 years ago

Okay, that makes sense. Do you have a tip on how this should be corrected so that the elements without "anidb_id" are still available individually in the output?

emanuele6 commented 3 years ago

@Fribb: would something like this work?

jq -s 'flatten | group_by(.anidb_id) | map(if .[0] | has("anidb_id") then reduce .[] as $x ({}; . * $x) else .[] end)' anime-lists-reduced.json anime-offline-database-reduced.json > anime-list-full.json
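As a rough check of how this filter behaves, here is a sketch on made-up sample data (the ids are hypothetical, and `jq` is assumed to be installed). Only the first object of each group is inspected, which should be enough as long as no object carries an explicit "anidb_id": null. Groups that have the key are merged, and the group without it is emitted element by element via `.[]`.

```shell
# Hypothetical sample data: two objects without "anidb_id" and two that
# share "anidb_id": 10.
input='[{"mal_id": 1}, {"mal_id": 2}, {"anidb_id": 10, "mal_id": 3}, {"anidb_id": 10, "thetvdb_id": 99}]'

# Merge only the groups whose objects carry "anidb_id"; pass the others
# through unchanged, one output per object.
fixed=$(printf '%s' "$input" | jq -c 'group_by(.anidb_id) | map(if .[0] | has("anidb_id") then reduce .[] as $x ({}; . * $x) else .[] end)')
echo "$fixed"
```

The result should contain three objects: the two without "anidb_id" stay separate, and the two sharing "anidb_id": 10 are combined into one.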

Fribb commented 3 years ago

I will try this tomorrow and will give some feedback on the result.

Fribb commented 3 years ago

This seems to have worked, thank you very much.