mahmoud / glom

☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
https://glom.readthedocs.io

Extract nodes from JSON based on user input, preserving a portion of the higher-level object as well #247

Open vineetsingh065 opened 1 year ago

vineetsingh065 commented 1 year ago

I need to extract objects from the given JSON based on the node chain passed by the user, discard those that are not in the user input, and then create a new JSON object.

My master JSON is:

{
    "menustructure": [
        {
            "node": "Admin",
            "path": "admin",
            "child": [
                {
                    "node": "Admin.resouce1",
                    "path": "resouce1",
                    "rank": 1
                },
                {
                    "node": "Admin.resouce2",
                    "path": "oath",
                    "rank": 2
                }
            ]
        },
        {
            "node": "Workspace",
            "path": "wsp",
            "child": [
                {
                    "node": "Workspace.system1",
                    "path": "sys1"
                },
                {
                    "node": "Workspace.system2",
                    "path": "sys2"
                }
            ]
        }
    ]
}

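Note: a '.' in an element of the input list marks a child node (e.g. 'Admin.resouce1' is a child of 'Admin'). The new JSON should contain every selected child together with its parent node's details, while a top-level name such as 'Workspace' keeps that node with all of its children. For example, if the user passes ['Admin.resouce1', 'Workspace'], the expected output JSON is: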

{
    "menustructure": [
        {
            "node": "Admin",
            "path": "admin",
            "child": [
                {
                    "node": "Admin.resouce1",
                    "path": "resouce1",
                    "rank": 1
                }
            ]
        },
        {
            "node": "Workspace",
            "path": "wsp",
            "child": [
                {
                    "node": "Workspace.system1",
                    "path": "sys1"
                },
                {
                    "node": "Workspace.system2",
                    "path": "sys2"
                }
            ]
        }
    ]
}
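The selection rule described above can be sketched as a small classifier. This is a minimal sketch, assuming (as in the master JSON) that every child's name is its parent's name plus '.' and a suffix; the helper name selection_kind and its return values are hypothetical, chosen only for illustration.

```python
def selection_kind(node_name, selectors):
    """Classify a node against the user's selectors (hypothetical helper).

    Returns "whole" if the node itself was requested (keep it with all
    children), "partial" if only some descendant was requested (keep the
    node's own fields but prune its children), or None (drop it).
    Assumes child names have the form "<parent name>.<suffix>".
    """
    if node_name in selectors:
        return "whole"
    # The trailing dot stops "Admin" from matching e.g. "Administrator.x"
    prefix = node_name + "."
    if any(s.startswith(prefix) for s in selectors):
        return "partial"
    return None


selectors = {"Admin.resouce1", "Workspace"}
for name in ["Admin", "Admin.resouce1", "Admin.resouce2", "Workspace"]:
    print(name, selection_kind(name, selectors))
```

Children of a node classified as "whole" are kept without consulting the classifier, which is how 'Workspace' brings along both of its systems.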

Another example: for ['Admin.resouce2', 'Workspace.system1'] the expected JSON is:

{
    "menustructure": [
        {
            "node": "Admin",
            "path": "admin",
            "child": [
                {
                    "node": "Admin.resouce2",
                    "path": "oath",
                    "rank": 2
                }
            ]
        },
        {
            "node": "Workspace",
            "path": "wsp",
            "child": [
                {
                    "node": "Workspace.system1",
                    "path": "sys1"
                }
            ]
        }
    ]
}

If only a single node is passed, e.g. ['Admin'], the output JSON is:

{
    "menustructure": [
        {
            "node": "Admin",
            "path": "admin",
            "child": [
                {
                    "node": "Admin.resouce1",
                    "path": "resouce1",
                    "rank": 1
                },
                {
                    "node": "Admin.resouce2",
                    "path": "oath",
                    "rank": 2
                }
            ]
        }
    ]
}

And for ['Admin.resouce1', 'Admin.resouce2'] the expected JSON is:

{
    "menustructure": [
        {
            "node": "Admin",
            "path": "admin",
            "child": [
                {
                    "node": "Admin.resouce1",
                    "path": "resouce1",
                    "rank": 1
                },
                {
                    "node": "Admin.resouce2",
                    "path": "oath",
                    "rank": 2
                }
            ]
        }
    ]
}

How would I achieve that using Glom?

kurtbrose commented 1 year ago

First off, it's a recursive problem, so we'd need to use Ref to handle the recursion.

Then, it's a filtration problem, so we will need to use the SKIP marker object to drop things that don't match.
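In glom, returning SKIP from a spec inside a list spec drops that item from the output. The same sentinel idea can be mimicked in plain Python; the sketch below is an analogy, not glom's implementation, and the names SKIP_ITEM, map_skipping and keep_requested are hypothetical.

```python
# Hypothetical stand-in for glom's SKIP marker: a unique sentinel object
SKIP_ITEM = object()


def map_skipping(func, items):
    """Apply func to each item, dropping results that are the sentinel."""
    result = []
    for item in items:
        value = func(item)
        if value is not SKIP_ITEM:  # identity check: only the sentinel is dropped
            result.append(value)
    return result


def keep_requested(child, requested=frozenset({"Admin.resouce2"})):
    # Drop children whose "node" name was not requested
    return child if child["node"] in requested else SKIP_ITEM


children = [
    {"node": "Admin.resouce1", "path": "resouce1", "rank": 1},
    {"node": "Admin.resouce2", "path": "oath", "rank": 2},
]
print(map_skipping(keep_requested, children))
```

Using a dedicated sentinel rather than None means legitimate falsy results (0, "", None) are not accidentally filtered out.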

Additionally, there are two inputs here: one is the nested nodes, the other is the list of requested node names.

This might be better solved with direct recursion or with the remap() recursion helper from boltons. I don't want to over-promise that glom is the right solution, but I'll take a shot at it.

Reformulating the problem:


node_spec = Ref("node-spec", 
   Or(
      # case 1: this node is in the input; return it and all children
      And(lambda t: t["node"] in t[S]["input-nodes"], T),
      # case 2: one of the children is in the input
      And((
         A.node,
         "child",
         [Ref("node-spec")],
         Merge(S.node, 

Oof, even as I'm writing this I can tell it's a bad fit for glom: there are multiple inputs and a lot of internal state. This is really just a recursion problem. I'm not going to bother finishing it; it would be really tortured.

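def filter_node(cur, to_include):
    # Requested by name: keep this node with all of its children
    if cur["node"] in to_include:
        return cur
    # Otherwise keep only the descendants that survive filtering
    children = []
    for child in cur.get("child", []):  # leaf nodes have no "child" key
        filtered = filter_node(child, to_include)
        if filtered:
            children.append(filtered)
    if children:
        # Keep this node's own fields, but with the pruned child list
        return {**cur, "child": children}
    return None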
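To apply filter_node to the whole document, the top-level "menustructure" list still needs to be filtered. This is a minimal sketch, assuming the input is the master JSON from the question and that names are matched exactly; the wrapper name filter_menu is hypothetical, and filter_node is repeated here so the block runs on its own.

```python
import json


def filter_node(cur, to_include):
    """Return cur (pruned) if it or any descendant is requested, else None."""
    if cur["node"] in to_include:
        return cur  # requested by name: keep it with all of its children
    children = []
    for child in cur.get("child", []):  # leaf nodes have no "child" key
        filtered = filter_node(child, to_include)
        if filtered:
            children.append(filtered)
    if children:
        return {**cur, "child": children}  # keep parent fields, pruned children
    return None


def filter_menu(data, to_include):
    """Hypothetical wrapper: filter every top-level node of "menustructure"."""
    wanted = set(to_include)  # set lookup instead of scanning a list
    kept = []
    for node in data["menustructure"]:
        filtered = filter_node(node, wanted)
        if filtered:
            kept.append(filtered)
    return {"menustructure": kept}


MASTER = json.loads("""
{"menustructure": [
  {"node": "Admin", "path": "admin", "child": [
    {"node": "Admin.resouce1", "path": "resouce1", "rank": 1},
    {"node": "Admin.resouce2", "path": "oath", "rank": 2}]},
  {"node": "Workspace", "path": "wsp", "child": [
    {"node": "Workspace.system1", "path": "sys1"},
    {"node": "Workspace.system2", "path": "sys2"}]}
]}
""")

print(json.dumps(filter_menu(MASTER, ["Admin.resouce2", "Workspace.system1"]), indent=2))
```

Matching is exact and case-sensitive, so 'workspace.system1' would not match 'Workspace.system1'. Nodes kept whole are returned by reference rather than copied; apply copy.deepcopy to the result if it will be mutated independently of the master data.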