mahmoud / glom

☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
https://glom.readthedocs.io

Use glom to extract subset of nested dict #210

Open dadokkio opened 3 years ago

dadokkio commented 3 years ago

Hello, I'm using glom to let users keep only selected keys of a nested dict while preserving the original dict's schema.

The code is something like this:

from glom import glom

def filter_dict(input_dict, fields):

    def path_to_dict(path, value=None):
        # Turn "b.ba" plus a value into {"b": {"ba": value}}.
        def pack(parts):
            # The last segment holds the value; earlier segments nest.
            return {parts[0]: pack(parts[1:]) if len(parts) > 1 else value}
        return pack(path.split("."))

    def merge(d1, d2):
        for k in d2:
            if k in d1 and isinstance(d1[k], dict) and isinstance(d2[k], dict):
                merge(d1[k], d2[k])
            else:
                d1[k] = d2[k]

    output_dict = {}
    for field in fields.split(","):
        field = field.strip()
        merge(
            output_dict,
            path_to_dict(
                field,
                glom(input_dict, field, default=None),
            ),
        )
    return output_dict

With this input:

a = {
    "a": "val_a",
    "b": {
        "ba": "val_ba",
        "bb": [
            {"bba": "val_bba_1", "bbb": "val_bbb_1"},
            {"bba": "val_bba_2", "bbb": "val_bbb_2"},
        ],
    },
    "c": {"cc": {"ccc": "val_ccc"}},
}

the outputs are:

>>> filter_dict(a, "a, b.ba")
    {'a': 'val_a', 'b': {'ba': 'val_ba'}}
>>> filter_dict(a, "a, b.bb.0.bba, c.cc.ccc")
    {'a': 'val_a', 'b': {'bb': {'0': {'bba': 'val_bba_1'}}}, 'c': {'cc': {'ccc': 'val_ccc'}}}
>>> filter_dict(a, "a, b.bb.*.bba")   # with pull #144
    {'a': 'val_a', 'b': {'bb': {'*': {'bba': ['val_bba_1', 'val_bba_2']}}}}

First question: is this approach OK, or is there a better way to get the same result? Second, is there a way to get lists back as lists instead of dicts (e.g. the '0' or '*' keys in the examples above)?
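
To make the second question concrete, here is a rough plain-Python sketch of the output shape I am after. It does not use glom at all: it walks the data by hand, and the names `build_spec`, `project` and `filter_keep_lists` are made up for illustration. It is only one possible direction, under two assumptions that differ from the code above: missing keys are skipped rather than set to None, and a selected list index returns a one-element list, so original positions are not preserved.

```python
def build_spec(fields):
    """Turn "a, b.bb.0.bba" into a nested dict of path segments."""
    spec = {}
    for field in fields.split(","):
        node = spec
        for part in field.strip().split("."):
            node = node.setdefault(part, {})
    return spec


def project(data, spec):
    """Keep only the parts of data that the spec selects."""
    if not spec:
        return data  # end of a path: keep the value as-is
    if isinstance(data, dict):
        # Assumption: keys missing from the data are skipped, not set to None.
        return {k: project(data[k], sub) for k, sub in spec.items() if k in data}
    if isinstance(data, list):
        if "*" in spec:
            # "*" applies the sub-spec to every element of the list.
            return [project(item, spec["*"]) for item in data]
        # Numeric segments pick individual elements, returned in index order.
        picked = sorted(
            ((int(k), sub) for k, sub in spec.items() if k.isdigit()),
            key=lambda pair: pair[0],
        )
        return [project(data[i], sub) for i, sub in picked if i < len(data)]
    return None  # the path goes deeper than the data does


def filter_keep_lists(input_dict, fields):
    return project(input_dict, build_spec(fields))


a = {
    "a": "val_a",
    "b": {
        "ba": "val_ba",
        "bb": [
            {"bba": "val_bba_1", "bbb": "val_bbb_1"},
            {"bba": "val_bba_2", "bbb": "val_bbb_2"},
        ],
    },
    "c": {"cc": {"ccc": "val_ccc"}},
}

print(filter_keep_lists(a, "a, b.bb.0.bba, c.cc.ccc"))
# {'a': 'val_a', 'b': {'bb': [{'bba': 'val_bba_1'}]}, 'c': {'cc': {'ccc': 'val_ccc'}}}
print(filter_keep_lists(a, "a, b.bb.*.bba"))
# {'a': 'val_a', 'b': {'bb': [{'bba': 'val_bba_1'}, {'bba': 'val_bba_2'}]}}
```

Here 'bb' stays a list in both cases, which is the behaviour I would like to get with glom itself if possible.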