mahmoud / glom

☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
https://glom.readthedocs.io

Question, how to pull items into an array in a spec? #260

Open adlmtl opened 1 year ago

adlmtl commented 1 year ago

Suppose I have a target: { 'a': 1, 'b': 2, 'c': [{ 'd': 3, 'e': 4 }] }

how would I write a spec to produce the output: { 'my_output': { 'a_and_b': [1, 2], 'c': [{ 'new_d': 3, 'new_e': 4 }] } }

the part I am hung up on is how to take 'a' and 'b' and get them into an array. My current attempt:

spec = { 'my_output': { 'a_and_b': ['a', 'b'], 'c': ('c', [{'new_d': 'd', 'new_e': 'e'}]) } }

kurtbrose commented 1 year ago

you want the Fill spec:

>>> from glom import glom, Fill, T
>>> target = {'a': 1, 'b': 2}
>>> glom(target, Fill([T['a'], T['b']]))
[1, 2]

Fill is for "data structure as a template" use cases like this.

in the full context:

>>> target = { 'a': 1, 'b': 2, 'c': [{ 'd': 3, 'e': 4 }] } 
>>> spec = {'my_output': {'a_and_b': Fill([T['a'], T['b']]), 
...                       'c': ('c', [{'new_d': 'd', 'new_e': 'e'}]) } }
>>> glom(target, spec)
{'my_output': {'a_and_b': [1, 2], 'c': [{'new_d': 3, 'new_e': 4}]}}

MagnusOxlund commented 1 month ago

Fill is for "data structure as a template" use cases like this.

Thanks for pointing that out. I'm just learning glom, and based on the tutorial and the API, I assumed the structure of the spec could be used to declare the structure of the output:


Our astronomers want to focus in on the Solar system, and represent planets as a list. Let’s restructure the data to make a list of names

spec - (aka “glomspec”, short for specification) The accompanying template used to specify the structure of the return value.

A few failed attempts later, it's not clear to me that's possible with just glom.glom, but maybe I'm making some wrong assumptions about how glom.glom's spec parameter works.

At which point does Fill become necessary when restructuring?

Also, I assume Fill is a reference to glom's Fill mode. There's not a lot written about it in the docs. Maybe an example like the one above could be used to illustrate how Fill compares to Auto?

Fill - A variant of the default transformation behavior; preferring to “fill” containers instead of iterating, chaining, etc.


Here's my own restructuring scenario with the goal of using target data as dictionary keys in the output. I'm still not sure how to get there without resorting to conventional solutions:

# Desired output:
# {
#   123: {
#       'current_price': 100,
#       'previous_price': 120
#   },
#   246 : {
#       'current_price': 200,
#       'previous_price': 240
#   },
#   492 : {
#       'current_price': 400,
#       'previous_price': 480
#   }
# }

target = {
    'order_no' : 1192929,
    'line_items' : [
        {
            'SKU': 123,
            'price': {
                'current_price': 100,
                'previous_price': 120
            }
        },
        {
            'SKU': 246,
            'price': {
                'current_price': 200,
                'previous_price': 240
            }
        },
        {
            'SKU': 492,
            'price': {
                'current_price': 400,
                'previous_price': 480
            },
        }
    ]
}

from glom import glom, Spec

spec = {
    Spec('line_items.*.SKU') : Spec('line_items.*.price')
}

print(glom(target, spec))

# glom.core.GlomError.wrap(TypeError): error raised while processing, details below.
#  Target-spec trace (most recent last):
#  - Target: {'line_items': [{'SKU': 123, 'price': {'current_price': 100, 'previous_price': 120}}, ... (len=2)
#  - Spec: {Spec('line_items.*.SKU'): Spec('line_items.*.price')}
# TypeError: unhashable type: 'list'
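
For reference, the "conventional" solution mentioned above might look like the plain-Python sketch below. It uses no glom APIs, and the variable name `by_sku` is made up for illustration. The TypeError in the trace appears to come from the key spec evaluating to the whole list of SKUs, which cannot be hashed as a single dictionary key.

```python
# Same shape as the target above, shortened to two line items.
target = {
    'order_no': 1192929,
    'line_items': [
        {'SKU': 123, 'price': {'current_price': 100, 'previous_price': 120}},
        {'SKU': 246, 'price': {'current_price': 200, 'previous_price': 240}},
    ],
}

# Build one output entry per line item, keyed by that item's SKU.
by_sku = {item['SKU']: item['price'] for item in target['line_items']}

print(by_sku)
```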

mahmoud commented 1 month ago

Ah, if only * and Fill were so powerful. Really, Fill is quite similar to Auto, except that built-in [] and () don't have the iterate and chain behaviors by default. It's useful to fill containers more explicitly.

As for your desired transform, it's definitely possible in glom, but not so simple, unfortunately.

The shortest approach is to use Invoke and zip:

(Invoke(zip).specs("line_items.*.SKU", "line_items.*.price"), dict)

Here's another, more verbose approach that'll give you a better error if an object is missing the SKU key, based on the docs:

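    from glom import A, S

    spec = (
        "line_items",
        (
            [
                (
                    # Save the current line item in the scope as "item".
                    A.globals.item,
                    {
                        # Read the saved item back to use its SKU as the key.
                        S.globals.item["SKU"]: {
                            "current_price": "price.current_price",
                            "previous_price": "price.previous_price",
                        }
                    },
                )
            ],
        ),
    )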

That's based on this example from the docs. Hope this helps!

MagnusOxlund commented 1 month ago

Thank you for the demonstration. I don't think I would've arrived at those solutions on my own, so I appreciate that you took the time to show how to approach the problem with glom. Thanks!