mahmoud / glom

☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
https://glom.readthedocs.io

Scope usage changes from v22 to v23 #259

Open DamianBarabonkovQC opened 1 year ago

DamianBarabonkovQC commented 1 year ago

In glom v22, I was able to assign a scope with a spec and that automatically got applied to the target. Version 23 breaks that.

For example, this code used to work in v22:

target = {'data': {'val': 9}}
spec = (S(value = ('data', 'val')), S.value)
glom(target, spec) # returns 9

However, in v23:

target = {'data': {'val': 9}}
spec = (S(value = ('data', 'val')), S.value)
glom(target, spec) # returns ('data', 'val')

The previous behavior of v22 is infinitely more helpful than this new usage. Unless this was an intentional breaking change, I am filing this issue to recommend that we revert to the old behavior.

mahmoud commented 1 year ago

Ah yeah this looks related to the arg-mode update (#196). You can get the old behavior with S(value=Path('data', 'val'))

Previous to arg mode, the default interpretation of S kwargs as literals vs specs was somewhat arbitrary/unspecified. I believe now that it's specified, we'd prefer to keep it this way, and because Path is a forward-compatible approach, hopefully that approach can work for this case. Your feedback on S usage is valuable, so give it a try and let us know :)

kurtbrose commented 1 year ago

Could you give more context on what you use that for?

You can get back the old behavior via explicit mode (Auto) or disambiguate by using T --

>>> from glom import glom, S, T, Auto
>>> target = {'data': {'val': 9}}
>>> glom(target, (S(value=('data', 'val')), S.value))  # tuple kept as a literal
('data', 'val')
>>> glom(target, (S(value=Auto(('data', 'val'))), S.value))  # explicit Auto mode
9
>>> glom(target, (S(value=T['data']['val']), S.value))  # T is always a spec
9

The assumption is the more common case is building an intermediate data structure. Also, T()-args, Invoke()-args, Assign(), and several default-args all have consistent behavior now whereas before some would treat their argument as a constant, and others would treat it as a spec; often there were special case cut-outs for T (and only T).

I'd be curious what your use case is where you are finding S handy. Usually if I have an intermediate step, it's just as easy to build a dict to hold temporary values "in parallel".

(
  {
     'val': ('data', 'val'),
     'other_val': ...
  },
  'val'
)

There's probably a gap in documentation that this came out of nowhere for you -- hopefully it wasn't too painful to figure out.

DamianBarabonkovQC commented 1 year ago

I am unclear what you mean by build a dictionary to hold temporary values in parallel.

My more specific use case is something like in (v22 style):

target = ...
path = (
    S(id=("id",)),
    S(contract_type=("contract", "type")),
    ("receipts", [
        {
            "id": S.id,
            "contract_type": S.contract_type,
            "amount": ("amount",),
            "from": ("sender", "id"),
        }
    ])
)

Hope this clears up my exact usage more or less.

mahmoud commented 1 year ago

Got it. A great use of scope if I've ever seen one.

To extrapolate your point, @DamianBarabonkovQC, and let me know if I'm wrong, but are you saying that when you use S, you typically expect to assign a dynamic value via spec (e.g., a tuple representing a path), as opposed to a literal (e.g., an actual tuple of strings)?

If so, I can see that point. If one were to want a literal during execution, there are several ways to get one, without using S. What do you think, @kurtbrose?

(btw, @DamianBarabonkovQC, what do you say to using the Path approach I described?)

DamianBarabonkovQC commented 1 year ago

So I gave you an overly simplified example. Actually, my scope path is wrapped in a Coalesce to handle any missing values with a MISSING. For example, S(id=Coalesce("id", default=MISSING)). So I got this working by wrapping it in Spec instead of using Path.

kurtbrose commented 1 year ago

path = (
    S(id=("id",)),
    S(contract_type=("contract", "type")),
    ("receipts", [
        {
            "id": S.id,
            "contract_type": S.contract_type,
            "amount": ("amount",),
            "from": ("sender", "id"),
        }
    ])
)

Oh, I see the challenge: you want to apply the outer values across each element of receipts. There are a few ways of doing that, but none of them are cleaner than your current approach. I'd recommend switching to T in that case.

path = (
    S(id=T["id"], contract_type=T["contract"]["type"]),
    "receipts",
    [{
         "id": S.id,
         "contract_type": S.contract_type,
         "amount": "amount",
         "from": "sender.id",
    }]
)

I agree with your approach -- this is the type of case S is good for, where you need to cache an intermediate value that you'd otherwise lose track of.

Some other simplifications we can make:

Those might be necessary in the original spec and have become redundant when you simplified it down here.

boonhapus commented 10 months ago

In glom v22, I was able to assign a scope with a spec and that automatically got applied to the target. Version 23 breaks that.

Coming here to +1 the behavior change. Revisiting an old project in a new venv and spent 30mins debugging what the heck went wrong. 😅

I'd recommend switching to T in that case. ... this is the type of case S is good for, where you need to cache an intermediate value that you'd otherwise lose track of.

I swapped many implicit references to use T instead and things were great again.