mahmoud / glom

☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
https://glom.readthedocs.io

Getting a non-manipulated subset, including or omitting keys #208

Open tuukkamustonen opened 3 years ago

tuukkamustonen commented 3 years ago

It would appear that glom excels at grabbing (and manipulating) specific data out of data structures.

1) However, the syntax for simply extracting a subset feels a bit verbose:

   data = {'key1': {'foo': 1, 'bar': 2}, 'key2': {'baz': 3, 'daz': 4}}
   glom(data, {'key1': {'foo': 'key1.foo'}, 'key2': 'key2'})

I don't need to manipulate anything here, just extract a sub-dict. Instead of what's above, I'd like to write something like:

   glom.collect(data, ('key1.foo', 'key2'))

Is there less verbose syntax, like above, for this (grab subset without mutation)?

2) Sometimes the need is the reverse: to grab a copy of a data structure, omitting certain (potentially nested) keys. I can do that at least by:

   new = copy.deepcopy(original)
   glom(new, (Delete('key1.foo', ignore_missing=True), Delete('key2.daz', ignore_missing=True)))

   # To make it a bit less verbose with many keys:
   glom(new, tuple(Delete(key, ignore_missing=True) for key in ('key1.foo', 'key2.daz')))

But that's again too verbose for this simple task. Plus, there's a need to copy the data first, as glom manipulates it in-place.

What I would like to write instead:

   glom.omit(data, ('key1.foo', 'key2.daz'))

Is there less verbose syntax for this (grab subset, omitting certain keys)?

(Newcomer to glom... seems such a beast!)

kurtbrose commented 3 years ago

That's an interesting idea!

One of the key features of glom is it is very easy to extend; so here's my first thought:

class SubDict:
    def __init__(self, include=None, exclude=None):
        self.include, self.exclude = include, exclude
    def glomit(self, target, scope):
        if self.include:
            result = {key: target[key] for key in self.include}
        else:
            result = dict(target)
        if self.exclude:
            for key in self.exclude:
                del result[key]
        return result

kurtbrose commented 3 years ago

hmmm.. that's not quite right, because you want to be able to use 'a.b' to refer to nested structures, but it's a starting point
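
One way to close that gap, as a minimal sketch rather than anything glom ships: a spec-like class (the name `NestedSubDict` and the helper `_lookup` are hypothetical) that accepts dotted paths in `include` and rebuilds only those branches. It keeps the same `glomit(target, scope)` shape as the class above, handles plain dicts only, and silently skips paths that don't exist; all of those choices are assumptions.

```python
_MISSING = object()  # sentinel, so that None stays a legal value


def _lookup(target, parts):
    """Follow `parts` through nested dicts; return _MISSING if any step fails."""
    for part in parts:
        if not isinstance(target, dict) or part not in target:
            return _MISSING
        target = target[part]
    return target


class NestedSubDict:
    """Hypothetical spec: keep only the dotted paths listed in `include`."""

    def __init__(self, include):
        # Split 'a.b' into ['a', 'b'] once, at construction time.
        self.paths = [path.split('.') for path in include]

    def glomit(self, target, scope):
        result = {}
        for parts in self.paths:
            value = _lookup(target, parts)
            if value is _MISSING:
                continue  # assumption: missing paths are skipped, not errors
            # Create the intermediate dicts, then attach the value.
            dst = result
            for part in parts[:-1]:
                dst = dst.setdefault(part, {})
            dst[parts[-1]] = value  # shared with the target, not copied
        return result


data = {'key1': {'foo': 1, 'bar': 2}, 'key2': {'baz': 3, 'daz': 4}}
subset = NestedSubDict(['key1.foo', 'key2']).glomit(data, None)
```

Here `glomit` is called directly to keep the sketch free of dependencies; with glom installed, `glom(data, NestedSubDict(['key1.foo', 'key2']))` should go through the same hook.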

kurtbrose commented 3 years ago

it's in a similar vein to this:

https://github.com/mahmoud/boltons/issues/266

maybe that suggests an API?

tuukkamustonen commented 3 years ago

Yes, it should work on nested data structures.

I think the API should resemble the current glom API, so maybe it should be declarable via a spec (instead of glom.collect() or glom.omit() like the examples above).

I think this should be mentioned early in the tutorial, right after the nested get syntax glom(target, ['some.nested.key']). I personally expected something this simple to be easily done with such a powerful library.

I don't think there's a need for defaults - wouldn't hurt, but those could be added in a subsequent call if needed. If a key doesn't exist, it's not included.

It should consider include/exclude while building the result, instead of first copying and then deleting keys, because that's more efficient, and the-right-way(tm).
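
To illustrate the build-as-you-go idea for the exclude case, here is a rough sketch in plain Python. The `omit` function is hypothetical (it mirrors the `glom.omit` wished for above and is not part of glom), it only understands nested dicts, and it shares untouched branches with the original instead of deep-copying them; those are assumptions, not requirements.

```python
def omit(target, paths):
    """Return a new dict like `target`, leaving out the dotted `paths`."""
    # Group paths by their first segment. None means "drop this key
    # entirely"; a list holds the sub-paths still to drop further down.
    drop = {}
    for path in paths:
        head, _, rest = path.partition('.')
        if not rest:
            drop[head] = None
        elif head not in drop:
            drop[head] = [rest]
        elif drop[head] is not None:
            drop[head].append(rest)

    result = {}
    for key, value in target.items():
        if key not in drop:
            result[key] = value  # untouched branch: shared, not copied
        elif drop[key] is not None:
            if isinstance(value, dict):
                result[key] = omit(value, drop[key])  # rebuild this branch only
            else:
                result[key] = value  # path goes deeper than the data does
        # drop[key] is None: the key is simply never added
    return result


data = {'key1': {'foo': 1, 'bar': 2}, 'key2': {'baz': 3, 'daz': 4}}
trimmed = omit(data, ('key1.foo', 'key2.daz'))
```

Only the branches that contain an excluded key are rebuilt, so nothing is copied and then deleted, and paths that don't exist are ignored, matching the `ignore_missing=True` behaviour used earlier.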