How is Fill supposed to operate?

mahmoud / glom

☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️

https://glom.readthedocs.io

Other

1.89k stars 61 forks source link

How is Fill supposed to operate? #218

Open mathrick opened 3 years ago

mathrick commented 3 years ago

This is related to #207. Basically, I don't think Fill does what I expected it to do after reading the docstring, and I also don't think it's the most useful mode in its current incarnation. Specifically, the mode it introduces is too literal. In the example in #207, I'd expect the following to be the case:

      target = [{'a': 1, 'b': 1}, {'a': 2, 'b': 2}]
      glom(target, [Fill(('a', 'b'))]) == [(1, 1), (2, 2)]
      >>> False

In reality though, I have to mark 'a' and 'b' with Path explicitly, which is not at all obvious from the description of Fill: "A variant of the default transformation behavior; preferring to “fill” containers instead of iterating, chaining, etc.". There are no containers here, so I'd expect 'a' and 'b' to be interpreted as paths still, whilst "structural" specs get intepreted as templates for containers to create. But it seems that what it really means is "everything except Path and Spec becomes a literal". This makes the resulting spec so verbose and hard to read for what it does, I might as well not bother with Fill:

      target = [{'a': 1, 'b': 1}, {'a': 2, 'b': 2}]
      glom(target, [Fill((Path('a'), Path('b')))]) == [(1, 1), (2, 2)]
      >>> True

mahmoud commented 3 years ago

Ah, Fill. The simultaneously more minimalist and more advanced templating mode.

Fill's intended solution to clean that up is to use T:

    target = [{'a': 1, 'b': 1}, {'a': 2, 'b': 2}]
    glom(target, [Fill((T['a'], T['b']))]) == [(1, 1), (2, 2)]
    >>> True

Fill is meant to be more precise, so it pairs better with T. You only need to use Path if you want to retain the automatic transparent access to both object attributes, dict keys, and list indexes.

So, for context, glom started out as a pretty opinionated library, before growing the modes layer that allows different opinions to coexist, e.g., Fill vs Auto. So I think one design goal of modes is that it should be straightforward to derive a Fill mode that interprets strings as paths, as you've suggested, above.

There's a short doc on the topic. Quickest way I can think to do it would be to inherit (or compose a new object based on) Fill and override _fill to have the string behavior you desire, and call through to super() for the rest. What do you think?

mathrick commented 3 years ago

There's a short doc on the topic. Quickest way I can think to do it would be to inherit (or compose a new object based on) Fill and override _fill to have the string behavior you desire, and call through to super() for the rest. What do you think?

Right, the doc and the changelog is where I read about Fill at first, and my understanding of "prefers to fill containers instead of iterating / chaining" did not also include changing how strings are interpreted :). I do think it'd be possible to implement a version that works more like what I expected, but I also wanted to discuss whether Fill's current behaviour is truly the best default.

Whilst T['a'] is a solution (and I did actually mention it in #207), I think treating bare strings as paths, just as in Auto, would be more useful. The primary need that isn't cleanly addressed in Auto is "how do I create output values of container types that glom interprets as chained specs?". That's what I expected Fill to do. But the question of "how do I create a literal string output value, instead of having it interpreted as a path?" is already answered by Val(), so there's no need to have Fill do that, especially when it arguably reduces the expressiveness of the container-value creation.

Of course, since Fill has been released in its current form, its behaviour will need to stay as is for backwards compatibility. But I think it'd be useful to have another stock mode that is more selective in what it changes. It could either go back to being called Template, or maybe something like Struct. Would you like me to prepare a PR?

kurtbrose commented 3 years ago

To give some more context on Fill -- it was built around the use case of filling in configuration data structures here's the (slightly sanitized) prototypical example

Fill({
    'debug': T['debug'],
    'xmlsec_binary': saml2.sigver.get_xmlsec_binary([
        '/opt/local/bin', '/usr/bin/xmlsec1', '/app/vendor/xmlsec1/bin/']),
    'entityid': _Format('{base_url}/metadata/'),
    'description': 'IdP',
    'service': {
        'idp': {
            'name': 'IdP',
            'endpoints': {
                'single_sign_on_service': [
                    (_Format('{base_url}/sso/post/'), saml2.BINDING_HTTP_POST),
                    (_Format('{base_url}/sso/redirect/'),
                     saml2.BINDING_HTTP_REDIRECT),
                ],
                'single_logout_service': [
                    (_Format('{base_url}/slo/post/'), saml2.BINDING_HTTP_POST),
                    (_Format('{base_url}/sso/redirect/'),
                     saml2.BINDING_HTTP_REDIRECT)
                ]
            },
            'name_id_format': [
                saml2.saml.NAMEID_FORMAT_EMAILADDRESS,
                saml2.saml.NAMEID_FORMAT_UNSPECIFIED],
            'sign_response': False,
            'sign_assertion': True,
            "signing_algorithm": SIG_RSA_SHA256,
            "digest_algorithm": SIG_RSA_SHA256,
        },
    },
    'metadata': {
        'remote': lambda t: [  # not safe to do DB @ import so defer till format
            {'url': entity} for entity in db_results()]
    },
    # Signing
    'key_file': _Format('{cert_dir}/private.key'),
    'cert_file': _Format('{cert_dir}/public.cert'),
    # Encryption
    'encryption_keypairs': [{
        'key_file': _Format('{cert_dir}/private.key'),
        'cert_file': _Format('{cert_dir}/public.cert'),
    }],
    'valid_for': 365 * 24,
})

kurtbrose commented 3 years ago

in addition to strings, other non-data-structure objects like int and bool which would be an error in Auto mode will pass through to the result

just to see how it looks, here's the example with Val wrapped around all the plain values

Fill({
    'debug': T['debug'],
    # xml_sec needs a buildpack on HEROKU to work
    'xmlsec_binary': Val(saml2.sigver.get_xmlsec_binary([
        '/opt/local/bin', '/usr/bin/xmlsec1', '/app/vendor/xmlsec1/bin/'])),
    'entityid': _Format('{base_url}/metadata/'),
    'description': Val('EBill/Tableau IdP'),

    'service': {
        'idp': {
            'name': Val('Ebill IdP'),
            'endpoints': {
                'single_sign_on_service': [
                    (_Format('{base_url}/sso/post/'), Val(saml2.BINDING_HTTP_POST)),
                    (_Format('{base_url}/sso/redirect/'),
                     Val(saml2.BINDING_HTTP_REDIRECT)),
                ],
                'single_logout_service': [
                    (_Format('{base_url}/slo/post/'), Val(saml2.BINDING_HTTP_POST)),
                    (_Format('{base_url}/sso/redirect/'),
                     Val(saml2.BINDING_HTTP_REDIRECT))
                ]
            },
            'name_id_format': [
                Val(saml2.saml.NAMEID_FORMAT_EMAILADDRESS),
                Val(saml2.saml.NAMEID_FORMAT_UNSPECIFIED)],
            'sign_response': Val(False),
            'sign_assertion': Val(True),
            "signing_algorithm": Val(SIG_RSA_SHA256),
            "digest_algorithm": Val(SIG_RSA_SHA256),
        },
    },
    'metadata': {
        'remote': lambda t: [  # not safe to do DB @ import so defer till format
            {'url':entity} for entity in db_results()]
    },
    # Signing
    'key_file': _Format('{cert_dir}/private.key'),
    'cert_file': _Format('{cert_dir}/public.cert'),
    # Encryption
    'encryption_keypairs': [{
        'key_file': _Format('{cert_dir}/private.key'),
        'cert_file': _Format('{cert_dir}/public.cert'),
    }],
    'valid_for': Val(365 * 24),
})

this is totally a matter of subjective perception, but to me when I flip the screen back and forth, that extra level of ( ... ) around long-ish values like saml2.saml.NAMEID_FORMAT_EMAILADDRESS really slows down comprehending the structure

mathrick commented 3 years ago

I see. Looking at it in a long structure that's heavily nested and mostly literal, I can see why it was done this way.

kurtbrose commented 3 years ago

One thing that occurs is maybe a way to make this behavior transparent and overridable.

Maybe we could have a "paint-by-number" Mode where you simply override the types that you want to. Fill could be a good candidate for the basis of that template -- "for low level built in python data structures, recurse; for others pass through".

I'm imagining an API something like:

Fill(overrides={str: Path}, ...)

There's this idea hanging out that, EVENTUALLY, we will need to switch to "compile" glom-specs. The compilation (glompilation heh) would convert all the specs to more concrete primitives. This would improve performance by getting rid of a layer of dispatch logic, and eventually we could push the primitive callbacks down into Cython. Once we get there, the idea would be for glom to dogfood / spiral itself up by using gloms to transform specs.

In that general direction, you could imagine a transformer (glom-macro or glomacro if you will):

def str2path(spec)
    return glom(spec,
        Ref('spec',
              Match(Switch({
                      dict: {T: Ref('spec')},
                      list: [Ref('spec')],
                      str: Invoke(Path).specs(T)}))))

some challenges with this approach as-is:

pain in the butt to handle tuple, set, frozenset
ideally there would be a way to express this within the glom

kurtbrose commented 3 years ago

Maybe there should be a "mirror image" of Val() -- whereas Val says "treat this as a plain value, do not glom" this other mechanism would say "run this through an extra layer of glomming"

class Macro:
    def __init__(self, macro_spec, spec):
        self.macro_spec, self.spec = macro_spec, spec
    def glomit(self, target, scope):
        spec = scope[glom](self.macro_spec, self.spec, scope)
        return scope[glom](spec, target, scope)

Hmm.... it gets super ugly when I try to use it thought

Fill(Macro(Ref('macro', Match(Switch({str: Invoice(Path).specs(T), tuple: [Ref('macro')])))

expanding out the tuple and making it work properly is way too complicated -- tuple-to-list, apply macro, list-to-tuple