mahmoud / glom

☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
https://glom.readthedocs.io

Skipping items which don't Validate #223

Closed boonhapus closed 2 years ago

boonhapus commented 3 years ago

Hi!

I'd like to skip data elements in an API response that aren't relevant for my data pull. I've attempted to use Check(validate=fn, default=SKIP) to do this, and also tried Match(fn, default=SKIP), but could not get either to work. Any thoughts/redirection would be greatly helpful here!

input data structure

data = {
    'parse_player_events': [
        {
            'playbackData': {
                'csEvents': [
                    {'time': 36, 'positionX': 78, 'positionY': 161, 'attacker': 47, 'npcId': 10, 'byAbility': 5219, 'byItem': None},
                    {'time': 37, 'positionX': 78, 'positionY': 162, 'attacker': 47, 'npcId': 4, 'byAbility': None, 'byItem': None},
                    {'time': 39, 'positionX': 77, 'positionY': 161, 'attacker': 47, 'npcId': 10, 'byAbility': None, 'byItem': None},
                    {'time': 39, 'positionX': 77, 'positionY': 161, 'attacker': 47, 'npcId': 10, 'byAbility': 5219, 'byItem': None},
                    {'time': 59, 'positionX': 85, 'positionY': 169, 'attacker': 47, 'npcId': 10, 'byAbility': 5219, 'byItem': None},
                    ...,
                ]
            }
        }
        ...,
    ]
}

glom spec

    def is_a_creep(npc_id: int) -> bool:
        if npc_id != creep:
            return False
        return True

    ...

    spec = (
        'parse_player_events',
        [(
            'playbackData.csEvents',
            [(
                {
                    'match_id': Val('match id'),
                    'event_type': Val('creep kill'),
                    # 'id': ...,
                    'time': 'time',
                    'x': 'positionX',
                    'y': 'positionY',
                    'actor_id': 'attacker',
                    'target_id': 'npcId',
                    'ability_id': 'byAbility',
                    'item_id': 'byItem'
                },
                Check(T['target_id'], validate=is_a_creep, default=SKIP)
            )]
        )],
        Flatten(),
        # Invoke(sorted).specs(T).constants(key=lambda d: d['time']),
        # Invoke(enumerate).specs(T),
        # [lambda e: {**e[1], 'id': e[0]}]
    )

but it doesn't appear to be skipping when is_a_creep fails validation. In fact, even when I modify is_a_creep to only ever return False, still nothing is getting skipped.

If I remove the default=SKIP keyword argument, then I get GlomCheckErrors as appropriate.

Overall, I have loved the glom API and have started converting a lot of my project over to use it, but I am quite frustrated with getting the syntax correct on validation. I attempted to use Match as well, but couldn't grok the syntax well enough to get that one functional either.

kurtbrose commented 2 years ago

Sorry, you ran into a problem with SKIP and STOP being values rather than real control flow mechanisms.

There's some work to add CONT and BREAK as real control flow primitives so that they aren't sensitive to the specs in between:

https://github.com/mahmoud/glom/pull/183

Let me see if I can reproduce and figure out where SKIP is going.

kurtbrose commented 2 years ago

Yea, so this is confusing as heck, but basically [(T, Check(default=SKIP))] behaves differently than [Check(default=SKIP)].

Here's an example you can try for yourself:

>>> glom([1, 0], [Check(T, validate=lambda t: False, default=SKIP)])
[]
>>> glom([1, 0], [(T, Check(T, validate=lambda t: False, default=SKIP))])
[1, 0]

The issue is that the tuple is swallowing the SKIP -- tuple interprets SKIP to mean "continue on to the next item".
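To make the swallowing concrete, here is a toy model in plain Python. It is a simplified sketch, not glom's actual code: run_chain, run_list, and the SKIP stand-in are hypothetical names, and the only assumption is the behavior described above, namely that a chain treats SKIP as "go on to the next step" while a list treats it as "drop this item".

```python
SKIP = object()  # stand-in sentinel, not glom's real SKIP


def run_chain(target, steps):
    """Toy tuple spec: feed each step the previous result."""
    result = target
    for step in steps:
        nxt = step(result)
        if nxt is SKIP:
            # The chain consumes SKIP here and keeps the previous result.
            continue
        result = nxt
    return result


def run_list(targets, step):
    """Toy list spec: apply one step per item, dropping items that SKIP."""
    out = []
    for item in targets:
        val = step(item)
        if val is SKIP:
            continue
        out.append(val)
    return out


def identity(value):
    return value


def always_skip(_value):
    return SKIP


# The check sits directly inside the list, so the list sees SKIP.
direct = run_list([1, 0], always_skip)

# The check sits inside a chain, so the chain absorbs SKIP and
# hands the untouched item back to the list.
wrapped = run_list([1, 0], lambda item: run_chain(item, [identity, always_skip]))

print(direct)   # []
print(wrapped)  # [1, 0]
```

In this model the list never sees SKIP in the second case, which matches the two results in the example above.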

CONT or BREAK would probably have the same issue. One workaround is to iterate twice, changing [({}, Check)] to ([{}], [Check]), so the Check sits directly inside its own list:

    spec = (
        'parse_player_events',
        [(
            'playbackData.csEvents',
            (
                [{
                    'match_id': Val('match id'),
                    'event_type': Val('creep kill'),
                    # 'id': ...,
                    'time': 'time',
                    'x': 'positionX',
                    'y': 'positionY',
                    'actor_id': 'attacker',
                    'target_id': 'npcId',
                    'ability_id': 'byAbility',
                    'item_id': 'byItem'
                }],
                [Check(T['target_id'], validate=is_a_creep, default=SKIP)]
            )
        )],
        Flatten(),
        # Invoke(sorted).specs(T).constants(key=lambda d: d['time']),
        # Invoke(enumerate).specs(T),
        # [lambda e: {**e[1], 'id': e[0]}]
    )
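For comparison, the two-pass idea can be written in plain Python without glom: reshape every event first, then filter the reshaped rows. This is only a sketch of what the workaround spec is meant to compute. CREEP_NPC_ID = 10 is a hypothetical value, since the original is_a_creep compares against a creep variable that the thread does not define, and the events are trimmed copies of the first three sample rows.

```python
CREEP_NPC_ID = 10  # hypothetical value; the thread never defines `creep`


def is_a_creep(npc_id: int) -> bool:
    return npc_id == CREEP_NPC_ID


# Trimmed copies of the first three sample events from the issue.
cs_events = [
    {'time': 36, 'attacker': 47, 'npcId': 10},
    {'time': 37, 'attacker': 47, 'npcId': 4},
    {'time': 39, 'attacker': 47, 'npcId': 10},
]

# Pass 1: reshape every event (the role of the [{...}] list in the spec).
reshaped = [
    {'time': e['time'], 'actor_id': e['attacker'], 'target_id': e['npcId']}
    for e in cs_events
]

# Pass 2: drop rows that fail validation (the role of the [Check(...)] list).
kept = [row for row in reshaped if is_a_creep(row['target_id'])]

print(kept)
# [{'time': 36, 'actor_id': 47, 'target_id': 10},
#  {'time': 39, 'actor_id': 47, 'target_id': 10}]
```

Because the filter runs as its own pass over the list, nothing sits between it and the list that could absorb the skip.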