keleshev / schema

Schema validation just got Pythonic
MIT License
2.86k stars 214 forks source link

Two tuples as dictionary keys report a false alarm of SchemaMissingKeyError #312

Open tjyuyao opened 4 months ago

tjyuyao commented 4 months ago

I am using schema 0.7.5 and python 3.11.

Following is a minimal code to reproduce

from schema import *

data_schema = Schema({
    ('map_point', 'to', 'map_polygon'): {},
    ('map_polygon', 'to', 'map_polygon'): {},
}, ignore_extra_keys=True).validate(
    {
        ('map_point', 'to', 'map_polygon'): {},
        ('map_polygon', 'to', 'map_polygon'): {},
    }
)

It will give the error:

Traceback (most recent call last):
  File "/home/hyuyao/2024/trackpred/qcnet_plug_in/qcnet_official_study/data_schema.py", line 7, in <module>
    }, ignore_extra_keys=True).validate(
                               ^^^^^^^^^
  File "/home/hyuyao/miniconda3/envs/RAMoPred/lib/python3.11/site-packages/schema.py", line 420, in validate
    raise SchemaMissingKeyError(message, e.format(data) if e else None)
schema.SchemaMissingKeyError: Missing key: ('map_polygon', 'to', 'map_polygon')

The expected behavior is to pass the validation without raise the error.

mutricyl commented 4 months ago

when checking a dict, keys from schema are checked against keys of data dict. This means the codes execute at some stage:

Schema(('map_point', 'to', 'map_polygon')).validate(('map_point', 'to', 'map_polygon'))
# and
Schema(('map_point', 'to', 'map_polygon')).validate(('map_polygon', 'to', 'map_polygon'))

which both validate considering the way Schema deals with list like. It would be needed to check keys hash and not keys values:

class Schema(object):
(...)
    def validate(self, data: Any, **kwargs: Dict[str, Any]) -> Any:
(...)
                for key, value in data_items:
                    for skey in sorted_skeys:
                        svalue = s[skey]
                        try:
-                            nkey = Schema(skey, error=e).validate(key, **kwargs)
+                           Schema(hash(skey), error=e).validate(hash(key), **kwargs)
+                           nkey = skey
                        except SchemaError:
                            pass
                        else:
                            if isinstance(skey, Hook):
(...)

it works for you case @tjyuyao but sadly it breaks a lot of tests 😞

mutricyl commented 4 months ago

restricting this to set and frozensets only seams to be fine for all pytests

proposed solution:

            with exitstack:
                # Evaluate dictionaries last
                data_items = sorted(data.items(), key=lambda value: isinstance(value[1], dict))
                for key, value in data_items:
                    for skey in sorted_skeys:
                        svalue = s[skey]
                        try:
                            if isinstance(skey, (tuple, frozenset)):
                                Schema(hash(skey), error=e).validate(hash(key), **kwargs)
                                nkey = skey
                            else:
                                nkey = Schema(skey, error=e).validate(key, **kwargs)
                        except SchemaError:
                            pass
                        else:

tests to add:


def test_tuple_key_of_dict():
    # this is a simplified regression test of the bug in github issue #312
    assert Schema({('map_point', 'to', 'map_polygon'): {}}).validate(
        {('map_point', 'to', 'map_polygon'): {}}
    ) == {('map_point', 'to', 'map_polygon'): {}}
    with SE:
        assert Schema({('map_point', 'to', 'map_polygon'): {}}).validate(
            {('map_polygon', 'to', 'map_polygon'): {}}
        ) == {('map_polygon', 'to', 'map_polygon'): {}}

def test_frozenset_key_of_dict():
    # this is a simplified regression test of the bug in github issue #312
    assert Schema({frozenset(('map_point', 'to', 'map_polygon')): {}}).validate(
        {frozenset(('map_point', 'to', 'map_polygon')): {}}
    ) == {frozenset(('map_point', 'to', 'map_polygon')): {}}
    with SE:
        assert Schema({frozenset(('map_point', 'to', 'map_polygon')): {}}).validate(
            {frozenset(('map_polygon', 'to', 'map_polygon')): {}}
        ) == {frozenset(('map_polygon', 'to', 'map_polygon')): {}}

@keleshev @skorokithakis @sjakobi @dblanchette or other maintainer, are you fine with this solution ?