pyeve / cerberus

Lightweight, extensible data validation library for Python
http://python-cerberus.org
ISC License
3.16k stars 240 forks source link

readonly param designed behaviour is broken with oneof, anyof #297

Open rredkovich opened 7 years ago

rredkovich commented 7 years ago

Use-case abstract

According to docs readonly parameter should give False on validation if field with that parameter is in dict.

Works as expected in general case:

import cerberus

schema_one = {
    'id': {'type': 'integer', 'readonly': True},
    'text': {'type': 'string', 'required': True}
}
schema_two = {
    'id': {'type': 'integer', 'readonly': True},
    'extra_text': {'type': 'string', 'required': True}
}

v = cerberus.Validator({'record': {'type': 'dict', 'schema': schema_one}})

test_data = {'record': {'id': 10, 'text': 'ooops'}}
v.validate(test_data)
# -> False
v.errors
# -> {'record': [{'id': ['field is read-only']}]}

Support request / Bug report

However using multiple schemas with oneof for field will produce unexpected True.

import cerberus

schema_one = {
    'id': {'type': 'integer', 'readonly': True},
    'text': {'type': 'string', 'required': True}
}
schema_two = {
    'id': {'type': 'integer', 'readonly': True},
    'extra_text': {'type': 'string', 'required': True}
}

v = cerberus.Validator({'record': {'type': 'dict', 'oneof_schema': [schema_one, schema_two]}})

test_data = {'record': {'id': 10, 'text': 'ooops'}}
v.validate(test_data)
# -> True
v.errors
# -> {}

anyof gives same result, not tested with noneof and allof. Bug exists in version 1.1, 1.0.x, in 9.x exception is raised when using readonly param.

funkyfuture commented 7 years ago

i pledge to mark this for the 1.2 milestone.

rredkovich commented 6 years ago

I have performed debug of oneof validation.

Schema traversal procedure is correct, but at some point assumption is made that readonly property was already validated in normalization process:

    def _validate_readonly(self, readonly, field, value):
        """ {'type': 'boolean'} """
            if not self._is_normalized:
                self._error(field, errors.READONLY_FIELD)
            # If the document was normalized (and therefore already been
            # checked for readonly fields), we still have to return True
            # if an error was filed.

Link to source

However, normalization do readonly validation only at first level of schema without traversal at all, rules inside oneof are not inspected and evaluated.

    def __validate_readonly_fields(self, mapping, schema):
        for field in (x for x in schema if x in mapping and
                      self._resolve_rules_set(schema[x]).get('readonly')):
            self._validate_readonly(schema[field]['readonly'], field,
                                    mapping[field])

Link to source called from normalization method

Passing normalize=False to validate method removes the issue.

Here comes a question, what is right way to fix according to library architecture which I have only partial knowledge about:

  1. Assumption about performed validation on normalization stage in _validate_readonly method should be removed
  2. Normalization should be updated to correctly handle oneof (and other *of?)
funkyfuture commented 6 years ago

solving this for one of the rules should solve it for all as the subvalidation dispatching is the same.

@dkellner, if i'm not wrong you could be able to give competent guidance on this issue, iirc you implemented the conjunction of normalization and validation for the readonly rule.

dkellner commented 6 years ago

Yes, it was in PR #282. The other (non-simple) fix we had for that is in #272, and I still have the branch here: https://github.com/dkellner/cerberus/tree/default-readonly .

IIRC and Cerberus still works the same internally, we're doing two full traversals now: one for normalization, and one for validation (more details in e.g. https://github.com/pyeve/cerberus/issues/268#issuecomment-248239479). readonly is special in that: it's a validation rule that needs to be performed before normalization because it should see the original document before any modifcation happened. Unfortunately you cannot easily traverse the document for just readonly (that is exactly why I've implemented the rule_filter in #272).

Unfortunately I cannot look deeper into this now as lunar new year is coming. Let me know if I can still be of help in about two weeks.

funkyfuture commented 6 years ago

we're doing two full traversals now: one for normalization, and one for validation

and i want to point out that the two traversals are different and that itches me because it complicates things, but there seems no sane way around it.

anyway, has someone the time to tackle this bug? it's the hardest nut on the 1.3 playlist so far.

dkellner commented 6 years ago

anyway, has someone the time to tackle this bug? it's the hardest nut on the 1.3 playlist so far.

I'm afraid I will not find the time for that the coming weeks. There are still some "urgent" issues in Eve-SQLAlchemy I'd have to tackle first and that's progressing rather slowly already...

and i want to point out that the two traversals are different and that itches me because it complicates things, but there seems no sane way around it.

I'd argue the very nature of normalization and validation are different. If we're not doing a clear separation we will run in all kind of interesting bugs as the document keeps changing at a stage (validation) we're not expecting that anymore. And with readonly we actually need three traversals: pre-normalization-validation (readonly), normalization, post-normalization-validation (most rules).

funkyfuture commented 6 years ago

I'd argue the very nature of normalization and validation are different. If we're not doing a clear separation we will run in all kind of interesting bugs

that is more or less my point. the thing is, the code is very prone to bugs, though the processing stages are seperated clearly and because the user interface happens to expose rules and interdependencies of such across the stages. but we wouldn't want to change the simplicity of the interface that is provided by this. i just feel the urge to re-iterate often enough that each design decision and implementation detail that touches these circumstances must be well reasoned and that it all has its limits.

the readonly rule will actually be a good test of this proposed design, it may prove it to be unfit or provide the ground for a cleaner solution.

funkyfuture commented 6 years ago

before i leave town for a while i'd like to share some thought i had when zooming out on this issue.

i noticed a contradiction that hadn't been formulated yet:

the first statement is that we do not consider normalization rules as valid in the *of-rules. yet, the readonly-rule meanwhile became a rule that is applied during normalization as well. that just doesn't fit and a whole bunch of mechanics would have to be implemented just to overcome the limitiation of the 1st statement. with these in place, normalization would actually 'work' when declared in of-rules. i not only consider this as guarantee for lot of grey hairs, but also as a 'tool' for users that they don't need (if there's need for conditional normalization that should really not be done with one* schema) and can only abuse and fail.

so, i'd like to take these two options into consideration:

@rredkovich, regarding your concrete use-case:

why don't you define rules for the id field outside the *of-scope? it's the same definition declared in both.

funkyfuture commented 6 years ago

unless there are other proposals how to proceed with this issue, i'm going to remove the 1.3 milestone from this.