Open lafrech opened 6 years ago
[Let's strip this from OP, as it is long enough already.]
BTW, note that there can be errors for a missing field if a schema-level validator (with skip_on_field_errors=False
) raises errors and blames them on the nested field, for instance:
raise ValidationError('Some error', 'nested')
so this is possible:
{'nested': ['Missing data for required field.', 'Other error from schema level validator.']}
Worse, if nested fields could be reached using dot notation (I don't think it is possible right now), you could even have errors for fields deeply nested in a missing nested field:
raise ValidationError('Some error', 'nested.nested_again.field')
errors = {
'nested': {
'_field': ['Missing data for required field.'],
'nested_again': {
'field': ['Some error']
}
}
This sounds wrong so it's probably better not to support this.
@sloria, @deckar01, any ideas?
I've been thinking about this and I might have come up with something.
schema and field are more or less marshmallow-specific. They don't necessarily make sense from client perspective.
Also, I don't think there needs to be a difference between errors in a nested field (e.g. 'Missing data for required field' and errors in a nested field's schema ('Invalid data type'). So nested _schema
errors can be merged with errors about the Nested
field.
My proposal is to keep an error dict
structure to follow the hierarchy of the Schema
, and for each field with errors, store errors list under a special _errors
key.
This is what I excluded at first when I wrote
putting errors on fields themselves in a dedicated
_field
key for all fields seems a bit overkill as well.
but I see no alternative, and thinking again, it makes things more consistent: every value is a subdict, unless the key is _errors
, in which case it is a list of errors, so the dict
is easier to parse as the type of each value is well defined.
Here's what it would look like:
errors = {
'_errors': [...], # Schema errors
'field': {'_errors': [...]}, # Errors about 'field' field
'nested': {
'_errors': [...], # Errors about 'nested' field or nested schema errors
'field': {'_errors': [...]}, # Errors about 'nested.field' field
},
'nested_many': {
'_errors': [...],
'0': {
'_errors': [...],
'field': {'errors': [...]},
},
},
'list': {
'_errors': [...],
'0': {'_errors': [...]},
}
'list_nested': {
'_errors': [...],
'0': {
'_errors': [...],
'field': {'_errors': [...]},
},
},
'dict': {
'_errors': [...],
'0': {
'key': {'_errors': [...]},
'value': {'_errors': [...]},
}
}
'dict_nested': {
'_errors': [...],
'0': {
'key': {'_errors': [...]},
'value': {
'_errors': [...],
'field': {'_errors': [...]},
},
}
}
}
I didn't assess the implementation difficulty.
Thoughts anyone?
Separating _schema
and _field
is an independent issue. I think we should merge them but it may not matter so much. Let's assume we do.
_errors
for fields returning error dict
sTo be consistent, errors in a nested field should always be a dict
, whether or not there are errors in fields in the nested schema. E.g.:
'nested': {
'_errors': [...], # Errors about 'nested' field or nested schema errors
'field': ..., # Errors about 'nested.field' field
},
It is the mixed approach in #299, and it has two issues:
Inconsistency: Some fields have a _errors
key, most don't. We could live with that.
'field_1': ['Error', 'Another error'],
'field_2': ['Error', 'Another error'],
'nested': {
'_errors': ['Error', 'Another error'],
'field': ..., # Errors about 'nested.field' field
},
Implementation: When store_error
is called, the type of the field is unknown, the only available informations are the type of the error messages to add (dict
or list
) and the type of already existing error messages if any. There is no way to ensure nested fields always have their errors under an _errors
key.
(Note that this apply to other fields such as List
, Dict
and custom fields returning dict
errors and that this category of fields is only defined by its return value. Any custom field could return a dict
, so relying on the field type and matching it with a hardcoded list would not be sensible, anyway. Unless explicitly forbidden, there may even be custom fields returning different types of values depending on the errors.)
I don't see how to have this working correctly. We could hack store_error
to always check already existing error messages for the field and ensure no data is lost (ensure a dict
won't erase a list
or conversely) and catch type errors (trying to append a list
to a dict
), but I don't think we can fix the "nested fields generate sometimes error lists, sometimes error dicts" issue.
I think the "_errors
everywhere" approach is the only approach that is safe against this.
_errors
for all fieldsThis is why I suggested to use _errors
in every field:
'field_1': {'_errors': ['Error', 'Another error']},
'field_2': {'_errors': ['Error', 'Another error']},
'nested': {
'_errors': ['Error', 'Another error'],
'field': ..., # Errors about 'nested.field' field
},
The fact that is it more verbose may not be an issue. Those messages are meant to be read by software. Human readable errors would be even better, but I think it is secondary. This is debatable.
While implementing this, I realized it is a bit invasive when it comes to "structural" fields returning errors dict
s, such as List
or Dict
, because the fields themselves need to be modified to use that _errors
key.
If we go this way despite the ugliness and verbosity of it, we'll have to either modify the way those structural fields return their errors or accept a little inconsistency in the errors inside those fields.
I pushed a branch to show examples (https://github.com/marshmallow-code/marshmallow/tree/dev_879_report_error)
First commit without changes in List
/Dict
. Little inconsistency as _errors
does not appear in their structure.
Second commit with changes in List
/Dict
to ensure consistent use of _errors
.
I added a test to produce an output more or less equivalent to the examples above, in which I copy-pasted the pretty-print of the generated errors dict
.
Modifying those fields is not necessarily an issue. I wish all that logic could stay in the Marshaller/Unmarshaller, but maybe that's just the way it is.
Some solve this by flattening the error structure, but I don't like it so much.
https://medium.com/engineering-brainly/structuring-validation-errors-in-rest-apis-40c15fbb7bc3
Apparently, flask-restplus and Django use the same kind of dict structure as marshmallow.
Anyone has any experience with those and know how they deal with this?
More thoughts and a different approach.
I should back this up with experimentation and a deeper analysis, but I have the feeling that the whole issue can be solved by
_schema
/_field
as proposed aboveskip_on_field_error=False
possibility, in other words always skip schema validation if there are errors in the fieldsI think the latter would mean that a container field could only have errors in the nested fields (thus a dict
error structure without _schema
or _field
) or in itself (thus a list
error structure).
When parsing the error structure, the client would expect either a dict
(subfield errors) or a list
(field errors), which I find reasonable.
Obviously, this would be a breaking change as it would not be possible to run schema validators in case of field errors, but is that really an issue? Would it be that bad? From the discussions in https://github.com/marshmallow-code/marshmallow/issues/323, it looks like this behaviour is surprising and not desired and the skip_on_field_error
flag was introduced to remove it in a non-breaking way. Perhaps removing it completely wouldn't be problematic. At least, in the discussion, nobody stood up to defend it.
I hope I didn't miss another corner case. If all of this is correct, then we have a solution that is much simpler and much less breaking than what I exposed in the comments above. And an error structure that is close to django, flask-restplus and what marshmallow users are used to.
Thanks @lafrech for your analysis. I'll have a closer look at this thread soon. In the meantime, it might be worth evaluating how other libraries handle this. See the ones that I looked at for the unknown
feature: https://gist.github.com/sloria/2fac357710f13e855a5b9cf8f05a4da0
I tried flask-restplus quickly but I didn't succeed in reproducing the example in the docs.
I didn't try any of the libs in your list. Here's what I gathered from a quick look in the docs.
http://docs.python-cerberus.org/en/stable/errors.html
The error structure is a dict-like structure corresponding to the nested structure of the schema. Errors are stored in the errors
attribute of each node. This solves the "dict of list ?" issue: each node can hold a subdict and a list. But IIUC, this means the error structure is not a simple type. I don't see how that would serialize in json to be returned by an API, for instance.
https://github.com/alecthomas/voluptuous#multi-field-validation
Apparently, the recommended method won't apply schema validators in case of field errors, like I suggested above.
/me wrote in https://github.com/marshmallow-code/marshmallow/issues/879#issuecomment-430229952:
I think the latter would mean that a container field could only have errors in the nested fields (thus a dict error structure without _schema or _field) or in itself (thus a list error structure).
Almost but not quite.
Indeed field validators are skipped if fields fail deserialization and schema validators would be skipped as well in case of error if we dropped skip_on_field_error=False
. But there could still be several field validators or several schema validators reporting multiple errors on the same field.
What makes things complicated is the fact that validation errors may report more or less any structure: dict or list. So when several errors are reported on a field, you get to merge different error structures and here the troubles (and the inconsistencies: https://github.com/marshmallow-code/marshmallow/pull/299#issuecomment-401607834) begin.
If only one error can be reported, we don't care about the structure, there is no need to merge list with dicts, and there is no need for a special _field
key. That's why I hoped we could simplify things to be sure all field errors for a given field are reported at once. It doesn't seem to be the case.
We can live with the inconsistency, but things are worse: if an error is reported as dict, with current code, it overwrites the former error messages. I didn't take the time to check that yet but perhaps the merge function proposed in https://github.com/marshmallow-code/marshmallow/pull/442 would avoid this.
I can't comment about the fix proposed in #442 but I agree with the OP in #441. We could be more strict about what an error message should be, and impose that if a dict, it is a hierarchical structure (either subfields or equivalently mergeable subelements like in List
or Dict
).
But there could still be several field validators or several schema validators reporting multiple errors on the same field.
Well... almost, but not exactly.
Currently, @validates can only be called once per field (see https://github.com/marshmallow-code/marshmallow/issues/653). But this is a limitation in the decoration feature that we might want to overcome one day. And "several schema validators reporting multiple errors on the same field" is possible.
I've been spending hours on this and I don't see any way to fix the "inconsistency" (field returning errors either as list
or as dict
, same error being sometimes in the list
, sometimes in the dict
(under the _field
or _schema
key), depending on other errors).
However, I think #1026 is a great step towards a clearer interface and it ensures we're not losing error messages in the process.
I believe this issue can be postponed to after 3.0. Most probably 4.0 as changing this would be a breaking change.
Generally, a field name is associated a list of errors as strings. The exception to this is container fields. Those may return errors for the container and the contained field(s). In this case, rather than a list, a
dict
is used where the keys are the field names or the list index (as string or as integer):This has been working relatively fine for a while. It leaves a hole as you can't return errors relative to the
Nested
field itself, for instance, while returning errors for the fields in the nested schema.It is generally not blocking, as the most frequent
Nested
field level error is themissing
error, and when the field is missing, there is generally no nested error, so we returnIt is a shame that the structure (
dict
orlist
) depends on the error, but we could live with this.Except we sometimes need to return errors for both the field and the nested schema, so we came up with this (https://github.com/marshmallow-code/marshmallow/pull/299):
where
_field
stores errors about the field itself.This is inconsistent, as I said in https://github.com/marshmallow-code/marshmallow/pull/299#issuecomment-401607834 and showed in https://github.com/marshmallow-code/marshmallow/pull/863 because the errors on the nested field itself may be reported in different places depending on whether there are errors in the nested schema. In other words, a
dict
with a_field
attribute is used only if there are also errors in the nested schema, otherwise, the errors are returned as a strings, like for any field.The same considerations apply to all container fields (
List
,Dict
).I can't think of an ideal solution to allow reporting errors from both the container field and its nested schema/fields.
We'd like to keep the use of
dict
for containers because it makes browsing the error structure convenient:This rules out putting nested errors under a dedicated
_nested
key.On the other hand, putting errors on fields themselves in a dedicated
_field
key for all fields seems a bit overkill as well.A trade-off would be to use the
_field
key for fields errors for all container fields and only for them, but do it consistently.I suppose that was the intent of https://github.com/marshmallow-code/marshmallow/pull/299.
This introduces a distinction between containers and other fields.
It looks bad in the usual case where fields are missing, as shown above, because containers have their "missing data" message embedded in a
dict
.Also, the implementation might be challenging: how do you know a field is a container when registering the error?
I'd like to keep it simple, but I'm out of ideas, so I'm appealing to collective wisdom.
I have the feeling that a good decision on this would avoid many corner case issues, so this should be set before piling-up patches.