Open csingley opened 6 years ago
This would be really nice for parsing e.g. dates from HTML forms, where an empty <input type=date>
(or any empty <input>
) results in the empty string.
Seconded. This would be pretty handy for parsing empty <input>
fields.
I think it makes sense for marshmallow to support this use case. We should explore marshmallow's existing functionality before we add a feature to the core.
Your use case:
None
None
handling per Field
instanceThe general solution for customizing deserialization behavior should be a custom field.
http://marshmallow.readthedocs.io/en/latest/custom_fields.html
from marshmallow import fields
class CustomField(fields.String):
def _deserialize(self, value, attr, obj):
if value == '':
return None
return super(CustomField, self)._deserialize(value, attr, obj)
Is there anything about this that doesn't work for your use case?
Writing custom field classes would work, of course... but if I have to write a custom Field subclass for every single data type, I'll just go ahead and roll my own type conversion/validation module and save myself the import. Indeed, that's what I ended up doing for the project where I was evaluating marshmallow.
It's a bit of a shame, because marshmallow has somef features that make it play nice with ElementTree (by accident I suppose). This is a strength that could be capitalized upon.
Taking a step back - the meaning of an empty string is defined globally by the protocol and/or parsing library.... "emtpy string is NULL" is a common choice. To map that protocol into Python data types, you want to write that equivalence once, globally... not write the same boilerplate ten times, once per validator. That's most inelegant.
Setting my use case aside, you've got users saying "hey wait a minute, HTML forms also use 'empty string is NULL' for lots of different data types". To me that sounds like a fairly key use case for marshmallow, and not handling it in the library seems like a bit of an oversight.
Just a friendly $0.02, because I see a lot to like in marshmallow. I probably would have used it for my project if this feature existed.
I don't think it is unreasonable to create a wrapper class for each of the fields you need to customize. It should be possible to create a mixin to make this clean and maintainable.
from marshmallow import fields
class NoneMixin(object):
def _deserialize(self, value, attr, obj):
if value == '':
return None
return super(NoneMixin, self)._deserialize(value, attr, obj)
class CustomStringField(NoneMixin, fields.String):
pass
class CustomDecimalField(NoneMixin, fields.Decimal):
pass
class CustomDateField(NoneMixin, fields.Date):
pass
...
Would this work for the empty HTML input use case?
The custom field described by @deckar01 only needs to be created once. Then, in the app, it can be used in place of the String
field.
Agreed, this is far from perfect.
marshmallow.fields
, he needs to also import from his own custom fields module. Cosmetic.Email
field should not be treated as String
, so I end up registering all my custom fields manually), but this should be a corner case.NoneMixin
makes this easier, but fields must be subclassed anyway).I think @csingley's proposal makes sense.
Wouldn't it be better, however, to treat empty strings as missing
rather than None
. I'm thinking of the HTML form use case, for instance.
def deserialize(self, value, attr=None, data=None)
+ if value in self.null_values:
+ value = missing
self._validate_missing(value)
...
Should this be used only on deserialization or on serialization as well? See this discussion about removing nullish values from output: https://github.com/marshmallow-code/marshmallow/issues/229.
I've got no skin in the game at this point, but from a user's point of view I think the mixin proposal is pretty reasonable (although not surprisingly I prefer my own approach... who doesn't?) Maybe a documentation enhancement outlining the NoneMixin pattern would be a good place to start?
@lafrech - re: missing vs None - that would work fine for my specific case, but these two things can sometimes require different handling. If the data protocol changes, and a new field is added, you might want to have your code do one thing if the field is missing (old protocol) vs. blank (new protocol, default value). Obviously this doesn't really apply to web forms, but I'd imagine in general you'd want to discriminate between the case where the field is empty and the case where it's simply not sent in the data stream.
Anyway, that's enough back seat driving from me! Cheers, and thanks for your work.
@anowlcalledjosh Did you come up with a solution for your use case?
@mgd722 Would a custom field be an acceptable solution for your use case? https://github.com/marshmallow-code/marshmallow/issues/713#issuecomment-378125776 or https://github.com/marshmallow-code/marshmallow/issues/713#issuecomment-378247878
@deckar01 I ended up using the following on my schemas:
@pre_load
def allow_null_dates(self, item):
item["date"] = item["date"] or None
item["enddate"] = item["enddate"] or None
Having to copy-paste subclasses seems a bit boilerplate-y, and makes integration with libraries like Marshmallow-SQLAlchemy (which automatically generates Marshmallow schemas) harder. Maybe for strings that's acceptable, but since the empty string isn't a valid date, I think dates should certainly turn ""
into None
.
Yes, a custom field could work for me but I do agree that it's rather inelegant. I ended up going with something like @anowlcalledjosh above to solve my issue. Since an empty string will never be a valid date, I'll second that (at least for dates) ""
should end up as None
.
A general solution could be to add a parameter to Field
(available on all field types) to customize what input values are to be considered "null" or "missing".
Before finding this issue...
I just created a fields.Integer
subclass that has an attribute blank_as_none
.
It simple simple to implement but annoying as it needs to be done for all Numeric and Date types.
I can work on patch if there is interest on that..
Please see my proposed solution in the PR #1165. The implementation was pretty trivial and I guess it solves a pain point for many of us...
A demo on the proposed custom field solution.
from marshmallow import Schema, fields
# ---------------------Baseline---------------------
class SimpleSchema(Schema):
s = fields.String()
simple_schema = SimpleSchema()
print(simple_schema.load({'s': None}))
# > UnmarshalResult(data={}, errors={'s': ['Field may not be null.']})
print(simple_schema.load({'s': ''}))
# > UnmarshalResult(data={'s': ''}, errors={})
# ---------------------empty as None, reject None---------------------
# Now trying to deserialize empty string as None, and don't accept None
class NotEmptyString(fields.String):
"""A String field type where empty string is deserialized to None"""
def _deserialize(self, value, attr, data):
if value == '':
return None
return super()._deserialize(value, attr, data)
class NotEmptySchema(Schema):
s = NotEmptyString()
not_empty_schema = NotEmptySchema()
print(not_empty_schema.load({'s': None}))
# UnmarshalResult(data={}, errors={'s': ['Field may not be null.']})
print(not_empty_schema.load({'s': ''}))
# > UnmarshalResult(data={'s': None}, errors={})
# Value is translated to None, but it somehow breaks the process and result in a valid None
# ---------------------empty as None, accept None---------------------
# Now trying to deserialize empty string as None, but accept None
class NotEmptyButNoneOkSchema(Schema):
s = NotEmptyString(allow_null=True)
not_empty_but_none_ok_schema = NotEmptyButNoneOkSchema()
print(not_empty_but_none_ok_schema.load({'s': None}))
# UnmarshalResult(data={}, errors={'s': ['Field may not be null.']})
# Here we even have regression. The custom field even overrides allow_null
# option. Not what I want.
print(not_empty_but_none_ok_schema.load({'s': ''}))
# > UnmarshalResult(data={'s': None}, errors={})
# Here is what I wanted
It seems that the allow_null
option is applied before the deserialization.
Making it impossible to work after a custom field that deserializes as None in some
cases.
Basically, it's seems impossible to obtain the following behaviors:
Interpret empty strings as None
, and returns error because null
/None
is not allowed, with the same error message as if the input was directly None
.
Interpret empty strings as None
, and accepts them because of the allow_null
option in the field arguments.
This has been a huge source of frustration for me.
For instance, I'm trying to have an email field that allows None
and ""
(empty string). I believe an API should be liberal in accepting its input. If a field is optional, then a missing field, a null
value or an empty string value should be considered legit.
So I'm trying to have this behavior throughout my codebase. In particular with the email fields, which by default does not allow it. This is what I had to do to allow empty emails:
class Email(fields.Email):
def __init__(self, *args, **kwargs):
kwargs.setdefault("allow_none", True)
super().__init__(self, *args, **kwargs)
def _validate(self, value):
if not value:
return
super()._validate(value)
def _validated(self, value):
# This actually does nothing
if not value:
return value
return super()._validated(value.strip().lower())
def _deserialize(self, value, attr, obj):
if not value: # nocov
return value
# Strip and lowercase the email
# NOTE: theoretically, the username part of email is not case
# sensitive, but in practice, nobody cares about it (Gmail for
# instance)
return value.strip().lower()
(not mentioning the fact that overriding private methods feels a hack, but that's the pattern chosen by the library - IMO serialize
and _serialize
should have been inverted)
It would be great to have a better story around empty (missing, None
or ""
) values for fields. This is very difficult when validators are involved.
def deserialize(self, value, attr=None, data=None, **kwargs):
# Validate required fields, deserialize, then validate
# deserialized value
self._validate_missing(value)
if value is missing_:
_miss = self.missing
return _miss() if callable(_miss) else _miss
if getattr(self, "allow_none", False) is True and value is None:
return None
output = self._deserialize(value, attr, data, **kwargs)
self._validate(output)
return output
The problem in deserialize
is that we don't have a way to bail out of validation and redefine value
(like serialize
does with its get_value
).
(not mentioning the fact that overriding private methods feels a hack, but that's the pattern chosen by the library - IMO serialize and _serialize should have been inverted)
serialize
is meant to be called by another class, so it is "public".
_serialize
is underscored because it is only meant to be called by its own class. It can be called and even overridden by subclasses. Hence the underscore meaning "protected".
"private" variables are double-underscored.
https://radek.io/2011/07/21/private-protected-and-public-in-python/ https://stackoverflow.com/questions/1641219/does-python-have-private-variables-in-classes
Let's revisit this issue post-3.0 release. I do think we want to have a story for validating empty strings.
See also discussion in Django docs per https://docs.djangoproject.com/en/2.2/ref/models/fields/#field-options.
It would feel weird to deal with this in e.g. SQLAlchemy, though, so some sort of unification of dealing with empty/null logic in strings seems reasonable in Marshmallow.
I think it would be odd to treat empty strings equivalent to None
; there are use cases where None
means "not provided", as distinct from ""
.
I think we should control None
and empty/blank values separately, as in Django. We could add an allow_empty
parameter to Field
and have empty_values
set on the field class.
class Field(...):
empty_values = {""}
default_error_messages = {
...
"empty": "Field may not be empty.",
"null": "Field may not be null.",
}
class List(...):
empty_values = ([], ())
class Dict(...):
empty_values = ({}, )
Thoughts?
You can achieve the above today with
from marshmallow import Schema, fields, validate
not_empty = validate.Length(min=1, error="Field may not be empty.")
class ArtistSchema(Schema):
name = fields.Str(validate=not_empty)
album_names = fields.List(fields.Str(), validate=not_empty)
ArtistSchema().load({"name": "", "album_names": []})
# marshmallow.exceptions.ValidationError: {'album_names': ['Field may not be empty.'], 'name': ['Field may not be empty.']}
So on one hand, the proposed API is redundant. On the other hand, it's a common enough use case that it may warrant first-class support.
Actually, I think I was misunderstanding the OP. IIUC, the goal is to interpret ""
as missing and deserialize to None
.
I think a more direct and flexible way to achieve this would be to allow specifying the values that should be interpreted as missing (as suggested by @lafrech in a previous comment).
# proposed API
class ArtistSchema(Schema):
# if input is None, "", or is missing from the input, deserialize to None
name = fields.Str(missing=None, missing_values={None, ""})
print(ArtistSchema().load({"name": ""})) # {'name': None}
If instead your backend models ""
as missing, you just change the value of missing
.
class ArtistSchema(Schema):
name = fields.Str(missing="", missing_values={None, ""})
This would allow any field to define what it means to be required. For example:
class ArtistSchema(Schema):
album_names = fields.List(fields.Str(), required=True, missing_values=([], ()))
ArtistSchema().load({"album_names": []})
# ValidationError: {'album_names': ['Missing data for required field.']}
What do you think?
I think a more direct and flexible way to achieve this would be to allow specifying the values that should be interpreted as missing
Indeed this approach would be more flexible. The problem is that it is very verbose. As OP work with XML or working with HTML forms, it is very common for blank string to used as missing value...
In this solution you would need to add missing="", missing_values={None, ""}
to all numeric and date fields, thats a lot of noise.
I have not even seen another use-case where people want other custom missing values, so maybe just dealing with the most common case would be better.
Another approach I would consider would be to specify missing_values
at a global or schema level. This way it would be flexible as you wish and also reduce verbosity.
As OP work with XML or working with HTML forms, it is very common for blank string to used as missing value
Yes, it is common to model ""
as missing, but there are also cases where None
is used for missing (IIUC, this is what the OP wants).
See the relevant Django docs:
... it’s redundant to have two possible values for “no data;” the Django convention is to use the empty string, not NULL. One exception is when a CharField has both unique=True and blank=True set. In this situation, null=True is required to avoid unique constraint violations when saving multiple objects with blank values.
The proposed API cleanly supports both cases.
Another approach I would consider would be to specify missing_values at a global or schema level. This way it would be flexible as you wish and also reduce verbosity.
We could set defaults at the class level, so you could do
# proposed API
fields.Field.default_missing_values = ("", )
class ArtistSchema(Schema):
name = fields.String(missing=None)
dob = fields.DateTime(missing=None)
print(ArtistSchema().load({"name": "", "dob": ""}))
# => {'name': None, 'dob': None}
Edit: Use tuple instead of set for default_missing_values to avoid TypeError
s when checking against unhashable types.
Thanks for picking this up
Actually, I think I was misunderstanding the OP. IIUC, the goal is to interpret "" as missing and deserialize to None.
Yes, this is just it. There are plenty of data formats where the keys are fixed, and the values are all strings; in such a scheme, the only way to represent "no data" is an empty string.
Having the "no data" be configurable on the class is a win. Cheers.
@schettino72 What do you think about the default_missing_values
solution above?
@sloria The problem I see with that solution (if I'm understanding it correctly) is that it's incompatible with required=True
. What if I want the following behavior?
schema.load({'some_field': None}) => {'some_field': None}
schema.load({'some_field': ''}) => {'some_field': None}
schema.load({}) => ValidationError: {'some_field': ['Missing data for required field.']}
The use case is differentiating between intentionally setting a field's value to null versus say accidentally botching a PUT request by leaving out a required field.
It's seems reasonable and common to represent "no data" in different ways (""
, None
, []
, absence), but it seems less common (and perhaps unidiomatic) to have multiple ways to represent None
. None
and ''
are not equivalent. If you really need ''
to deserialize to None
, perhaps you should use a custom field.
@sloria At the database level, it is common to have string fields where you allow either NULLs or zero-length strings, but not both (to have a single, consistent way of representing "no data"). With that said, I would ask, if someone is going to bother using Marshmallow to convert some kind of input to a database-ready format, why wouldn't they consider having it automatically convert ZLS to NULL or vice versa? Why is that unidiomatic?
You correctly point out that None and ''
are not always equivalent, but absence and None are also not always equivalent. Absence could mean "don't change the value," while None could mean "set the value to NULL" (e.g. a merge-style PATCH request). And as I stated above, absence could also be invalid and None could be intentional. Plus, Marshmallow's existing API already implies that absent and None are not always the same by allowing the argument combination required=True, allow_none=True
.
I guess what I'm getting at is that I don't see the harm in accepting this PR (assuming it works) for times when Marshmallow users don't want to conflate absent and None.
In the meantime, if anyone reading this cares to see another idea, I tweaked this code above and came up with this:
@pre_load
def replace_empty_strings_with_nones(self, data, **kwargs):
keys = ['project', 'model', 'inServiceDate']
for key in keys:
if key in data.keys():
data[key] = data[key] or None
return data
I've been reading through this, as I've been having this issue myself, so there's some vested interest.
The argument was brought up that None and "" need to mean two different things. For something like String, or a similar type where you would expect "", this is understandable. However, we do need to also consider things like Integer, and other similar types. Even using the Number input type with standard HTML and AJAX in accordance with https://www.w3schools.com/html/tryit.asp?filename=tryhtml_input_number , if you submit nothing in the input box, the parameter will have a value of "". This creates a "Not a valid integer." validation error message in Marshmallow, when one would expect this to be interpreted as if it were None.
For the time being, I have a workaround on String of using a custom type, but it's something that would be nice to include as an option.
BTW, as a note to the last comment: You really want to be careful about doing things that way, especially with integers, since 0 and None should ALWAYS be two different things.
I've been reading through this, as I've been having this issue myself, so there's some vested interest.
The argument was brought up that None and "" need to mean two different things. For something like String, or a similar type where you would expect "", this is understandable. However, we do need to also consider things like Integer, and other similar types. Even using the Number input type with standard HTML and AJAX in accordance with https://www.w3schools.com/html/tryit.asp?filename=tryhtml_input_number , if you submit nothing in the input box, the parameter will have a value of "". This creates a "Not a valid integer." validation error message in Marshmallow, when one would expect this to be interpreted as if it were None.
For the time being, I have a workaround on String of using a custom type, but it's something that would be nice to include as an option.
BTW, as a note to the last comment: You really want to be careful about doing things that way, especially with integers, since 0 and None should ALWAYS be two different things.
Did you ever find a good approach to handling empty strings as ints? This is exactly my current issue and a bit surprised to find there isn't a straightforward way to interpret "" as 0. Perhaps I'll look at subclassing fields.Integer but then since I'm trying to use it with SQLAlchemySchema I imagine I would have to overwrite the Integer Class.
You don't necessarily want to interpret "" as 0, because 0 is a legitimate value that one could enter, whereas None, which is interpreted as "" in this case, means there is no value, and you definitely want to distinguish between the two. If you want 0 as a default value, that's a different story.
To answer your question of what I did, I made a MyInteger schema type that handles this accordingly, and then attach it to the attribute in my ModelSchema class (I'm using Flask and SQLAlchemy in addition to marshmallow). I usually only use it if the integer is nullable, as the error occurs at the DB upsert (update or insert), and if it's empty the required validator would usually catch it. In the _serialize and _deserialize methods, just check for "" or None, and make it a None; otherwise return the super, which should be able to handle "not valid" values, like "asdf". You don't need to overwrite the entire Integer class for your model, as this is happening when you load or dump the object, and SQLAlchemy knows how to deal with None just fine.
I'd like to revisit this suggestion - allowing empty strings to be interpreted as None:
https://github.com/marshmallow-code/marshmallow/issues/543
I understand the sentiment of the response: "it's a bad idea to support that". However, that's not sufficient for some use cases.
For example, right now I'm trying to use marshmallow to convert an XML format where null values are sent as empty strings. I don't get to control the XML format.
I think it's entirely reasonable for a Field to interpret a null datestring (or a null decimal string) as None, and control via
allow_none
whether to pass that. I don't think it's reasonable to fall down and say "the NULL value defined by your format is invalid, and we're not going to allow you to express that". This just forces a lot of ugly preprocessing workarounds onto library users, who need to clutter their code with superfluous, bug-pronevalue = value or None
type declarations... thereby mixing format definition code in the middle of application code, which makes it hard to use marshmallow to write good code for these use cases.I feel that the right place to define a format is, well, in the Schema class definition. A patch to accomplish this is simple:
null_values = {None}
if getattr(self, 'allow_none', False) is True and value in self.null_values: return None
We can stop there, and allow users to support their nasty unhygienic formats by overriding null_values in a custom subclass... or even take it a step further and allow first-class support for text-only formats by directly setting
null_values = {None, ''}
for fields.Decimal, fields.DateTime, fields.Date, fields.Time, etc. By Zeus, maybe even fields.Integer could treat empty strings as None!!I'm happy to submit a PR to that effect, but I'd like to see where the developers' heads are at vis-a-vis using marshmallow for non-JSON formats before wasting anybody's time.
Thanks, and happy new year!