madman-bob / python-dataclasses-serialization

Serialize/deserialize Python dataclasses to various other data formats
MIT License
25 stars, 11 forks

Deserializing exact type #8

Closed: kitschen closed this issue 4 years ago

kitschen commented 4 years ago

When writing a deserializer, you might get bitten by unwanted or unexpected type hierarchies. For example, suppose you want to write one deserializer for datetime.date (which arrives as a string) and another for datetime.datetime (which arrives as an int representing milliseconds since the epoch):

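```python
import datetime

import pytz
from dataclasses_serialization.json import JSONSerializer
from dataclasses_serialization.serializer_base import DeserializationError

@JSONSerializer.register_deserializer(datetime.date)
def date_deserializer(cls, d: str) -> datetime.date:
    # Expects a "YYYYMMDD" string
    try:
        return datetime.datetime.strptime(d, "%Y%m%d").date()
    except Exception as e:
        raise DeserializationError(e)

@JSONSerializer.register_deserializer(datetime.datetime)
def datetime_deserializer(cls, d: int) -> datetime.datetime:
    # Expects milliseconds since the epoch
    try:
        return datetime.datetime.fromtimestamp(d / 1000.0, tz=pytz.utc)
    except Exception as e:
        raise DeserializationError(e)
```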

Unfortunately, this fails when a millisecond timestamp is received. The reason is that datetime.datetime is a subclass of datetime.date, so the first deserializer, which expects a string, is chosen.

One workaround is to reverse the order of the two deserializers, so that the narrower one (for datetime) is registered first and therefore found first. I suppose this is ugly (also, is it always the case? Isn't a dict used internally, with unpredictable iteration order?)
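Both the failure and the reordering workaround can be reproduced with a toy registry. This is a minimal sketch assuming a plain dict that maps types to functions and a hypothetical first-match lookup (`first_match`, `parse_date` and `parse_millis` are illustrative names); it shows the mechanism only and is not the library's internal code. It uses `datetime.timezone.utc` instead of `pytz.utc` to avoid a third-party dependency.

```python
import datetime

def first_match(registry, cls):
    """Hypothetical first-match lookup: return the first registered
    function whose key is a superclass of cls, in dict iteration order."""
    for registered_type, func in registry.items():
        if issubclass(cls, registered_type):
            return func
    raise KeyError(cls)

def parse_date(d):
    # "YYYYMMDD" string -> date
    return datetime.datetime.strptime(d, "%Y%m%d").date()

def parse_millis(d):
    # milliseconds since the epoch -> timezone-aware UTC datetime
    return datetime.datetime.fromtimestamp(d / 1000.0, tz=datetime.timezone.utc)

# The root cause: datetime.datetime is a subclass of datetime.date
assert issubclass(datetime.datetime, datetime.date)

# Order from the issue: date registered first
date_first = {datetime.date: parse_date, datetime.datetime: parse_millis}
print(first_match(date_first, datetime.datetime).__name__)  # parse_date (wrong)

# Workaround: register the narrower type first
datetime_first = {datetime.datetime: parse_millis, datetime.date: parse_date}
print(first_match(datetime_first, datetime.datetime).__name__)  # parse_millis
print(first_match(datetime_first, datetime.date).__name__)  # parse_date
```

The workaround only holds while the lookup really is a first-match scan over an insertion-ordered dict.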

Another workaround is to specify the type as Final:

```python
import datetime
from typing import Final

@JSONSerializer.register_deserializer(Final[datetime.date])
def date_deserializer(cls, d: str) -> datetime.date:
    try:
        return datetime.datetime.strptime(d, "%Y%m%d").date()
    except Exception as e:
        raise DeserializationError(e)
```

This might be worth adding to the documentation.

madman-bob commented 4 years ago

You are correct that you shouldn't rely on registration order, although in practice it works for most versions of Python 3.6+.

Ambiguous deserialization was intentionally left undocumented, as I have not yet decided how best to go about it. I was considering something involving __mro__, but I'm not sure how that interacts with virtual subclasses.
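A minimal sketch of the `__mro__` idea, and of the virtual-subclass catch, assuming a plain dict registry. `mro_lookup`, `Temporal` and `Tick` are hypothetical names used only for illustration; the strings stand in for deserializer functions.

```python
import datetime
from abc import ABC

def mro_lookup(registry, cls):
    """Hypothetical lookup: walk cls.__mro__ from most to least specific
    and return the entry for the first registered class found."""
    for klass in cls.__mro__:
        if klass in registry:
            return registry[klass]
    raise KeyError(cls)

registry = {datetime.date: "date deserializer", datetime.datetime: "datetime deserializer"}

# Registration order no longer matters: datetime precedes date in the MRO
print(datetime.datetime.__mro__)
print(mro_lookup(registry, datetime.datetime))  # datetime deserializer

# The catch: ABC.register() makes issubclass() true without
# adding the ABC to the registered class's __mro__
class Temporal(ABC):
    pass

class Tick:
    pass

Temporal.register(Tick)
assert issubclass(Tick, Temporal)
assert Temporal not in Tick.__mro__

# So an MRO walk never finds an entry registered for the virtual base
try:
    mro_lookup({Temporal: "temporal deserializer"}, Tick)
except KeyError:
    print("no match via __mro__")
```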

I was unaware of the Final type - that's definitely worth mentioning.

kitschen commented 4 years ago

I was mistaken: using Final here does not seem to work. So I'm still looking for a way to select the right deserializer.

kitschen commented 4 years ago

OK, insertion order can be relied on to tune the selection (without explicitly using OrderedDict), as the docs say: "Changed in version 3.7: Dictionary order is guaranteed to be insertion order. This behavior was an implementation detail of CPython from 3.6."

madman-bob commented 4 years ago

Upon further research, I found that this is not the purpose of Final: used as a type annotation, it is intended as a sort of immutability (the name must not be reassigned). So a datetime still counts as a Final[date].
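To illustrate, here is a minimal sketch of what `typing.Final` does (Python 3.8+); `EPOCH` and `alias` are illustrative names. At runtime `Final[date]` merely wraps `date`, so it offers a lookup nothing that would exclude subclasses, and it cannot be used with `isinstance` at all.

```python
import datetime
from typing import Final, get_args, get_origin

# Intended use: a name that must not be reassigned. Type checkers such
# as mypy enforce this; the runtime does not. A datetime is a valid date.
EPOCH: Final[datetime.date] = datetime.datetime(1970, 1, 1)

# At runtime, Final[date] is just a wrapper around date
alias = Final[datetime.date]
assert get_origin(alias) is Final
assert get_args(alias) == (datetime.date,)

# Subscripted special forms cannot be used in isinstance checks
try:
    isinstance(EPOCH, alias)
except TypeError as e:
    print("TypeError:", e)
```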

For your other point: While, in certain circumstances, you can rely on dict insertion order, you cannot rely on this library using dicts internally. Indeed, in 1.3.0 that is no longer the case.

As of 1.3.0, ambiguous serializations/deserializations are resolved by taking the most precise type, if possible, and raising an exception if not. Updating to this version should resolve this issue.
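The resolution rule described above can be sketched as follows. This is a hypothetical reimplementation of the idea (pick the most precise registered type, raise if no single one exists), not the library's actual code; `most_precise` and `AmbiguousDeserializerError` are illustrative names. Because it relies on `issubclass` rather than `__mro__`, it also sees virtual subclasses registered via `ABC.register()`.

```python
import datetime

class AmbiguousDeserializerError(Exception):
    """Hypothetical error raised when no single most precise type exists."""

def most_precise(registry, cls):
    """Hypothetical resolver: among registered types that cls is a subclass
    of, keep those with no more specific candidate; exactly one must remain."""
    candidates = [t for t in registry if issubclass(cls, t)]
    if not candidates:
        raise KeyError(cls)
    # Drop any candidate that another candidate is a subclass of
    best = [
        t for t in candidates
        if not any(other is not t and issubclass(other, t) for other in candidates)
    ]
    if len(best) != 1:
        raise AmbiguousDeserializerError(f"{cls!r} matches {best!r}")
    return registry[best[0]]

# Registration order no longer matters
registry = {datetime.date: "date", datetime.datetime: "datetime"}
print(most_precise(registry, datetime.datetime))  # datetime
print(most_precise(registry, datetime.date))  # date

# Two unrelated bases with no common winner are genuinely ambiguous
class A: pass
class B: pass
class C(A, B): pass

try:
    most_precise({A: "a", B: "b"}, C)
except AmbiguousDeserializerError as e:
    print("ambiguous:", e)
```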