jcrist / msgspec

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML
https://jcristharif.com/msgspec/
BSD 3-Clause "New" or "Revised" License

Callbacks to `Encoder`/`Decoder` are not respected in `datetime` objects #669

Open TheMythologist opened 2 months ago

TheMythologist commented 2 months ago

Description

Neither the dec_hook nor the enc_hook argument is respected by any of the encoders and decoders (tested with JSON and YAML) when datetime objects are used. The print calls in both hooks never run, and the variable buf contains an ISO 8601 duration string instead of the number that enc_hook should produce.

Attached is a sample script showing that custom encoding and decoding of datetime.timedelta objects is not supported. It doesn't work for datetime.datetime objects either.

```python
import msgspec
from typing import Any, Type
from datetime import timedelta

def enc_hook(obj: Any) -> Any:
    print("Encoding")
    if isinstance(obj, timedelta):
        # convert the timedelta to a number
        return obj.total_seconds()
    else:
        # Raise a NotImplementedError for other types
        raise NotImplementedError(f"Objects of type {type(obj)} are not supported")

def dec_hook(type: Type, obj: Any) -> Any:
    print("Decoding", type)
    # `type` here is the value of the custom type annotation being decoded.
    if type is timedelta:
        # Convert ``obj`` (which should be a ``number``) to a timedelta
        return timedelta(seconds=obj)
    else:
        # Raise a NotImplementedError for other types
        raise NotImplementedError(f"Objects of type {type} are not supported")

class MyMessage(msgspec.Struct):
    field_1: str
    field_2: timedelta

enc = msgspec.json.Encoder(enc_hook=enc_hook)
dec = msgspec.json.Decoder(MyMessage, dec_hook=dec_hook)

msg = MyMessage("some string", timedelta(seconds=5))

# Doesn't work for JSON decoder
buf = enc.encode(msg)
print(buf)
a = dec.decode(buf)
print(a)

# Doesn't work for YAML decoders either
buf = msgspec.yaml.encode(msg, enc_hook=enc_hook)
print(buf)
a = msgspec.yaml.decode(buf, type=MyMessage, dec_hook=dec_hook)
print(a)
```
TheMythologist commented 2 months ago

Update: This was broken sometime between version 0.16.0 and version 0.17.0.

Update: It was this specific commit that broke the hook for datetime.timedelta objects: 2b72ebbf91ec0e294e049ba584e81400a71ef37a

Update: It seems that hooks for datetime.datetime objects have been broken from the start.

wikiped commented 2 months ago

Under the hood, the YAML and TOML .encode and .decode functions call msgspec.to_builtins and msgspec.convert, respectively (the JSON and MessagePack codecs are implemented natively instead).

Both functions accept a builtin_types parameter, which tells msgspec to pass values of the listed builtin types through unprocessed. Those values are not passed to the *_hook functions either; only non-builtin types are passed to the hooks.

Whether this is a bug or by design, only @jcrist can tell (no pun intended :-), but it definitely feels like a bug.

The above can be illustrated with:

```python
import datetime as dt
from typing import Any

import msgspec as ms


class T:
    def __init__(self, name='some name'):
        self.name = name


def enc_hook(obj: Any) -> Any:
    print("Encoding")
    if isinstance(obj, T):
        return obj.name
    elif isinstance(obj, dt.timedelta):
        # convert the timedelta to a number
        return obj.total_seconds()
    else:
        # Raise a NotImplementedError for other types
        raise NotImplementedError(f"Objects of type {type(obj)} are not supported")


class MyMessage(ms.Struct):
    field_1: T
    field_2: dt.timedelta


msg = MyMessage(T(), dt.timedelta(seconds=5))

# timedelta is listed in builtin_types, so it is passed through unchanged
# and never reaches enc_hook; only T (unknown to msgspec) does.
msg_encoded = ms.to_builtins(
    msg,
    builtin_types=(dt.timedelta,),
    enc_hook=enc_hook,
)

print(msg_encoded)
```

The above outputs:

```
Encoding
{'field_1': 'some name', 'field_2': datetime.timedelta(seconds=5)}
```

I can see two ways to work around this behaviour until (if ever) it changes:

  1. Implement your own encode/decode functions, so you control what happens to the dict produced by msgspec before it is passed to the encoder (or after it comes back from the decoder).
  2. Wrap the builtin type in a custom type, so that it is handled by the *_hook functions.