jcrist / msgspec

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML
https://jcristharif.com/msgspec/
BSD 3-Clause "New" or "Revised" License

Dataclass inheritance doesn't work when base class is decoded first #598

Closed clint-enki closed 10 months ago

clint-enki commented 10 months ago

Description

When using inheritance in dataclasses, msgspec decodes a derived class as its base class if the base class has been decoded once beforehand. I've made a minimal example to replicate the issue.

a.py:

from dataclasses import dataclass

@dataclass
class Base:
    a: int

@dataclass
class Foo(Base):
    b: int

b.py:

import msgspec

def test_decode():
    # Base class decoded first
    from a import Base

    x = Base(a=1)
    ser = msgspec.yaml.encode(x)
    dser = msgspec.yaml.decode(ser, type=Base)
    assert x == dser

    from a import Foo

    x = Foo(a=1, b=2)
    ser = msgspec.yaml.encode(x)
    dser = msgspec.yaml.decode(ser, type=Foo)
    assert x == dser

if __name__ == "__main__":
    test_decode()

Running pytest:

pytest b.py
===================================================== test session starts =====================================================
platform linux -- Python 3.11.6, pytest-7.4.3, pluggy-1.3.0
rootdir: /tmp/
plugins: anyio-4.0.0
collected 1 item

b.py F                                                                                                                  [100%]
========================================================== FAILURES ===========================================================
_________________________________________________________ test_decode _________________________________________________________

    def test_decode():
        from a import Base

        x = Base(a=1)
        ser = msgspec.yaml.encode(x)
        dser = msgspec.yaml.decode(ser, type=Base)
        assert x == dser

        from a import Foo

        x = Foo(a=1, b=2)
        ser = msgspec.yaml.encode(x)
        dser = msgspec.yaml.decode(ser, type=Foo)
>       assert x == dser
E       assert Foo(a=1, b=2) == Base(a=1)

b.py:20: AssertionError
=================================================== short test summary info ===================================================
FAILED b.py::test_decode - assert Foo(a=1, b=2) == Base(a=1)
====================================================== 1 failed in 0.04s ======================================================

However, if the derived class is decoded first, then both the derived class and the base class are decoded correctly. This test passes:

def test_decode():
    # derived class decoded first
    from a import Foo

    x = Foo(a=1, b=2)
    ser = msgspec.yaml.encode(x)
    dser = msgspec.yaml.decode(ser, type=Foo)
    assert x == dser

    from a import Base

    x = Base(a=1)
    ser = msgspec.yaml.encode(x)
    dser = msgspec.yaml.decode(ser, type=Base)
    assert x == dser

Tested with:

jcrist commented 10 months ago

Thanks for the excellent issue report - this was a subtle issue in our caching implementation. Fixed in #599.
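
For anyone wondering how a caching bug can depend on decode order, here is a minimal sketch of one way it can happen. It is not msgspec's actual code: the attribute names and both helper functions are hypothetical, and it assumes per-class type info is cached as an attribute on the class itself. A lookup with `getattr` follows inheritance, so once `Base` has a cache entry, `Foo` finds that entry instead of building its own.

```python
from dataclasses import dataclass, fields


@dataclass
class Base:
    a: int


@dataclass
class Foo(Base):
    b: int


def build_info(cls):
    # Stand-in for whatever a decoder would precompute for a type.
    return (cls, tuple(f.name for f in fields(cls)))


def cached_info_inherited(cls, attr="_hypothetical_info_buggy"):
    # Hypothetical buggy lookup: getattr walks the MRO, so a subclass
    # sees an entry that was cached on its parent.
    info = getattr(cls, attr, None)
    if info is None:
        info = build_info(cls)
        setattr(cls, attr, info)
    return info


def cached_info_own(cls, attr="_hypothetical_info_fixed"):
    # Hypothetical fixed lookup: only consult the class's own namespace,
    # so an inherited entry is never mistaken for the class's own.
    info = cls.__dict__.get(attr)
    if info is None:
        info = build_info(cls)
        setattr(cls, attr, info)
    return info


# Base first, then Foo: the same order as the failing test.
cached_info_inherited(Base)
print(cached_info_inherited(Foo)[0].__name__)  # Base's entry leaks into Foo

cached_info_own(Base)
print(cached_info_own(Foo)[0].__name__)  # Foo gets its own entry
```

If `Foo` is looked up first, it gets its own entry and `Base` later builds a separate one, which matches the order dependence reported above.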