jcrist / msgspec

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML
https://jcristharif.com/msgspec/
BSD 3-Clause "New" or "Revised" License
2.3k stars 67 forks

Question: computed fields #573

Open sgstq opened 11 months ago

sgstq commented 11 months ago

Description

Sorry if I missed this in the docs or other issues, but is there a cleaner way to create a computed field based on the value of one or more other fields? Currently my implementation looks a bit clumsy:

class MyModel(msgspec.Struct):
    field1: int
    field2: int
    fieldSum: int = 0

    def __post_init__(self):
        self.fieldSum = self.field1 + self.field2

Dataclass fields have an init=False option, but msgspec.field does not.
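For comparison, the dataclass behaviour referred to here can be sketched with the standard library alone. This is only an illustration and not msgspec code: the class mirrors MyModel from the question, and dataclasses.asdict stands in for an encoder.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class MyModel:
    field1: int
    field2: int
    # init=False keeps fieldSum out of the generated __init__ signature,
    # so callers cannot pass it and no dummy default is needed.
    fieldSum: int = field(init=False)

    def __post_init__(self) -> None:
        # Runs after the generated __init__ has set field1 and field2.
        self.fieldSum = self.field1 + self.field2


model = MyModel(1, 2)
print(asdict(model))  # {'field1': 1, 'field2': 2, 'fieldSum': 3}
```

Because fieldSum is a real field, it appears in the exported dict, while passing a third positional argument to MyModel raises a TypeError.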

FHU-yezi commented 11 months ago

Maybe we can just use @property decorator? Like this:

class MyModel(msgspec.Struct):
    field1: int
    field2: int

    @property
    def fieldSum(self) -> int:
        return self.field1 + self.field2

But it is calculated when we access it.

sgstq commented 11 months ago

The goal is to have this field in the encoded JSON and in the result of to_builtins. I believe this will not work with a decorated property.

ml31415 commented 11 months ago

I guess the cleanest way to do that would be to pass the computed field in when instantiating the class. Seems a bit out of scope.

illeatmyhat commented 10 months ago

class MyModel(msgspec.Struct):
    field1: int
    field2: int
    fieldSum: int = 0

For now, try:

 fieldSum: int | UnsetType = UNSET

UNSET is a special case that specifically means the value was not provided in the source data. It's not exactly what you're looking for, but it's the closest approximation.

Maybe we can just use @property decorator? Like this:

class MyModel(msgspec.Struct):
    field1: int
    field2: int

    @property
    def fieldSum(self) -> int:
        return self.field1 + self.field2

But it is calculated when we access it.

Until https://github.com/jcrist/msgspec/issues/199 is done, you could use functools.cache for really complex properties that aren't emitted from to_builtins(). Otherwise performance should be measured before making a real decision.

jcrist commented 9 months ago

Apologies for the delay here

Sorry if I missed this in the docs or other issues, but is there a cleaner way to create a computed field based on the value of one or more other fields?

By this do you mean "computed fields" that will be part of the encoded representation, but aren't/can't-be used for decoding? Something like:

class Ex(Struct):
    a: int

    @msgspec.computed_field
    def b(self):
        return self.a + 1

obj = Ex(1)

print(f"b = {obj.b}")
#> b = 2

msg = msgspec.json.encode(obj)
print(msg)
#> b'{"a":1,"b":2}'

obj2 = msgspec.json.decode(b'{"a":1}', type=Ex)  # b is not needed (or used) for decoding
assert obj == obj2

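This functionality doesn't currently exist in msgspec, but is something we could support. Open questions:

  • What should this feature be called? "Computed fields"? "Encoded properties"? What should the decorator name be?
  • If extra fields are forbidden when decoding (forbid_unknown_fields=True), what happens when decoding a message containing a computed field?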

class Test(msgspec.Struct, forbid_unknown_fields=True):
    a: int
    @msgspec.computed_field
    def b(self):
        return self.a + 1

msg = b'{"a": 1, "b": 2}'

msgspec.json.decode(msg, type=Test)  # does this error since `b` isn't a true field?

sgstq commented 9 months ago

@jcrist, thank you for your reply. Yes, the snippet you provided is exactly what I'd like to have.

  • What should this feature be called? "Computed fields"? "Encoded properties"? What should the decorator name be?

Naming isn't my strongest suit :), but I think the computed_ prefix better explains the nature of the property. So, computed_field (to be recognizable for those who came from other frameworks), or computed_property / derived_property seem clear.

  • If extra fields are forbidden when decoding (forbid_unknown_fields=True), what happens when decoding a message containing a computed field?

It definitely should raise an error for the sake of specification consistency.

Another question is how to behave when forbid_unknown_fields=False and the property is provided:

illeatmyhat commented 9 months ago

It should be noted that there are use cases both for computed fields that are encoded (as above) and for computed fields that aren't (e.g. recursive or cyclic data structures like OpenAPI).