lidatong / dataclasses-json

Easily serialize Data Classes to and from JSON
MIT License

[BUG] Nullable fields are not showing up when deserializing with field(default=None, metadata=config(exclude=lambda x: x is None)) #504

Closed: yakovsushenok closed this issue 6 months ago

yakovsushenok commented 6 months ago

Description

I'm not sure if I'm doing this correctly, but my goal is to deserialize JSON data that has optional properties and, when an optional property is null (or absent), have it not show up in the deserialized version of the data. Let's say I have this code:

from dataclasses import dataclass, field
from typing import Optional
from dataclasses_json import dataclass_json, LetterCase, config

@dataclass_json(letter_case=LetterCase.CAMEL)
@dataclass
class NewImage:
    pk: str = field(metadata=config(field_name="PK"))
    sk: str = field(metadata=config(field_name="SK"))
    created_by: str
    created_date_time: str
    optional_attribute_1: Optional[str] = field(default=None, metadata=config(exclude=lambda x: x is None))
    optional_attribute_2: Optional[str] = field(default=None, metadata=config(exclude=lambda x: x is None))

So when I receive data that has optional_attribute_1 but not optional_attribute_2, I want it to deserialize with optional_attribute_1 only, leaving optional_attribute_2 out. I've looked at this issue, and that's how they say to ignore null values.

Code snippet that reproduces the issue

from dataclasses import dataclass, field
from typing import Optional
from dataclasses_json import dataclass_json, LetterCase, config

@dataclass_json(letter_case=LetterCase.CAMEL)
@dataclass
class NewImage:
    pk: str = field(metadata=config(field_name="PK"))
    sk: str = field(metadata=config(field_name="SK"))
    created_by: str
    created_date_time: str
    optional_attribute_1: Optional[str] = field(default=None, metadata=config(exclude=lambda x: x is None))
    optional_attribute_2: Optional[str] = field(default=None, metadata=config(exclude=lambda x: x is None))

# I convert my JSON data to a dict beforehand (I have to)
new_image = {"pk": "1", "sk": "1", "created_by": "blah", "created_date_time": "today", "optional_attribute_1": "blah"}

print(NewImage.from_dict(new_image))  # optional_attribute_2=None still shows up in the output

Expected

I expect the deserialized object to include the optional attributes only when they are present in the serialized form.

NewImage(pk='1', sk='1', created_by='blah', created_date_time='today', optional_attribute_1='blah')

Actual

optional_attribute_2=None is present even though it was not in the input.

NewImage(pk='1', sk='1', created_by='blah', created_date_time='today', optional_attribute_1='blah', optional_attribute_2=None)

Environment description

Python version: 3.11

Packages:
```
boto3==1.34.0
botocore==1.34.0
certifi==2023.11.17
charset-normalizer==3.3.2
dataclasses==0.6
dataclasses-json==0.6.1
dotenv==0.0.5
dynamodb-json==1.3
idna==3.6
jmespath==1.0.1
marshmallow==3.20.1
mypy-extensions==1.0.0
numpy==1.26.2
packaging==23.2
pandas==2.1.4
python-dateutil==2.8.2
python-dotenv==1.0.0
pytz==2023.3.post1
requests==2.31.0
s3transfer==0.9.0
simplejson==3.19.2
six==1.16.0
types-requests==2.31.0.10
typing-inspect==0.9.0
typing_extensions==4.8.0
tzdata==2023.3
urllib3==2.0.7
```
USSX-Hares commented 6 months ago

Updated the description: added Expected/Actual sections, code highlighting, and the missing imports, and moved the environment details under a <details> tag.

USSX-Hares commented 6 months ago

TL;DR

You are confusing the dataclasses and dataclasses_json functionality.

Long Read

@yakovsushenok even though I agree with your suggestion (it's a feature I'd also like to have), you are looking in the wrong place. What print shows you is the __repr__ generated by the dataclasses package itself, not anything provided by dataclasses_json. The latter controls only (de)serialization; dataclasses handles the rest. The exclude option only controls whether the field is present in the serialized data, while __repr__ always prints every field with repr=True (the default). If you want, you can override the __repr__ that dataclasses generates in your own class.
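A stdlib-only sketch of the two knobs mentioned above (the class names are hypothetical): `field(repr=False)` hides a field from the generated `__repr__` without removing the attribute, and a `__repr__` written in the class body is kept, because `@dataclass` does not replace a `__repr__` the class already defines.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Plain:
    # repr=False hides the field from the generated __repr__,
    # but the attribute still exists on the instance.
    hidden: Optional[str] = field(default=None, repr=False)
    shown: Optional[str] = None


@dataclass
class CustomRepr:
    a: Optional[str] = None
    b: Optional[str] = None

    # A __repr__ defined in the class body is kept: @dataclass only
    # generates __repr__ when the class does not already define one.
    def __repr__(self) -> str:
        return f"CustomRepr(a={self.a!r})"


p = Plain(hidden="x", shown="y")
print(p)         # the generated repr omits `hidden`
print(p.hidden)  # but the value is still there
print(CustomRepr(a="1", b="2"))
```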

Example

Here, take a look:

from dataclasses import dataclass, field
from typing import Optional
from dataclasses_json import dataclass_json, LetterCase, config

@dataclass_json(letter_case=LetterCase.CAMEL)
@dataclass
class ReprTest:
    optional_exclude: Optional[str] = field(default=None, metadata=config(exclude=lambda x: x is None))
    optional_no_repr: Optional[str] = field(default=None, repr=False)

r1 = ReprTest()
r2 = ReprTest(optional_exclude='one', optional_no_repr='two')

print("FIRST:")
print(r1)
print(r1.to_json())
print()

print("SECOND:")
print(r2)
print(r2.to_json())

Output

FIRST:
ReprTest(optional_exclude=None)
{"optionalNoRepr": null}

SECOND:
ReprTest(optional_exclude='one')
{"optionalExclude": "one", "optionalNoRepr": "two"}

As you can see, __repr__ behaves the same way in both scenarios: it always prints optional_exclude and never prints optional_no_repr. In the .to_json() output, however, optionalExclude is present in one case and absent in the other, while optionalNoRepr is always present.
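For comparison, a stdlib-only sketch of the same separation, with a hypothetical helper `to_json_without_none` standing in for `exclude` (it is not part of dataclasses_json): None values are dropped only while building the JSON, so the dataclass instance and its repr are untouched.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class Record:
    name: str
    note: Optional[str] = None


def to_json_without_none(obj: Any) -> str:
    """Hypothetical helper: serialize a dataclass, dropping top-level None values."""
    data: Dict[str, Any] = {k: v for k, v in asdict(obj).items() if v is not None}
    return json.dumps(data)


r = Record(name="a")
print(r)                        # the repr still shows note=None
print(to_json_without_none(r))  # the JSON omits the None-valued field
```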

USSX-Hares commented 6 months ago

If you still want this behaviour, you can use this as a reference:

from abc import ABC
from dataclasses import dataclass, fields, field
from typing import List, Optional

@dataclass
class DataclassSmartRepr(ABC):
    def __repr__(self):
        tokens: List[str] = list()

        # Include only fields declared with repr=True whose value is not None
        for f in fields(self):
            if f.repr and (v := getattr(self, f.name, None)) is not None:
                tokens.append(f'{f.name}={v!r}')

        return f"{type(self).__name__}({', '.join(tokens)})"

@dataclass
class ReprTest:
    optional_with_repr_one: Optional[str] = field(default=None, repr=True)
    optional_with_repr_two: Optional[str] = field(default=None, repr=True)
    optional_no_repr: Optional[str] = field(default=None, repr=False)

    # Defining __repr__ in the class body stops @dataclass from generating its own
    __repr__ = DataclassSmartRepr.__repr__

# Register ReprTest as a virtual subclass so isinstance checks against the ABC pass
DataclassSmartRepr.register(ReprTest)

r1 = ReprTest()
r2 = ReprTest(optional_with_repr_one='one', optional_with_repr_two='two', optional_no_repr='three')

print("FIRST:")
print(r1)
print()

print("SECOND:")
print(r2)

Output:

FIRST:
ReprTest()

SECOND:
ReprTest(optional_with_repr_one='one', optional_with_repr_two='two')
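An alternative sketch of the same idea, assuming you control the class hierarchy (`SmartReprMixin` is a hypothetical name): inherit the custom `__repr__` from a plain mixin and pass `repr=False` to `@dataclass`, so no generated `__repr__` shadows the inherited one and no ABC registration is needed.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


class SmartReprMixin:
    """Hypothetical mixin: repr shows only repr=True fields whose value is not None."""

    def __repr__(self) -> str:
        tokens: List[str] = []
        for f in fields(self):
            value = getattr(self, f.name, None)
            if f.repr and value is not None:
                tokens.append(f"{f.name}={value!r}")
        return f"{type(self).__name__}({', '.join(tokens)})"


# repr=False stops @dataclass from generating a __repr__ that would
# shadow the one inherited from the mixin.
@dataclass(repr=False)
class ReprTest(SmartReprMixin):
    optional_with_repr_one: Optional[str] = None
    optional_with_repr_two: Optional[str] = None
    optional_no_repr: Optional[str] = field(default=None, repr=False)


print(ReprTest())
print(ReprTest(optional_with_repr_one="one", optional_no_repr="three"))
```

Because `@dataclass(repr=False)` only disables the generated `__repr__`, `__init__`, `__eq__`, and the other generated methods are unaffected.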
yakovsushenok commented 6 months ago

Thanks @USSX-Hares, I understand now.