oxan / djangorestframework-dataclasses

Dataclasses serializer for Django REST framework
BSD 3-Clause "New" or "Revised" License
431 stars 28 forks source link

Implement strict de-serialization option #81

Closed johnthagen closed 1 year ago

johnthagen commented 1 year ago

Currently, if extra data fields are passed into a DataclassSerializer instance that are not modeled by the underlying @dataclass, they are silently ignored. This could hide subtle bugs in clients that think they are sending a particular field, but in fact the field is being ignored and they are not made aware of it.

For example:

from dataclasses import dataclass

from rest_framework_dataclasses.serializers import DataclassSerializer

@dataclass
class Person:
    name: str
    height: float

class PersonSerializer(DataclassSerializer[Person]):
    class Meta:
        dataclass = Person

p = PersonSerializer(
    data={
        "name": "Bruce",
        "height": 6.2,
        "im_batman": True,  # Extra, invalid field
    }
)
p.is_valid(raise_exception=True)  # Be able to configure the DataclassSerializer to fail validation here

A partial implementation (does not work with nested DataclassSerializers):

class StrictDataclassSerializer(DataclassSerializer[T], Generic[T]):
    def validate(self, attrs: dict[str, Any]) -> dict[str, Any]:
        # Need to check for initial_data as nested serializers will not have this field.
        if hasattr(self, "initial_data"):
            unknown_keys = set(self.initial_data.keys()) - set(fields.keys())
            if unknown_keys:
                raise ValidationError(f"Unknown keys found: {unknown_keys}")
        return attrs

A feature like this could be enabled in the Meta:

class PersonSerializer(DataclassSerializer[Person]):
    class Meta:
        dataclass = Person
        strict = True
oxan commented 1 year ago

This behaviour isn't caused by nor specific to the DataclassSerializer, it's consistent with what a stock serializer does:

class TestSerializer(serializers.Serializer):
    test = serializers.CharField()

ser = TestSerializer(data={'test': 'hello', 'additional': 'extra'})
ser.is_valid()  # True
ser.validated_data  # OrderedDict([('test', 'hello')])

I'm hesitant to implement such a strict mode, since it's a layering violation as it changes the behaviour of DRF's default serializers. The goal of DataclassSerializer is to autogenerate serializer fields based on a dataclass definition (and turn the returned value into a dataclass), everything that's not directly related to that should be implemented elsewhere.

If you want this behaviour, it should also be fairly straightforward to implement in your own project: use a subclass of DataclassSerializer with the added validate() method. By setting or defining the serializer_dataclass_field field on that subclass, it'll also work for nested dataclasses (see e.g. HyperlinkedDataclassSerializer at the bottom of serializers.py for an example).

johnthagen commented 1 year ago

@oxan I'm happy to call this out of scope for djangorestframework-dataclasses. As you mention, this is how the default DRF works (I may disagree with DRF's decision, but it is how it works).

We use Pydantic in other parts of our project and they have a setting for this (this is the Pydantic v1 version):

class StrictBaseModel(BaseModel):
    class Config:
        # Don't allow extra parameters
        extra = Extra.forbid

This has caught subtle bugs in the past.

But in the DRF world, this is working as expected. Please feel free to close this issue and thank you for the ideas on how to implement this in user-land!

oxan commented 1 year ago

Yeah, I can see why rejecting extra data might be useful (though I could also make an argument for the opposite), but I indeed lean towards calling it out of scope for this project.

While writing my previous comment I realized that the documentation doesn't have an explicit example on how to subclass DataclassSerializer, which might be useful. I'll add that.