philipn / django-rest-framework-filters

Better filtering for Django REST Framework
Other
848 stars 131 forks source link

Document how to filter prefetch_related w/ related query #286

Open murdav opened 5 years ago

murdav commented 5 years ago

First of all thanks for this package, it's very useful! My models are structured like these:

class Info(TimeStampedModel):
    identifier = models.UUIDField(_('info'), default=uuid.uuid4, primary_key=True, editable=False)
    ***
class Alert(TimeStampedModel):
    identifier = models.UUIDField(_('alert'), default=uuid.uuid4, primary_key=True, editable=False)
    infos = models.ManyToManyField(Info)
    ***
class AlertViewSet(ReadOnlyModelViewSet):
    filter_backends = (ComplexFilterBackend, )
    filter_class = AlertFilter

    def get_queryset(self):
        return Alert.objects.select_related('sender', 'status').prefetch_related('infos').distinct()
class InfoViewSet(ModelViewSet):
    filter_backends = (ComplexFilterBackend, )
    filter_class = InfoFilter

    queryset = Info.objects.prefetch_related('categories'').select_related('urgency', 'owner')

Basically an Alertcan have mutiple Infos.

I have these simple filters:

class AlertFilter(filters.FilterSet):
    infos = filters.RelatedFilter(InfoFilter, field_name='infos', queryset=Info.objects.all())

    class Meta:
        model = Alert
        fields = {
            'restriction': ['icontains'],
            'note': ['icontains'],
        }
class InfoFilter(filters.FilterSet):
    # status = filters.ChoiceFilter(choices=INFO_STATUS_CHOICES)

    class Meta:
        model = Info
        fields = {
            # 'status': ['exact']
        }

I tried to use the definition in the Meta or the field declaration in the class but I get the same result.

I would like to do a query like http://localhost:8001/v1/alerts/?format=json&infos__status=AP. But the result contains anyway all the infos (whatevet it's the status) for the alerts that have at least one Info with the status equals to "AP". As result I would expect only the infos for the alerts that have the status equals to AP. Is it correct?

rpkilby commented 5 years ago

Hi @murdav. You can filter the related results using a Prefetch object. However, handling this is kind of an odd case, because serializing the related Info objects is obviously a serializer-level responsibility, but Alert filtering is a view-level responsibility. So, where to filter the related Info objects is a little unclear, although I'd opt to do it in the filterset, then document the source of the filtered objects in the serializer. e.g., something like this.

from django.db import models
from django.db.models import Prefetch
from django.db.models.fields.related import ForeignObjectRel
from django_filters.utils import get_model_field

class PrefetchRelatedFilterSet(filters.FilterSet):
    """Prefetch to-many relationships with the same query used to filter the main relationship."""

    def check_if_to_many(self, field_name):
        field = get_model_field(self._meta.model, field_name)

        # Check both forwards & reverse to-many relationships
        is_m2m = isinstance(field, models.ManyToManyField)
        is_reverse_many = isinstance(field, ForeignObjectRel) and field.multiple

        return is_m2m or is_reverse_many

    def filter_related_filtersets(self, queryset):
        queryset = super().filter_related_querysets(queryset)

        for related_name, related_filterset in self.related_filtersets.items():
            # Prefetch should only be applied if related filterset has data.
            prefix = '%s%s' % (related(self, related_name), LOOKUP_SEP)
            if not any(value.startswith(prefix) for value in self.data):
                continue

            # Prefetch should only be applied if the relationship is to-many
            field_name = self.filters[related_name].field_name
            if not self.check_if_to_many(field_name):
                continue

            # note: field_name may not be the actual related query name, may need to be fixed
            prefetch = Prefetch(field_name, queryset=related_filterset.qs, to_attr=f'filtered_{field_name}')
            queryset = queryset.prefetch_related(prefetch)

        return queryset

class AlertSerializer(serializers.ModelSerializer):
    # Filtered infos/relationships are provided by AlertFilter
    infos = InfoSerializer(many=True, source='filtered_infos', read_only=True)

class AlertFilter(PrefetchRelatedFilterSet):
    ...

A couple of notes:

rpkilby commented 5 years ago

I'm adding the above as a potential feature request, since it seems useful to automatically prefetch to-many relationships with the same query.

murdav commented 5 years ago

@rpkilby many thanks for replying. I tested the code but I encountered the N+1 queries problem because each info itself has several manytomany and onetomany relations. While calling the http://*/alerts/ URL (without applying filters) the n+1 doesn't happen.

This the AlertViewSet

class AlertViewSet(ReadOnlyModelViewSet):
    # filter_backends = (ComplexFilterBackend, )
    filter_class = AlertFilter
    pagination_class = AlertPagination
    serializer_class = AlertSerializer

    def get_queryset(self):
        return Alert.objects.all().select_related('sender', 'status', 'msg_type', 'source', 'scope', 'region',
                                          'region__country', 'region__country__continent')\
                    .prefetch_related('infos', 'infos__owner', 'infos__severity', 'infos__certainty', 'infos__urgency',
                                      'infos__region__country', 'infos__region__country__continent', 'infos__source',
                                      'infos__areas', 'infos__areas__geocodes',
                                      'infos__categories', 'infos__response_types', 'infos__event_codes',
                                      'infos__resources', 'infos__parameters', 'infos__parameters__system_parameter')
                    #.distinct()

Most probably applying filters like on alertinfosstatus make it to reload the alert__infos instance and there the n+1 issue comes from.

I saw similar cases where if the queryset for the child relationship isn't a .all(), or something else triggers Django to think that the prefetch wouldn't match the new query, it doesn't use the prefetched results, maybe is it the case?

rpkilby commented 5 years ago

The issue is that the filtered prefetch is named filtered_infos, while the other prefetch calls are traversing the the unfiltered infos relationship. The docs show that you can traverse prefetches.

>>> vegetarian_pizzas = Pizza.objects.filter(vegetarian=True)
>>> Restaurant.objects.prefetch_related(
...     Prefetch('pizzas', queryset=vegetarian_pizzas, to_attr='vegetarian_menu'),
...     'vegetarian_menu__toppings')

So, you would need add in the additional prefetch calls against filtered_infos. Alternatively, you could drop the to_name, and have the filtered infos overwrite the default infos relationship. That said, not sure if the below are equivalent:

>>> vegetarian_pizzas = Pizza.objects.filter(vegetarian=True)
>>> Restaurant.objects.prefetch_related(
...     Prefetch('pizzas', queryset=vegetarian_pizzas),
...     'pizzas__toppings')
>>> vegetarian_pizzas = Pizza.objects.filter(vegetarian=True)
>>> Restaurant.objects.prefetch_related(
...     'pizzas__toppings',
...     Prefetch('pizzas', queryset=vegetarian_pizzas))
rpkilby commented 5 years ago

One thing I didn't really consider is nested prefetch calls. It's not clear if nested prefetching works with nested subqueries, or if they all need to be called on the primary queryset.

Greg-Hamel commented 3 years ago

Just implemented a feature using mostly what was written here. Thanks @rpkilby!

To help future people encountering this thread, from what I encountered implementing this, the first line of filter_related_filtersets() should read queryset = super().filter_related_filtersets(queryset)