tfoxy / graphene-django-optimizer

Optimize database access inside graphene queries
MIT License
428 stars 84 forks source link

prefetch relay children - duplicate issue of #1, #4, etc. #17

Open gotexis opened 5 years ago

gotexis commented 5 years ago

Hey, I guess something is wrong with my setup, perhaps not even related to graphene-django-optimizer.

I have exactly the same issue with #1

I have used all solutions I can see from Graphene and here, plus that I am even manually calling prefetch_related('children_set'), queries will still be N+1.

Here's my model.

class Province(Model):
    name                        = CharField(max_length=10)

class City(Model):
    province                    = ForeignKey(Province, on_delete=CASCADE)
    name                        = CharField(max_length=15)

Overridding DjangoConnectionField to not use merge_querysets, like suggested: #429

class PrefetchingConnectionField(DjangoFilterConnectionField): 
# or just subclass DjangoConnectionField, doesn't make a difference for N+1

    @classmethod
    def merge_querysets(cls, default_queryset, queryset):
        return queryset

    @classmethod
    def resolve_connection(cls, connection, default_manager, args, iterable):
        if iterable is None:
            iterable = default_manager

        if isinstance(iterable, QuerySet):
            _len = iterable.count()
        else:
            _len = len(iterable)

        connection = connection_from_list_slice(
            iterable,
            args,
            slice_start=0,
            list_length=_len,
            list_slice_length=_len,
            connection_type=connection,
            edge_type=connection.Edge,
            pageinfo_type=PageInfo,
        )
        connection.iterable = iterable
        connection.length = _len
        return connection

Schema

class Province(DjObj):
    class Meta(NodeI):
        model = models.Province
        filter_fields = {   # for DjangoFilterConnectionField
            'id': ['exact'],
        }

class City(DjObj):
    class Meta(NodeI):
        model = models.City
        filter_fields = {
            'id': ['exact'],
        }

class GeoQueryCodex:
    provinces                      = PrefetchingConnectionField(Province)
    province                       = Node.Field(Province)

    def resolve_provinces(self, info, *args, **kw):
        # use the plugin
        qs = gql_optimizer.query(models.Province.objects.all(), info)
        # or just manually prefetch
        qs = models.Province.objects.all().prefetch_related('city_set')
        return qs

    cities                         = PrefetchingConnectionField(City)
    city                           = Node.Field(City)

And the query

query Provinces {
  provinces(first:10) {
    edges {
      node {
        id
        name
        citySet {  # default naming of the children, since a related_name is not supplied
          edges {
            node {
              name
            }
          }
        }
      }
    }
  }
}

just for sanity check, python query does work:


@timeit
def d():
    result = []
    for x in Province.objects.all().prefetch_related('city_set'):
        result.append((x.id, x.name, [area.name for cityin x.city_set.all()]))
    print(result)

# this does not result in N+1, as it would be much faster than without 'prefetch_related'

Now, this result in N+1... Driving me crazy :(

Hope you don't mind me calling for help here @maarcingebala :) Your Saleor repo was the original inspiration that I used the Graphene stack.

please help me fellow python gods

gotexis commented 5 years ago

I think I am 1 step closer to identify the issue.

I put some breakpoints on my code.

Seem like the custom PrefetchingConnectionField.resolve_connection was applied to the Province, not the City, although I specified PrefetchingConnectionField for both...

Is this normal? We know the culprit may be below

            if iterable is not default_manager:
                default_queryset = maybe_queryset(default_manager)
                iterable = cls.merge_querysets(default_queryset, iterable)

Is this supposed to be never run, or this will break the prefetch cache?

Not sure why this is the case though, need some more investigation...

gotexis commented 5 years ago

Problem solved?

Subclassing was indeed too much trouble, I just had to directly modify the below code of Graphene, and prefetch did work as expected...

But in the long run, still I would prefer to do this through subclassing.

class DjangoConnectionField(ConnectionField):
    @classmethod
    def merge_querysets(cls, default_queryset, queryset):
        return queryset
tfoxy commented 5 years ago

Can you tell me which version of python and django are you using?

If you have the time, can you do a PR with a test that fails in this case?

svengt commented 5 years ago

We are also experiencing the same issues. It appears that edges are not auto prefetched. The following query for example results in an extra query per To-Do. Expected result would be to prefetch all items.

{
  allTodos {
    edges {
      node {
        id
        name
        items {
          edges {
            node {
              id
              todo
            }
          }
        }
      }
    }
  }
}

Tested in Python 3.7, Django 2.1 and 2.2

gotexis commented 5 years ago

@tfoxy @svengt's issue pretty much describes what I am experiencing.

I know an instance where gql_optimizer has been successfully deployed, which is Saleor: According to their PR, as they integrated with gql_optimizer, they had to disable django-filter https://github.com/mirumee/saleor/pull/2983#issue-218668958.

So I am guessing there must be some compatibility problem with DjangoFilterConnectionField.


Also there is this new https://docs.graphene-python.org/en/latest/execution/dataloader/

I wonder if anyone has successfully deployed this new approach.


Also meanwhile the more I browse for django/graphql the less I see Django as future-proof. Although I love python and hate js I have no choice but to leave for Prisma/Apollo-server for now. Very sad

svengt commented 5 years ago

@gotexis We are using regular DjangoConnectionField and experience the same problems. My guess it is related to the relay interface.

svengt commented 4 years ago

@gotexis I believe this is fixed in the latest release.

omegion commented 4 years ago

I am having a similar problem, nested relations with filters produce more queries.