strawberry-graphql / strawberry-django

Strawberry GraphQL Django extension
https://strawberry.rocks/docs/django
MIT License

Related types yield duplicate nodes and distinct filter fails #651

Open Eraldo opened 2 weeks ago

Eraldo commented 2 weeks ago

Bug Description

Observation in my app: I have a contacts page where contacts can have tags, and the same tag is shown multiple times.

I checked the dev API and saw that it only happens when using the "related type" (TagListConnection via the contact's tags field).

Querying tags directly (not via the related type) works. Via the related type, using no filter yields duplicate nodes, while using it with a filter active does work. The DISTINCT filter does not work either: it also shows the duplicates.

Example:

The following query reproduces the issue:

query ContactsTags {
  contacts(last: 1) {
    totalCount
    edges {
      node {
        id
        name
        tags(filters: {DISTINCT: true}) {
          totalCount
          edges {
            node {
              id
              name
            }
          }
        }
      }
    }
  }
}

The result shows 8 duplicate tags.

{
  "data": {
    "contacts": {
      "totalCount": 63,
      "edges": [
        {
          "node": {
            "id": "SOMEID==",
            "name": "Carmen",
            "tags": {
              "totalCount": 8,
              "edges": [
                {
                  "node": {
                    "id": "VGFnOjYy",
                    "name": "target"
                  }
                },
                {
                  "node": {
                    "id": "VGFnOjYy",
                    "name": "target"
                  }
                },
                ...  // 5 more here
                {
                  "node": {
                    "id": "VGFnOjYy",
                    "name": "target"
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }
}

Running the query with the Django Debug Toolbar enabled shows the following SQL information: [screenshot not preserved]

System Information

Additional Context

According to @bellini666 the general underlying issue of duplicates due to SQL is known.

Conversation snippet for context:

Basically, if you try to filter the relation through the original model, the join will generate spurious tuples. What you need to do in this case is to filter through a subquery with exists
BUT, having said that, I see that you used DISTINCT, which should also work
Can you check in the generated SQL if the distinct was not applied there? If not, then you just found a bug 😛
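
The "spurious tuples" mentioned in the snippet can be reproduced outside Django. The following is a minimal sketch using Python's built-in sqlite3 module. The three-table schema (contact, tag, contact_tag) and the row values are assumptions made up for illustration, not the schema of the app in this report. It compares filtering tags through a join on the through table with filtering through an EXISTS subquery.

```python
import sqlite3

# Hypothetical schema: a many-to-many relation between contacts and tags.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contact_tag (contact_id INTEGER, tag_id INTEGER);
""")

# Eight contacts all carry the same single tag.
conn.executemany(
    "INSERT INTO contact VALUES (?, ?)",
    [(i, f"contact-{i}") for i in range(1, 9)],
)
conn.execute("INSERT INTO tag VALUES (62, 'target')")
conn.executemany(
    "INSERT INTO contact_tag VALUES (?, 62)",
    [(i,) for i in range(1, 9)],
)

# Filtering through the join: one output row per matching through-table row.
joined = conn.execute("""
    SELECT tag.id, tag.name
    FROM tag
    JOIN contact_tag ON contact_tag.tag_id = tag.id
    WHERE contact_tag.contact_id BETWEEN 1 AND 8
""").fetchall()

# Filtering through EXISTS: each tag row is tested once, so it appears once.
via_exists = conn.execute("""
    SELECT tag.id, tag.name
    FROM tag
    WHERE EXISTS (
        SELECT 1
        FROM contact_tag
        WHERE contact_tag.tag_id = tag.id
          AND contact_tag.contact_id BETWEEN 1 AND 8
    )
""").fetchall()

print(len(joined), len(via_exists))  # 8 1
```

In Django terms the second query roughly corresponds to filtering on an `Exists(...)` subquery instead of spanning the relation in `filter()`, though the exact queryset depends on the models involved.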


SupImDos commented 2 weeks ago

Hi @Eraldo

I think coincidentally we also ran into the same issue a few days ago in #650. Our understanding is that the duplicate results are caused by the extra LEFT OUTER JOIN added by Strawberry Django's window pagination approach when prefetching related m2m types.

For reference, we don't think that using DISTINCT will be enough to solve the issue, as it will only remove the duplicates from the final result set, but the calculated total_count will still be incorrect.
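
To illustrate why DISTINCT alone would leave the count wrong, here is a small sketch with Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The schema and the shape of the extra LEFT OUTER JOIN are assumptions chosen to mimic the symptom; this is not the SQL that Strawberry Django actually generates. The point it shows is that a window count is computed over the joined rows before DISTINCT collapses them.

```python
import sqlite3

# Hypothetical data: tag 62 is linked to eight contacts via the through table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contact_tag (contact_id INTEGER, tag_id INTEGER);
    INSERT INTO tag VALUES (62, 'target');
""")
conn.executemany(
    "INSERT INTO contact_tag VALUES (?, 62)",
    [(i,) for i in range(1, 9)],
)

# Tags of contact 1, plus an assumed extra join on the through table that is
# not restricted to contact 1 and therefore multiplies the rows by eight.
rows = conn.execute("""
    SELECT DISTINCT tag.id, tag.name, COUNT(*) OVER () AS total_count
    FROM tag
    JOIN contact_tag AS own
      ON own.tag_id = tag.id AND own.contact_id = 1
    LEFT OUTER JOIN contact_tag AS extra
      ON extra.tag_id = tag.id
""").fetchall()

# DISTINCT leaves a single row, but the window count was taken before that.
print(rows)  # [(62, 'target', 8)]
```

The duplicates disappear from the result set, yet `total_count` still reports 8 for a contact that has exactly one tag.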

At the moment, we think the two options are to either:

  1. Use Django's inbuilt prefetch slicing to do nested pagination
  2. Prefetch the through model and then prefetch the other side of the m2m as part of handling m2m relations.

See #650 for a more in-depth explanation.
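
As a rough illustration of the idea behind option 2, the sketch below does the same thing in plain SQL through Python's built-in sqlite3 module: paginate and count on the through table alone, then fetch the tags in a second query and stitch the results together. The schema, the tag "lead" (id 61), and all variable names are hypothetical, and this is not how Strawberry Django or #650 implements it.

```python
import sqlite3

# Hypothetical data: contact 1 has two tags, contacts 2 and 3 have one each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contact_tag (contact_id INTEGER, tag_id INTEGER);
    INSERT INTO tag VALUES (61, 'lead'), (62, 'target');
    INSERT INTO contact_tag VALUES (1, 61), (1, 62), (2, 62), (3, 62);
""")

# Step 1: number and count the through rows per contact. No join to tag is
# involved, so the windows see exactly one row per (contact, tag) link.
links = conn.execute("""
    SELECT contact_id, tag_id,
           ROW_NUMBER() OVER (PARTITION BY contact_id ORDER BY tag_id) AS rn,
           COUNT(*) OVER (PARTITION BY contact_id) AS total_count
    FROM contact_tag
    WHERE contact_id IN (1, 2)
    ORDER BY contact_id, rn
""").fetchall()

# Step 2: fetch the other side of the relation once, keyed by primary key.
tag_ids = sorted({tag_id for _, tag_id, _, _ in links})
marks = ",".join("?" for _ in tag_ids)
names = dict(
    conn.execute(f"SELECT id, name FROM tag WHERE id IN ({marks})", tag_ids)
)

# Step 3: stitch both result sets together per contact.
tags_by_contact = {}
for contact_id, tag_id, _rn, total_count in links:
    entry = tags_by_contact.setdefault(
        contact_id, {"total_count": total_count, "names": []}
    )
    entry["names"].append(names[tag_id])

print(tags_by_contact)
```

A page limit could be applied by filtering on `rn` in an outer query; the count stays correct because it is computed before that slice.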

Eraldo commented 1 week ago

Thank you @SupImDos for making me aware of the related issue detailing the challenge and possible solution paths. 🙏 I will keep an eye on it.