kitware-resonant / django-resonant-utils

Django utilities for data management applications.
Apache License 2.0
2 stars 1 forks source link

Optimize custom pagination count query #31

Closed danlamanna closed 3 years ago

danlamanna commented 3 years ago

It should be SELECT COUNT(pk).. with this change.

brianhelba commented 3 years ago

For me, Django always generates SELECT COUNT(*). Note, I'm not directly testing this branch, I'm just running queries directly in the shell for Model.objects.values('pk').count() and Model.objects.count().

Regardless, is COUNT(pk) any better?

For me, an EXPLAIN on a small table uses a "Seq Scan" with an identical cost in both cases. On a larger table, it uses an "Index Only Scan" in both cases (interestingly, the COUNT(*) case uses the index of a different column, resulting in a lower cost :shrug:).

Online sources don't seem to indicate that there's a significant difference between the two:

Finally, I'm vaguely worried about the correctness in cases where the queryset is the result of heavy manipulation, aggregation, or joins. I haven't tried to construct a pathological example, but I'm not sure if it's safe to expect that any queryset will have a pk.

danlamanna commented 3 years ago

Fair enough.