jazzband / django-taggit

Simple tagging for django
https://django-taggit.readthedocs.io
BSD 3-Clause "New" or "Revised" License
3.33k stars, 622 forks

Multiple queries when adding tags #880

Open mister-rao opened 11 months ago

mister-rao commented 11 months ago

Inspecting with django-debug-toolbar, I found that taggit makes 2n queries when adding n tags.

Is there any way to optimize this?

rtpg commented 10 months ago

Could you post which tags you are adding and which queries are shown? This might be somewhat inherent to how Django models work by default.

mister-rao commented 10 months ago

This is my request body:

{
  "name": "New Collection",
  "type": "movie",
  "description": "This is a new movie collection",
  "tags": [
    "movie", "new", "awesome", "binge", "adventure", "thriller", "epic", "fantasy", "dark"
  ]
}

This is how I add tags:

# tags.add() takes tags as separate positional arguments, so the list is unpacked with *
collection.tags.add(*["movie", "new", "awesome", "binge", "adventure", "thriller", "epic", "fantasy", "dark"])

These are the queries:

(two screenshots of the query list, not preserved)

rtpg commented 10 months ago

OK, so if you really know what you are doing and don't rely on save hooks or the like, you can use bulk_create and bulk_update (Django ORM methods) to create the tags manually, e.g. creating 10 tags in a single query (or at least in fewer queries).

The problem with django-taggit doing that itself is that tags could then no longer have custom save methods. This is basically the performance/usability tradeoff present throughout Django.
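The bulk approach described above can be modeled without Django. The sketch below is a hypothetical illustration using Python's built-in sqlite3 module: the `tag` and `tagged_item` tables and the `CountingDB` and `bulk_add_tags` helpers are assumptions made for this example, not django-taggit's schema or API. Each multi-row INSERT stands in for one `bulk_create` call, and `INSERT OR IGNORE` stands in for `bulk_create(..., ignore_conflicts=True)`, so adding any number of tags costs a fixed number of queries.

```python
import sqlite3


class CountingDB:
    """Hypothetical in-memory SQLite database that counts the statements we send it."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0
        # Schema setup is not counted; only calls to run() are.
        self.conn.executescript(
            """
            CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
            CREATE TABLE tagged_item (
                id INTEGER PRIMARY KEY,
                tag_id INTEGER NOT NULL REFERENCES tag (id),
                object_id INTEGER NOT NULL,
                UNIQUE (tag_id, object_id)
            );
            """
        )

    def run(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def bulk_add_tags(db, object_id, names):
    """Add tags with a constant number of queries, skipping any per-tag save logic."""
    names = sorted(set(names))
    if not names:
        return
    marks = ", ".join("?" for _ in names)
    # 1. One lookup for the tags that already exist.
    existing = {
        name
        for (name,) in db.run(f"SELECT name FROM tag WHERE name IN ({marks})", names)
    }
    missing = [name for name in names if name not in existing]
    # 2. Create every missing tag in one multi-row INSERT (the bulk_create idea).
    if missing:
        values = ", ".join("(?)" for _ in missing)
        db.run(f"INSERT INTO tag (name) VALUES {values}", missing)
    # 3. Fetch the ids of all requested tags in one query.
    tag_ids = [
        tag_id
        for (tag_id,) in db.run(f"SELECT id FROM tag WHERE name IN ({marks})", names)
    ]
    # 4. Link them all to the object in one INSERT; OR IGNORE skips existing links.
    values = ", ".join("(?, ?)" for _ in tag_ids)
    params = [v for tag_id in tag_ids for v in (tag_id, object_id)]
    db.run(
        f"INSERT OR IGNORE INTO tagged_item (tag_id, object_id) VALUES {values}",
        params,
    )


if __name__ == "__main__":
    db = CountingDB()
    tags = ["movie", "new", "awesome", "binge", "adventure",
            "thriller", "epic", "fantasy", "dark"]
    bulk_add_tags(db, object_id=1, names=tags)
    print(db.queries)  # 4
```

Nothing runs per tag here, which is exactly the tradeoff the comment describes: a real version built on `bulk_create` would skip `save()` and the pre/post-save signals, so anything `Tag.save()` normally does (django-taggit's Tag, for instance, generates its slug there) would have to be done by hand.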

django-taggit's "create N tags" is really just a shortcut for "create 1 tag", done N times. We do lookups to find the existing tags, and then call .save() on each tag. That's hard to reduce. If we called bulk_create/bulk_update instead, we would break code that relies on .save() being called, or on pre_save/post_save signals for tags. That's why it's more or less up to users to do this themselves.
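The per-tag pattern described here (one lookup for existing tags, then a save for each new tag) can be modeled the same way. This is a hypothetical sketch using Python's built-in sqlite3 module, not django-taggit's actual code: `save_tag` stands in for `Tag.save()`, and appending to the `saved` list stands in for whatever a custom `save()` or a pre/post-save signal would do. The query count grows with the number of tags, but the per-tag hook runs once for every new tag.

```python
import sqlite3


class CountingDB:
    """Hypothetical in-memory SQLite database that counts the statements we send it."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0
        # Schema setup is not counted; only calls to run() are.
        self.conn.executescript(
            """
            CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
            CREATE TABLE tagged_item (
                id INTEGER PRIMARY KEY,
                tag_id INTEGER NOT NULL REFERENCES tag (id),
                object_id INTEGER NOT NULL,
                UNIQUE (tag_id, object_id)
            );
            """
        )

    def run(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def save_tag(db, name, saved):
    """Stand-in for Tag.save(): one INSERT per tag, with room for per-tag logic."""
    cur = db.run("INSERT INTO tag (name) VALUES (?)", (name,))
    saved.append(name)  # a custom save() or pre/post-save signal would run here
    return cur.lastrowid


def add_tags_one_by_one(db, object_id, names, saved):
    names = sorted(set(names))
    if not names:
        return
    marks = ", ".join("?" for _ in names)
    # One lookup for the tags that already exist: {name: id}.
    existing = dict(
        db.run(f"SELECT name, id FROM tag WHERE name IN ({marks})", names).fetchall()
    )
    for name in names:
        # Each missing tag is saved individually, so per-tag hooks still fire.
        tag_id = existing.get(name)
        if tag_id is None:
            tag_id = save_tag(db, name, saved)
        # Each tag is then linked to the object with its own INSERT.
        db.run(
            "INSERT OR IGNORE INTO tagged_item (tag_id, object_id) VALUES (?, ?)",
            (tag_id, object_id),
        )


if __name__ == "__main__":
    db = CountingDB()
    saved = []
    tags = ["movie", "new", "awesome", "binge", "adventure",
            "thriller", "epic", "fantasy", "dark"]
    add_tags_one_by_one(db, object_id=1, names=tags, saved=saved)
    print(db.queries)  # 19: 1 lookup + 9 tag INSERTs + 9 link INSERTs
```

In this model, nine new tags cost 19 queries, roughly 2n; the exact number django-taggit issues depends on its version and may differ. The extra queries are the price of letting each tag's save logic run.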

Having said that, I think this could be a good FAQ entry. I've used bulk_create/bulk_update myself as a solution to this problem, so I understand the need in the abstract.

(And as usual: this all assumes that this is actually a performance issue in your project to begin with. Measuring the real impact in production is very important.)