jazzband / django-taggit

Simple tagging for django
https://django-taggit.readthedocs.io
BSD 3-Clause "New" or "Revised" License

get() returned more than one Tag -- it returned 2! #826

Open naskio opened 1 year ago

naskio commented 1 year ago

Hello. Sometimes, when adding tags to an existing object, I get the following error:

get() returned more than one Tag -- it returned 2!

Here is the code I am using:

...
publication.tags.set(all_tags, clear=True)
publication.save()
# I also tried
publication.tags.set(all_tags)
publication.save()
# and
publication.tags.add(*all_tags)
publication.save()
# but still getting the same issue
...

Notes:

rtpg commented 1 year ago

@naskio do you have a longer stack trace? please post more details.

naskio commented 1 year ago
2022-11-16 01:06:22.022 | ERROR    | apps.data_collection.management.commands.tagging:handle:69 - get() returned more than one Tag -- it returned 2!
Traceback (most recent call last):

> File "/app/apps/data_collection/management/commands/tagging.py", line 63, in handle
    publication.tags.set(all_tags, clear=True)
    │           │        └ ['libération', 'Direction', 'Algérie Presse Service', 'guerre de libération', 'EN', 'Communauté nationale', 'Vidéo', 'Abdelma...
    │           └ <taggit.managers.TaggableManager: tags>
    └ <Publication: La communauté algérienne en Libye commémore le 68e anniversaire du déclenchement de la Glorieuse guerre de libé...

  File "/usr/local/lib/python3.9/site-packages/taggit/utils.py", line 124, in inner
    return func(self, *args, **kwargs)
           │    │      │       └ {'clear': True}
           │    │      └ (['libération', 'Direction', 'Algérie Presse Service', 'guerre de libération', 'EN', 'Communauté nationale', 'Vidéo', 'Abdelm...
           │    └ <taggit.managers._TaggableManager object at 0x7ff903015be0>
           └ <function _TaggableManager.set at 0x7ff90e52dee0>
  File "/usr/local/lib/python3.9/site-packages/taggit/managers.py", line 273, in set
    self.add(*tags, **kwargs)
    │    │    │       └ {}
    │    │    └ ['libération', 'Direction', 'Algérie Presse Service', 'guerre de libération', 'EN', 'Communauté nationale', 'Vidéo', 'Abdelma...
    │    └ <function _TaggableManager.add at 0x7ff90e52db80>
    └ <taggit.managers._TaggableManager object at 0x7ff903015be0>
  File "/usr/local/lib/python3.9/site-packages/taggit/utils.py", line 124, in inner
    return func(self, *args, **kwargs)
           │    │      │       └ {}
           │    │      └ ('libération', 'Direction', 'Algérie Presse Service', 'guerre de libération', 'EN', 'Communauté nationale', 'Vidéo', 'Abdelma...
           │    └ <taggit.managers._TaggableManager object at 0x7ff903015be0>
           └ <function _TaggableManager.add at 0x7ff90e52daf0>
  File "/usr/local/lib/python3.9/site-packages/taggit/managers.py", line 152, in add
    tag_objs = self._to_tag_model_instances(tags, tag_kwargs)
               │    │                       │     └ {}
               │    │                       └ ('libération', 'Direction', 'Algérie Presse Service', 'guerre de libération', 'EN', 'Communauté nationale', 'Vidéo', 'Abdelma...
               │    └ <function _TaggableManager._to_tag_model_instances at 0x7ff90e52dc10>
               └ <taggit.managers._TaggableManager object at 0x7ff903015be0>
  File "/usr/local/lib/python3.9/site-packages/taggit/managers.py", line 223, in _to_tag_model_instances
    tag = manager.get(name__iexact=name)
          │       │                └ 'UN'
          │       └ <function QuerySet.get at 0x7ff910d86f70>
          └ <QuerySet [<Tag: South Korea>, <Tag: India>, <Tag: Canadian>, <Tag: Australia>, <Tag: Barça>, <Tag: Xavi>, <Tag: Ukraine>, <T...
  File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 499, in get
    raise self.model.MultipleObjectsReturned(
          │    │     └ <class 'taggit.models.Tag.MultipleObjectsReturned'>
          │    └ <class 'taggit.models.Tag'>
          └ <QuerySet [<Tag: South Korea>, <Tag: India>, <Tag: Canadian>, <Tag: Australia>, <Tag: Barça>, <Tag: Xavi>, <Tag: Ukraine>, <T...

taggit.models.Tag.MultipleObjectsReturned: get() returned more than one Tag -- it returned 2!

NOTE: At the beginning I inserted many tags without TAGGIT_CASE_INSENSITIVE = True, and I only enabled it later. So I suspect I inserted some tags (for example UKRAINE and Ukraine) that were distinct at the time but became equal once case insensitivity was enabled, leaving duplicates in the DB. @rtpg
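
One way to confirm that suspicion is to group the existing tag names by their lowercased form and list every group with more than one spelling. The sketch below is a minimal illustration, not part of django-taggit: the helper name is hypothetical, and it works on a plain list of strings, which in a real project could come from something like `Tag.objects.values_list("name", flat=True)`. Python's `str.lower()` only approximates the database's case-insensitive comparison, so results for non-ASCII names may differ between backends.

```python
from collections import defaultdict


def case_insensitive_collisions(names):
    """Return {lowercased name: [spellings]} for names that collide.

    Hypothetical helper: `names` is any iterable of tag names.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    # Keep only the groups where two or more spellings share a key.
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


collisions = case_insensitive_collisions(["Ukraine", "UKRAINE", "India", "UN", "un"])
for key, variants in sorted(collisions.items()):
    print(key, variants)
```

Each reported group is a set of tags for which a case-insensitive `get()` would find more than one row.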

rtpg commented 1 year ago

@naskio thank you very much for that detail! It sounds very important.

If you have that problem in your system, the easiest fix would be to normalize the data manually in a database migration. For example, find all of the TaggedItems that point to UKRAINE and point them to Ukraine instead. From there, you should be able to delete UKRAINE.

You'll need to do some database cleanup. I do think this error should have its own handling, though. It might even make sense for us to write a helper management command, but at the very least we should document what to do.
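
The repoint-then-delete cleanup described above can be sketched at the SQL level. This is only an illustration under stated assumptions: it uses Python's `sqlite3` module with an in-memory database and a simplified stand-in schema whose table names (`tag`, `tagged_item`) are hypothetical, not taggit's real tables. In an actual Django data migration the same steps would go through the ORM. The stand-in assumes a uniqueness constraint on (tag, object), so an object that already carries both spellings needs care: its redundant row is dropped rather than repointed.

```python
import sqlite3


def merge_tag(conn, keep_id, drop_id):
    """Repoint tagged items from drop_id to keep_id, then delete drop_id."""
    with conn:  # run all three statements in a single transaction
        # Repoint rows; OR IGNORE skips rows that would violate the
        # uniqueness constraint (object already has the kept tag).
        conn.execute(
            "UPDATE OR IGNORE tagged_item SET tag_id = ? WHERE tag_id = ?",
            (keep_id, drop_id),
        )
        # Rows still pointing at the duplicate are redundant by now.
        conn.execute("DELETE FROM tagged_item WHERE tag_id = ?", (drop_id,))
        conn.execute("DELETE FROM tag WHERE id = ?", (drop_id,))


# Stand-in schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tagged_item (
        id INTEGER PRIMARY KEY,
        tag_id INTEGER REFERENCES tag(id),
        object_id INTEGER,
        UNIQUE (tag_id, object_id)
    );
    INSERT INTO tag VALUES (1, 'Ukraine'), (2, 'UKRAINE');
    INSERT INTO tagged_item (tag_id, object_id) VALUES (1, 10), (2, 10), (2, 11);
    """
)
merge_tag(conn, keep_id=1, drop_id=2)
```

After the merge, only the Ukraine tag remains, and objects 10 and 11 each reference it exactly once.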

rtpg commented 1 year ago

Good first issue: catching MultipleObjectsReturned and rethrowing it with a better error message to handle this.
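
A rough sketch of what that could look like, assuming the lookup stays the `manager.get(name__iexact=name)` call shown in the traceback. The function name and the message wording are hypothetical, and this is not the change that was eventually made in taggit. `manager` is assumed to be a Django manager or queryset, which exposes the model class as `.model`.

```python
def get_tag_iexact(manager, name):
    """Look up a tag ignoring case, with a clearer error on duplicates.

    Hypothetical helper; `manager` is assumed to be a Django manager or
    queryset for the tag model.
    """
    try:
        return manager.get(name__iexact=name)
    except manager.model.MultipleObjectsReturned as exc:
        # Re-raise the same exception type so existing handlers keep
        # working, but explain the likely cause and the way out.
        raise manager.model.MultipleObjectsReturned(
            f"More than one tag matches {name!r} when case is ignored. "
            "This can happen when TAGGIT_CASE_INSENSITIVE was enabled after "
            "tags differing only by case had been created. Merge the "
            "duplicate tags, then retry."
        ) from exc
```

Chaining with `from exc` keeps the original Django error visible in the traceback beneath the friendlier message.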

naskio commented 1 year ago

@rtpg For me the ideal solution would be to have both options:

  1. Handle this case by taking only the first tag when get() returns more than one. This would be more flexible, letting us disable TAGGIT_CASE_INSENSITIVE again without impacting the data.
  2. Provide a management command for anyone who decides to enable TAGGIT_CASE_INSENSITIVE and keep only one instance of similar tags.
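
The first option might be sketched as follows. The function name is hypothetical and "first" is assumed here to mean the lowest primary key, which is one possible reading of the proposal. `manager` is again assumed to be a Django manager or queryset for the tag model.

```python
def get_first_tag_iexact(manager, name):
    """Case-insensitive lookup that tolerates duplicate tags.

    Hypothetical helper: when several tags match, the one with the
    lowest primary key is returned instead of raising.
    """
    try:
        return manager.get(name__iexact=name)
    except manager.model.MultipleObjectsReturned:
        # Order explicitly so the choice is the same on every call.
        return manager.filter(name__iexact=name).order_by("pk").first()
```

Ordering by `pk` makes the fallback deterministic, but the duplicates stay in the database, so the cleanup command from the second option would still be useful.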

ghost commented 1 year ago

Hi @naskio, can I give a shot at solving this issue?

naskio commented 1 year ago

> Hi @naskio, can I give a shot at solving this issue?

Yes, sure. Good luck 😀

AmaMidzu commented 3 months ago

What's the status on this? Up for grabs?

rtpg commented 2 months ago

@AmaMidzu it's yours for the taking! Going to assign you for now just so we can keep track

fazeelghafoor commented 1 month ago

@rtpg is this issue still available for improvement? I'd like to work on it

fazeelghafoor commented 1 month ago

@naskio I've figured out the first part of your recommendation. For the management command to remove duplicate tags, what do you think should be the preference?

  1. Remove all duplicates except the first one.
  2. Keep only the uppercase version, like "UKRAINE" in the example.
  3. If there is no uppercase version, on what basis should we remove the other tags and keep a single tag?

Also, what would be a suitable name for the management command: "remove_duplicate_tags", "deduplicate_tags", or something else?
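
To make the options above concrete, here is a small sketch of the selection rule for one group of duplicates. It is an illustration only, not a recommendation or a decision from this thread: the function name and the `prefer_uppercase` flag are hypothetical, tags are represented as plain `(pk, name)` pairs, and "first" is assumed to mean the lowest primary key.

```python
def choose_survivor(tags, prefer_uppercase=False):
    """Pick which tag to keep from one group of duplicates.

    Hypothetical helper: `tags` is a list of (pk, name) pairs that are
    equal when case is ignored.
    """
    ordered = sorted(tags)  # lowest primary key first
    if prefer_uppercase:
        for pk, name in ordered:
            if name.isupper():
                return pk, name
    # Default, and the fallback when no all-uppercase spelling exists.
    return ordered[0]


group = [(12, "UKRAINE"), (4, "Ukraine"), (9, "ukraine")]
keep_first = choose_survivor(group)
keep_upper = choose_survivor(group, prefer_uppercase=True)
```

Falling back to the lowest primary key is one possible answer to the third question; other rules, such as keeping the most used spelling, would fit the same shape.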

naskio commented 1 month ago

Hi @fazeelghafoor