justinmayer / django-autoslug

AutoSlugField for Django. Supports (but not does not require) unidecode/pytils for transliteration. Old issue tracker is at Bitbucket: https://bitbucket.org/neithere/django-autoslug/issues
https://readthedocs.org/projects/django-autoslug/
GNU Lesser General Public License v3.0
295 stars 105 forks source link

Slug is always regenerated #62

Open igo opened 4 years ago

igo commented 4 years ago

Slug is always regenerated when model is saved (ie on update in admin). Even when always_update=False. https://github.com/justinmayer/django-autoslug/blob/3a87391ae7a5141d06396b93ee5b4030d9790a14/autoslug/fields.py#L275

tanwirahmad commented 4 years ago

I have solved this issue by updating the pre_save method like this:

class ExtendedAutoSlugField(AutoSlugField):

    def pre_save(self, instance, add):

        # get currently entered slug
        value = self.value_from_object(instance)

        if self.always_update or not value:
            value = super(ExtendedAutoSlugField, self).pre_save(instance, add)

        return value

If interested, I can make a merge request.

mathemaat commented 4 years ago

I think the problem here is that the name always_update is poorly chosen. Instead, it should be called something like always_repopulate or autorefresh.

Setting always_update to True will apply the slugify function to the populate_from field on every save. However, and this is very tricky (!), if it's set to False instead, it will apply the slugify function to the current value of the AutoSlug field.

The current assumption in the code seems to be that slugify(value) == slugify(slugify(value)), which is correct for slugs. However, the premise of this library is that it can be used for all sorts of fields whose values are computed from values in other fields.

Example:

In models.py:

from .fields import LongURLField
from .utils import md5_url

class Source(models.Model):
    url = LongURLField(null=True, unique=True) # custom url field with max_len > 255
    url_md5 = AutoSlugField(populate_from='url', always_update=False, slugify=md5_url, null=True, unique=True)
    # more fields here

In utils.py:

from hashlib import md5

def md5_url(url):
    return md5(url.encode("utf-8")).hexdigest()

Application code *:

source, _ = Source.objects.get_or_create(
    url_md5=md5_url(url),
    defaults={
        'url': url,
        # more fields here
    }
)

What should happen is that the existing source is taken from the database if the url already exists, and that a new source is created if the url is new.

This didn't work as expected. What happens in the background is this:

source.url_md5 = md5('the-website.com')
source.url = 'the-website.com'
source.save() # wrong md5 value (md5 has now been applied twice)

For a moment, I was considering a silly workaround:

source.url_md5 = md5('the-website.com')
source.url = 'the-website.com'
# (equivalent of adding `'md5_url' : None` to `defaults` in `get_or_create()`)
source.url_md5 = None:
source.save() # correct md5 value
# but
source.save() # wrong md5 value (md5 has now been applied twice)
source.save() # wrong md5 value (md5 has now been applied thrice)

So, although I don't want to update the md5 field once it's set, I do have to set always_update to True to fix this issue, which is very counterintuitive.

* background information: the reason I'm using the md5 for duplication checks and not the url field itself is that longtext fields don't make good index fields (i.e. lookups will not be as fast)

mathemaat commented 4 years ago

Summarizing my previous post: there are actually three scenarios, where always_update as a boolean only gives two:

  1. Calculate the slugify value only once and lock that value (except for manual changes perhaps)
  2. Repopulate the slugify value on every save (using the populate_from field)
  3. Recalculate the slugify value on every save (reusing the value in the AutoSlug field)

Setting always_update to True refers to option 2, whereas setting always_update to False refers to option 3. Options 1 and 3 can be considered equivalent if and only if slugify(slugify(value)) = slugify(value).

When I started using this library, I expected the choice to be between cases 1 and 2. There may be use cases where option 3 makes sense, e.g. if you want to keep track of the number of saves and the slugify function is a function that adds one to the value.