nshafer / django-hashid-field

Django Model Field that uses Hashids to obscure the value
MIT License
370 stars 40 forks source link

Automatic Per-Model/Field Salt #31

Closed rcoup closed 3 years ago

rcoup commented 5 years ago

While you can specify a per-field salt, it'd be slick to include the model & field information into a default salt. Currently:

class Model1(models.Model):
    id = HashIdField()

class Model2(models.Model)
    id =  HashIdField()

m1 = Model1.objects.create(id=7)
m2 = Model2.objects.create(id=7)

m1.id => "hjGh7"
m2.id => "hjGh7"

Which reveals unnecessarily that m2.id == m1.id as integers.

My idea is:

class Model1(models.Model):
    id = HashIdField(salt=HashIdField.AUTO)

class Model2(models.Model)
    id = HashIdField(salt=HashIdField.AUTO)

m1 = Model1.objects.create(id=7)
m2 = Model2.objects.create(id=7)

m1.id => "hj9dGh7"
m2.id => "a7rr2eY"

An implementation could look a bit like:

import hashlib

class HashidFieldMixin(object):
    AUTO = object()

    @classmethod
    def build_auto_salt(cls, *args):
        return hashlib.sha1(
            "|".join(args)).encode()
        ).hexdigest()

    def contribute_to_class(self, *args, **kwargs):
        super(HashidFieldMixin, self).contribute_to_class(*args, **kwargs)
        if self.salt is self.AUTO:
            # set the field salt automatically based on the global salt, the model, and the field name
            self.salt = self.build_auto_salt(
                settings.HASHID_FIELD_SALT,
                self.model._meta.label_lower,  # Model label. eg: "myapp.mymodel"
                self.name  # Field name
            )

Important caveat is that if you rename your model or field (or change your global salt), you'd need to set salt explicitly to match the old value. But with the helper, that'd be relatively painless:

class RenamedModel1(models.Model)
    renamed_id = HashIdField(salt=HashIdField.build_auto_salt(settings.HASHID_FIELD_SALT, "myapp.model1", "id"))

Another way to implement this is via #21 by also including prefix into the salt in addition to using it as a prefix.

nshafer commented 5 years ago

Thanks for the suggestion. Where were you a couple years ago when I first made the module, as this would be excellent default behavior. But obviously I couldn't change it now without causing massive issues with existing fields. Perhaps in the next major release when I break compatibility for some other reason as well.

So for this functionality, I see 3 possible options:

  1. A special setting like you outline for adopting this behavior only when asked for: id = HashIdField(salt=HashIdField.AUTO).
  2. A config variable that turns it on by default, but this config variable is false by default for backwards compatibility: HASHID_FIELD_AUTO_SALT = False
  3. Do nothing, and leave it up to the implementer to set themsevles, outside of the library: id = HashIdField(salt=settings.HASHID_FIELD_SALT + "_mymodel_id")

Some thoughts:

I'm going to percolate on this a bit and get back to you.

Nate

rcoup commented 5 years ago

Where were you a couple years ago when I first made the module, as this would be excellent default behavior.

Still using autoincrement integer ids everywhere? 🤭More seriously, I didn't suggest making it the default since breaking everyone's existing IDs is not a good idea (even in a major version bump).

Though, your option (2) solves that I think? Combined with option (1) feels good to me. Document as "for new projects, we recommend setting HASHID_FIELD_AUTO_SALT = True"? And if people are expanding existing projects, suggest to use option (1) for new fields. Regardless people will need to use build_auto_salt() if models or fields are ever renamed.

I guess using a new field type would work too, then eventually deprecate & retire the old one. That way at least everyone would need to take some concrete action to not break their ids.

nshafer commented 5 years ago

Oh for sure, it was my bad idea to make it default. And definitely not even on the table as a consideration. I didn't think you were talking about that at all.

However, you bring up a good point about the salt being tied to the model and field names, and if either change, then the salt will change and IDs will no longer match. Even if they have specifically configured the field as salt=HashIdField.AUTO this could be a very unobvious side-effect that they won't be expecting and be confused why their URLs are now all 404, unless they're very familiar with the docs. For that reason, I'm really favoring option 3, which puts it in their hands entirely, and is not that much extra work for the user. I could add a note to the docs for DHF about the default behavior would result in duplicate IDs, and to use method 3 to make them hash differently. But then if they rename the model or field, it won't change the salt.

rcoup commented 5 years ago

I understand where you're coming from, but renaming your models + ID fields doesn't happen that often (and you'd already need to go and alter APIs/views/urls/etc). How about a compromise idea?

  1. Allow salt to be a callable.
  2. If it is a callable, call it during contribute_to_class() with the field instance to resolve it to a concrete value
  3. Supply some helpers.
  4. Document

Usage:

from hashid_field import HashidField, auto_salt, merge_salt
class Model1(models.Model):
    id = HashIdField(salt=auto_salt)

class RenamedModel2(models.Model):
    renamed_id = HashidField(salt=merge_salt(settings.HASHID_FIELD_SALT, 'myapp.model2', 'id'))

Implementation:

import hashlib

def merge_salt(field, *args):
    """ Merge a set of values to a single salt using a sha1 hash """
    return hashlib.sha1(
        "|".join([str(v) for v in args]).encode()
    ).hexdigest()

def auto_salt(field):
    """
    Helper to build a per-field salt automatically based on:
    1. the global HASHID_FIELD_SALT
    2. the model label (eg. "myapp.mymodel")
    3. the field name (eg. "id")
    Caveat: renaming your model or field will cause the salt to change. Swap this
    for `merge_salt()` with the old values.
    """
    return merge_salt(field,
        settings.HASHID_FIELD_SALT,
        field.model._meta.label_lower,
        field.name
    )

class HashidFieldMixin(object):
    def contribute_to_class(self, *args, **kwargs):
        super(HashidFieldMixin, self).contribute_to_class(*args, **kwargs)
        if callable(self.salt):
            # resolve the field salt
            self.salt = self.salt(self)
rvdrijst commented 4 years ago

I needed this solution (same pk for different model should have different hash). I think something in the internals of django-hashid-field changed because the above solution did not work for me.

For anyone looking for something like this (or even better, if something like this could be merged into the package), here is my solution. It could benefit from some changes in the package to allow it to more easily hook into the flow.

import hashlib
from django.db import models
from hashid_field.descriptor import HashidDescriptor
from hashid_field.field import HashidFieldMixin
from hashids import Hashids

class AutoSaltHashidFieldMixin(HashidFieldMixin):

    @classmethod
    def build_auto_salt(cls, *args):
        return hashlib.sha1("|".join([str(v) for v in args]).encode("utf-8")).hexdigest()

    def contribute_to_class(self, cls, name, **kwargs):
        super().contribute_to_class(cls, name, **kwargs)
        # set the field salt automatically based on the global salt, the model, and the field name
        self.salt = self.build_auto_salt(
            settings.HASHID_FIELD_SALT,
            self.model._meta.label_lower,  # Model label. eg: "myapp.mymodel"
            self.name  # Field name
        )
        self._hashids = Hashids(self.salt, min_length=self.min_length, alphabet=self.alphabet)
        # Set the description to new HashidDescriptor with new self._hashids
        # feels a bit redundant since already done in superclass but needed because
        # only patching _salt is not enough since alphabet is shuffled with old salt
        # The django rest framework fields get their info from this field so needs to be up-to-date
        setattr(cls, self.attname, HashidDescriptor(self.attname, hashids=self._hashids))

# use these fields in your models
class HashidAutoField(AutoSaltHashidFieldMixin, models.AutoField):
    description = "A Hashids obscured AutoField"

class HashidBigAutoField(AutoSaltHashidFieldMixin, models.BigAutoField):
    description = "A Hashids obscured BigAutoField"

I tried to simply patch self._hashids._salt but apparently the Django Rest Framework fields get their alphabet and salt from the field and build a new hashids (not sure why the one on the field is not reused). That didn't work because it turned out the alphabet on the field's _hashids was shuffled on creation with the global salt, so simply patching the _salt would not reshuffle the alphabet, while the new hashids used by the DRF field was shuffled with the new salt (leading to different ids).

Anyway, this led to a cleaner solution because the _hashids on the field is reconstructible with the same alphabet and salt.